rdf_persistency.pl -- RDF persistency plugin

This module provides persistency for rdf_db.pl based on the rdf_monitor/2 predicate to track changes to the repository. Where previous versions autosaved the whole database using the quick-load format of rdf_db, this version uses a quick-load file per source (the 4th argument of rdf/4) and journalling for edit operations.

The result is safe. It avoids frequent small changes to large files, which make synchronisation and backup expensive, and it avoids long disruptions of the server while it autosaves. Only loading large files disrupts service for some time.

The persistent backup of the database is realised in a directory, using a lock file to avoid corruption due to concurrent access. Each source is represented by two files, the latest snapshot and a journal. The state is restored by loading the snapshot and replaying the journal. The predicate rdf_flush_journals/1 can be used to create fresh snapshots and delete the journals.

See also
- rdf_edit.pl
To be done
- If there is a complete `.new' snapshot and no journal, we should move the .new to the plain snapshot name as a means of recovery.
- Backup of each graph using one or two files is very costly if there are many graphs. Although the currently used subdirectories avoid hitting OS limits early, this is still not ideal. Probably we should collect (small, older?) files and combine them into a single quick load file. We could call this (similar to GIT) a `pack'.
Source rdf_attach_db(+Directory, +Options) is det
Start persistent operations using Directory as place to store files. There are several cases:
  • Empty DB, existing directory: load the DB from the existing directory.
  • Full DB, empty directory: create snapshots for all sources in the directory.

Options:

access(+AccessMode)
One of auto (default), read_write or read_only. Read-only access implies that the RDF store is not locked. It is read at startup and all modifications to the data are temporary. The default auto mode is read_write if the directory is writeable and the lock can be acquired. Otherwise it reverts to read_only.
concurrency(+Jobs)
Number of threads to use for loading the initial database. If not provided, it is the number of CPUs as obtained from the flag cpu_count.
max_open_journals(+Count)
Maximum number of journals kept open. If not provided, the default is 10. See limit_fd_pool/0.
directory_levels(+Count)
Number of levels of intermediate directories for storing the graph files. Default is 2.
silent(+BoolOrBrief)
One of true, false (default) or brief. If true, do not print informational messages. If brief, print minimal feedback.
log_nested_transactions(+Boolean)
If true, nested log transactions are added to the journal information. By default (false), no log-term is added for nested transactions.
Errors
- existence_error(source_sink, Directory)
- permission_error(write, directory, Directory)
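A minimal usage sketch of the options above, assuming the SWI-Prolog semweb package is installed. The helper name start_store/1 is hypothetical and the option values are only illustrative. The values given for max_open_journals and directory_levels are the documented defaults.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_persistency)).

%  start_store(+Dir): hypothetical helper, not part of the library.
%  Attaches the persistent store kept in directory Dir.
start_store(Dir) :-
    rdf_attach_db(Dir,
                  [ access(auto),          % read_write if possible, else read_only
                    concurrency(2),        % load the initial database with 2 threads
                    max_open_journals(10), % documented default
                    directory_levels(2),   % documented default
                    silent(brief)          % print minimal feedback only
                  ]).
```

After `start_store(Dir)` succeeds, rdf_current_db/1 should report the attached directory, and rdf_detach_db/0 releases it again.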
Source rdf_attach_db_ro(+Directory, +Options)[private]
Open an RDF database in read-only mode.
Source rdf_persistency_property(?Property) is nondet
True if Property is a property of the current persistent database. Currently makes the options passed to rdf_attach_db/2 available. Notably, rdf_persistency_property(access(read_only)) is true if the database is mounted in read-only mode. Other properties:
directory(Dir)
Directory in which the database resides.
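The properties can be queried as in the sketch below. Both helper names are hypothetical; only rdf_persistency_property/1 and the property terms access/1 and directory/1 come from the text above.

```prolog
:- use_module(library(semweb/rdf_persistency)).

%  store_is_read_only: hypothetical helper.
%  True if the attached store was mounted in read-only mode.
store_is_read_only :-
    rdf_persistency_property(access(read_only)).

%  print_store_properties: hypothetical helper.
%  Prints every property reported for the current store, one per line.
%  Prints nothing if no properties are reported.
print_store_properties :-
    forall(rdf_persistency_property(Property),
           format("~q~n", [Property])).
```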
Source no_agc(:Goal)[private]
Run Goal with atom garbage collection disabled. Loading an RDF database creates large amounts of atoms we know are not garbage.
Source rdf_detach_db is det
Detach from the current database. Succeeds silently if no database is attached. Normally called at the end of the program through at_halt/1.
Source rdf_current_db(?Dir)
True if Dir is the current RDF persistent database.
Source rdf_flush_journals(+Options)
Flush dirty journals. Options:
min_size(+KB)
Only flush if journal is over KB in size.
graph(+Graph)
Only flush the journal of Graph.
To be done
- Provide a default for min_size?
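A short sketch of the two options. The helper names and the 1024 KB threshold are arbitrary choices for illustration, not recommendations from the library.

```prolog
:- use_module(library(semweb/rdf_persistency)).

%  compact_store: hypothetical helper.
%  Creates fresh snapshots for journals larger than 1024 KB
%  (an arbitrary, illustrative threshold).
compact_store :-
    rdf_flush_journals([min_size(1024)]).

%  compact_graph(+Graph): hypothetical helper.
%  Flushes only the journal that belongs to Graph.
compact_graph(Graph) :-
    rdf_flush_journals([graph(Graph)]).
```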
Source load_db is det[private]
Reload the database from the directory specified by rdf_directory/1. First we find all named graphs using find_dbs/1 and then we load them.
Source make_goals(+DBs, +Silent, +Index, +Total, -Goals)[private]
Source concurrency(-Jobs)[private]
Number of jobs to run concurrently.
Source find_dbs(+Dir, -Graphs, -SnapBySize, -JournalBySize) is det[private]
Scan the persistent database and return a list of snapshots and journals, both sorted by file-size. Each term is of the form
db(Size, Ext, DB, DBFile, Depth)
Source scan_db_files(+Files, +Dir, +Prefix, +Depth)// is det[private]
Produces a list of db(DB, Size, File) for all recognised RDF database files. File is relative to the database directory Dir.
Source attach_graph(+Graph, +Options) is det[private]
Load triples and reload journal from the indicated snapshot file.
Source load_journal(+File:atom, +DB:atom) is det[private]
Process transactions from the RDF journal File, adding the given named graph.
Source rdf_persistency(+DB, Bool)
Specify whether a database is persistent. Switching to false kills the persistent state. Switching to true creates it.
Source rdf_db:property_of_graph(?Property, +Graph) is nondet[multifile]
Extend rdf_graph_property/2 with new properties.
Source start_monitor is det[private]
Source stop_monitor is det[private]
Start/stop monitoring the RDF database for changes and update the journal.
Source monitor(+Term) is semidet[private]
Handle an rdf_monitor/2 callback to deal with persistency. Note that the monitor calls that come from rdf_db.pl that deal with database changes are serialized. They do come from different threads though.
Source check_nested(+Level) is semidet[private]
True if we must log this transaction. This is always the case for toplevel transactions. Nested transactions are only logged if log_nested_transactions(true) is defined.
Source open_transaction(+DB, +Fd) is det[private]
Add a begin(Id, Level, Time, Message) term if a transaction involves DB. Id is an incremental integer, where each database has its own counter. Level is the nesting level, Time a floating point timestamp and Message the message provided as argument to the log message.
Source next_transaction_id(+DB, -Id) is det[private]
Id is the number to use for the next logged transaction on DB. Transactions in each named graph are numbered in sequence. Searching for the Id of the last transaction is performed by the 2nd clause, starting 1 KB from the end of the journal and doubling this offset on each failure.
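The backward search with a doubling window can be sketched as follows. This is not the library's code. The predicate names are hypothetical, and the sketch assumes that every journal term starts on its own line and that the last transaction is marked by an end(Id, Nesting, Others) term.

```prolog
%  last_end_id(+File, -Id) is semidet.  Hypothetical helper.
%  Id is the identifier of the last end/3 term in File.
last_end_id(File, Id) :-
    size_file(File, Size),
    setup_call_cleanup(
        open(File, read, In, [encoding(utf8)]),
        search_from_end(In, Size, 1024, Id),
        close(In)).

%  Try a window of Offset bytes before the end of the file.
%  If it holds no end/3 term, double the window and retry.
search_from_end(In, Size, Offset, Id) :-
    Offset < Size,
    !,
    Start is Size - Offset,
    seek(In, Start, bof, _),
    skip(In, 0'\n),                 % drop the partial line we landed in
    (   last_end_term(In, Id0)
    ->  Id = Id0
    ;   Offset2 is Offset * 2,
        search_from_end(In, Size, Offset2, Id)
    ).
search_from_end(In, _Size, _Offset, Id) :-
    seek(In, 0, bof, _),            % the window now covers the whole file
    last_end_term(In, Id).

%  Read all terms up to end of file and remember the last end/3 Id.
last_end_term(In, Id) :-
    read_term(In, Term, []),
    last_end_term(Term, In, none, id(Id)).

last_end_term(end_of_file, _, Last, Last) :- !.
last_end_term(Term, In, Last0, Last) :-
    (   Term = end(Id, _, _)
    ->  Last1 = id(Id)
    ;   Last1 = Last0
    ),
    read_term(In, Next, []),
    last_end_term(Next, In, Last1, Last).
```

For a journal with many transactions, only the tail of the file is parsed in the common case. The whole file is read only when no window contains an end/3 term.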
Source end_transactions(+DBs:list(atom:id)) is det[private]
End a transaction that affected the given list of databases. We write the list of other affected databases as an argument to the end-term to facilitate fast finding of the related transactions.

In each database, the transaction is ended with a term end(Id, Nesting, Others), where Id and Nesting are the transaction identifier and nesting (see open_transaction/2) and Others is a list of DB:Id, indicating other databases affected by the transaction.
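To make the shape of the begin/4 and end/3 terms concrete, the sketch below pairs them up in a list of journal terms. The helper name and the sample terms are invented for illustration. Only the two term shapes come from the descriptions above.

```prolog
:- use_module(library(lists)).

%  completed_transactions(+Terms, -Ids): hypothetical helper.
%  Ids are the identifiers of transactions for which Terms holds
%  both a begin/4 term and an end/3 term with the same Id and
%  nesting level.
completed_transactions(Terms, Ids) :-
    findall(Id,
            ( member(begin(Id, Level, _Time, _Message), Terms),
              memberchk(end(Id, Level, _Others), Terms)
            ),
            Ids).
```

With the invented sample `[begin(1, 0, 1.0, msg), end(1, 0, ['http://example.org/g2':7]), begin(2, 0, 2.0, msg)]`, the result is `[1]`: transaction 2 has no end/3 term, and the Others list of transaction 1 names one other affected database together with its transaction id.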

Source sync_loaded_graphs(+Graphs)[private]
Called after a binary triple file has been loaded that added triples to the given graphs.
Source journal_fd(+DB, -Stream) is det[private]
Get an open stream to a journal. If the journal is not open, old journals are closed to satisfy the max_open_journals option. Then the journal is opened in append mode. Journal files are always encoded as UTF-8 for portability as well as to ensure full coverage of Unicode.
Source limit_fd_pool is det[private]
Limit the number of open journals to max_open_journals (10). Note that calls from rdf_monitor/2 are issued in different threads, but as they are part of write operations they are fully synchronised.
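The text does not say which journals are closed first. The sketch below assumes that the oldest ones go first and that the open journals are kept in a list ordered from oldest to newest. The predicate name is hypothetical.

```prolog
:- use_module(library(lists)).

%  limit_pool(+Max, +Open, -Kept, -Closed): hypothetical helper.
%  Open lists the open journals, oldest first.  Closed is the
%  prefix of Open that must be closed so that at most Max
%  journals remain in Kept.
limit_pool(Max, Open, Kept, Closed) :-
    length(Open, Count),
    Excess is max(0, Count - Max),
    length(Closed, Excess),
    append(Closed, Kept, Open).
```

For example, `limit_pool(2, [a, b, c], Kept, Closed)` gives `Kept = [b, c]` and `Closed = [a]`.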
Source sync_journal(+DB, +Fd)[private]
Sync journal represented by database and stream. If the DB is involved in a transaction there is no point flushing until the end of the transaction.
Source close_journal(+DB) is det[private]
Close the journal associated with DB if it is open.
Source close_journals[private]
Close all open journals.
Source create_db(+Graph)[private]
Create a saved version of Graph in the corresponding file, then close and delete its journal.
Source delete_db(+DB)[private]
Remove snapshot and journal file for DB.
Source lock_db(+Dir)[private]
Lock the database directory Dir.
Source unlock_db(+Dir) is det[private]
Source unlock_db(+Stream, +File) is det[private]
Source dir_levels(+File, +Levels, ?Segments, ?Tail) is det[private]
Create a list of intermediate directory names for File. Each directory consists of two hexadecimal digits.
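The description does not state where the hexadecimal digits come from. The sketch below assumes that some hexadecimal digest of the file name is already available as an atom and only shows how two-digit segments could be cut from it. All predicate names and the sample values are hypothetical.

```prolog
%  hash_segments(+HexHash, +Levels, -Segments): hypothetical helper.
%  Segments holds the first Levels two-character pieces of HexHash.
hash_segments(_, 0, []) :- !.
hash_segments(Hex, Levels, [Segment|Segments]) :-
    Levels > 0,
    sub_atom(Hex, 0, 2, After, Segment),   % first two hex digits
    sub_atom(Hex, 2, After, 0, Rest),      % remainder of the digest
    Levels1 is Levels - 1,
    hash_segments(Rest, Levels1, Segments).

%  graph_path(+HexHash, +Levels, +File, -Path): hypothetical helper.
%  Path is File placed below the intermediate directories.
graph_path(Hex, Levels, File, Path) :-
    hash_segments(Hex, Levels, Segments),
    append(Segments, [File], Parts),
    atomic_list_concat(Parts, /, Path).
```

With the made-up digest `'9f86d081'` and two levels, `hash_segments/3` yields `['9f', '86']` and `graph_path('9f86d081', 2, graph_file, Path)` yields `Path = '9f/86/graph_file'`. Two hexadecimal digits per level give at most 256 entries per directory.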
Source db_files(+DB, -Snapshot, -Journal)[private]
db_files(-DB, +Snapshot, -Journal)[private]
db_files(-DB, -Snapshot, +Journal)[private]
True if named graph DB is represented by the files Snapshot and Journal. The filenames are local to the directory representing the store.
Source rdf_journal_file(+Graph, -File) is semidet
rdf_journal_file(-Graph, -File) is nondet
True if File is the name of the existing journal file for Graph.
Source rdf_snapshot_file(+Graph, -File) is semidet
rdf_snapshot_file(-Graph, -File) is nondet
True if File is the name of the existing snapshot file for Graph.
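Because both predicates are nondet when Graph is unbound, they can be used to enumerate the files of the attached store. The helper name below is hypothetical.

```prolog
:- use_module(library(semweb/rdf_persistency)).

%  list_graph_files: hypothetical helper.
%  Prints the snapshot and journal file of every graph that has one.
list_graph_files :-
    forall(rdf_snapshot_file(Graph, Snapshot),
           format("snapshot ~q: ~w~n", [Graph, Snapshot])),
    forall(rdf_journal_file(Graph, Journal),
           format("journal  ~q: ~w~n", [Graph, Journal])).
```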
Source rdf_db_to_file(+DB, -File) is det
rdf_db_to_file(-DB, +File) is det
Translate between database encoding (often a file or URL) and the name we store in the directory. We keep a cache for two reasons. One is speed, but much more important is that the mapping of raw --> encoded provided by www_form_encode/2 is not guaranteed to be unique by the W3C standards.
Source url_to_filename(+URL, -FileName) is det[private]
url_to_filename(-URL, +FileName) is det[private]
Turn a valid URL into a filename. Earlier versions used www_form_encode/2, but this can produce characters that are not valid in filenames. We use the same encoding as www_form_encode/2, but with our own rules for allowed characters. The only requirement is that we avoid any filename special character in use. The current encoding uses US-ASCII alnum characters, _ and %.
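A minimal sketch of such an encoding, not the library's implementation. It assumes that every byte of the UTF-8 representation is either kept (US-ASCII alphanumerics and `_`) or written as `%` followed by two uppercase hexadecimal digits. The predicate names are hypothetical.

```prolog
:- use_module(library(utf8)).     % utf8_codes//1
:- use_module(library(lists)).    % append/2
:- use_module(library(apply)).    % maplist/3

%  sketch_url_to_filename(+URL, -FileName): hypothetical helper.
sketch_url_to_filename(URL, FileName) :-
    atom_codes(URL, Codes),
    phrase(utf8_codes(Codes), Bytes),     % code points -> UTF-8 bytes
    maplist(encode_byte, Bytes, Parts),
    append(Parts, Encoded),
    atom_codes(FileName, Encoded).

%  Keep safe bytes as they are; percent-encode everything else.
encode_byte(Byte, [Byte]) :-
    keep_byte(Byte),
    !.
encode_byte(Byte, [37|Hex]) :-            % 37 is the character code of '%'
    format(codes(Hex), '~`0t~16R~2|', [Byte]).

%  US-ASCII letters, digits and the underscore are kept.
keep_byte(0'_) :- !.
keep_byte(Byte) :-
    Byte < 128,
    code_type(Byte, alnum).
```

In this sketch `:` becomes `%3A` and `/` becomes `%2F`. The `%` character itself is also encoded (as `%25`), so the mapping can be reversed without ambiguity.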
Source reindex_db(+Dir, +Levels)[private]
Reindex the database by creating intermediate directories.
Source load_prefixes(+RDFDBDir) is det[private]
If the file RDFDBDir/prefixes.db exists, load the prefixes. The prefixes are registered using rdf_register_ns/3. Possible errors because the prefix definitions have changed are printed as warnings, retaining the old definition. Note that changing prefixes generally requires reloading all RDF from the source.
Source mkdir(+Directory)[private]
Create a directory if it does not already exist.
Source time_stamp(-Integer)[private]
Return time-stamp rounded to integer.