
rdf_litindex.pl -- Search literals
This module finds literals in the RDF database based on words, stemming and "sounds like" (metaphone) matching. The normal user-level predicate is rdf_find_literals/2.
rdf_set_literal_index_option(+Options:list)
Set options for the literal package. Currently defined options:
- verbose(Bool)
  If true, print progress messages while building the index tables.
- index_threads(+Count)
  Number of threads to use for initial indexing of literals.
- index(+How)
  How to deal with indexing new literals. How is one of self (execute in the same thread), thread(N) (execute in N concurrent threads) or default (depends on number of cores).
- stopgap_threshold(+Count)
  Add a token to the dynamic stopgap set if it appears in more than Count literals. The default is 50,000.
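As an illustration, the options listed above might be set once at startup, before the index is first built. This is a minimal sketch: the predicate name configure_literal_index/0 and the chosen values are assumptions made for the example, not recommendations from the library.

```prolog
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_litindex)).

%!  configure_literal_index is det.
%
%   Hypothetical helper: report progress, index new literals in the
%   calling thread and raise the stopgap threshold above its default.
configure_literal_index :-
    rdf_set_literal_index_option(
        [ verbose(true),              % print progress while building
          index(self),                % index new literals in this thread
          stopgap_threshold(100000)   % example value; default is 50,000
        ]).
```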
rdf_find_literal(+Spec, -Literal) is nondet
rdf_find_literals(+Spec, -Literals) is det
Find literals in the RDF database matching Spec. Spec is defined as:

    Spec ::= and(Spec,Spec)
    Spec ::= or(Spec,Spec)
    Spec ::= not(Spec)
    Spec ::= sounds(Like)
    Spec ::= stem(Like)            % same as stem(Like, en)
    Spec ::= stem(Like, Lang)
    Spec ::= prefix(Prefix)
    Spec ::= between(Low, High)    % Numerical between
    Spec ::= ge(High)              % Numerical greater-equal
    Spec ::= le(Low)               % Numerical less-equal
    Spec ::= Token

sounds(Like) and stem(Like) both map to a disjunction. First we compile the spec to normal form: a disjunction of conjunctions on elementary tokens. Then we execute all the conjunctions and generate the union using ordered-set algorithms.
Stopgaps are ignored. If the final result is only a stopgap, the predicate fails.
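The two-step evaluation described above (normal form first, then a union of conjunctions) can be sketched in plain Prolog. This is not the library's implementation: it handles only and/2, or/2 and plain tokens, and it looks tokens up in a hypothetical toy_index/2 table instead of the real literal maps. The names spec_dnf/2, token_literals/2, conj_literals/2 and find_literals/2 are likewise invented for the example.

```prolog
:- use_module(library(ordsets)).
:- use_module(library(lists)).
:- use_module(library(apply)).

%!  spec_dnf(+Spec, -DNF) is det.
%
%   DNF is a list of conjunctions; each conjunction is a list of tokens.
spec_dnf(or(A, B), DNF) :- !,
    spec_dnf(A, DA),
    spec_dnf(B, DB),
    append(DA, DB, DNF).
spec_dnf(and(A, B), DNF) :- !,
    spec_dnf(A, DA),
    spec_dnf(B, DB),
    % Distribute the conjunction over both disjunctions.
    findall(Conj,
            ( member(CA, DA), member(CB, DB), append(CA, CB, Conj) ),
            DNF).
spec_dnf(Token, [[Token]]).

%!  toy_index(?Token, ?Literals)
%
%   Hypothetical data: each token maps to an ordered set of literals.
toy_index(amsterdam, [l1, l2, l4]).
toy_index(museum,    [l2, l3, l4]).
toy_index(harbour,   [l5]).

% Unknown tokens simply match no literals.
token_literals(Token, Literals) :-
    (   toy_index(Token, Ls)
    ->  Literals = Ls
    ;   Literals = []
    ).

% A conjunction matches the intersection of its tokens' literal sets.
conj_literals([T|Ts], Literals) :-
    token_literals(T, L0),
    foldl(intersect_token, Ts, L0, Literals).

intersect_token(Token, L0, L) :-
    token_literals(Token, LT),
    ord_intersection(L0, LT, L).

%!  find_literals(+Spec, -Literals) is det.
%
%   Evaluate every conjunction and take the ordered-set union.
find_literals(Spec, Literals) :-
    spec_dnf(Spec, DNF),
    maplist(conj_literals, DNF, Sets),
    ord_union(Sets, Literals).
```

With this toy data, the goal find_literals(or(and(amsterdam, museum), harbour), L) first produces the normal form [[amsterdam, museum], [harbour]] and then unifies L with [l2, l4, l5]. Keeping every intermediate result as an ordered set means that both intersection and union are linear merges.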
rdf_token_expansions(+Spec, -Extensions)
Determine which extensions of a token contribute to finding literals.
rdf_delete_literal_index(+Type)
Fully delete a literal index.
rdf_tokenize_literal(+Literal, -Tokens) is semidet
Tokenize a literal. This predicate is hookable because tokenization is generally domain dependent.
rdf_stopgap_token(-Token) is nondet
True when Token is a stopgap token. Currently, this implies one of:
- exclude_from_index(token, Token) is true
- default_stopgap(Token) is true
- Token is an atom of length 1
- Token was added to the dynamic stopgap token set because it appeared in more than stopgap_threshold literals.
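The first condition suggests that an application can declare its own stopgap tokens. The sketch below assumes that exclude_from_index/2 is a multifile hook defined in module rdf_litindex; that is not stated in the text above and should be checked against the installed library. noise_word/1 is a hypothetical helper with example data.

```prolog
:- use_module(library(semweb/rdf_litindex)).

% Assumption: exclude_from_index/2 is a multifile hook in rdf_litindex.
:- multifile rdf_litindex:exclude_from_index/2.

% Treat application-specific noise words as stopgap tokens.
rdf_litindex:exclude_from_index(token, Token) :-
    noise_word(Token).

% Hypothetical list of words that carry no meaning in this data set.
noise_word(http).
noise_word(www).
```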
rdf_literal_index(+Type, -Index) is det
True when Index is a literal map containing the index of Type. Type is one of:
- token
  Tokens are basically words of literal values. See rdf_tokenize_literal/2. The token map maps tokens to full literal texts.
- stem
  Index of stemmed tokens. If the language is available, the tokens are stemmed using the matching snowball stemmer. The stem map maps stemmed to full tokens.
- metaphone
  Phonetic index of tokens. The metaphone map maps phonetic keys to tokens.