This module finds literals of the RDF database based on words, stemming and "sounds like" (metaphone) matching. The normal user-level predicate is
- Set options for the literal package. Currently defined options:
  - If true, print progress messages while building the index tables.
  - Number of threads to use for initial indexing of literals.
  - How to deal with indexing new literals. How is one of self (execute in the same thread), thread(N) (execute in N concurrent threads) or default (depends on the number of cores).
  - Add a token to the dynamic stopgap set if it appears in more than Count literals. The default is 50,000.
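The options above can be pictured as a small configuration record. This is a Python sketch for illustration only: the field names are hypothetical, the default thread count for initial indexing is illustrative, and the way default is turned into a thread count is an assumption, not the package's documented policy. Only the 50,000 stopgap threshold comes from the text.

```python
import os
from dataclasses import dataclass
from typing import Tuple, Union

IndexMode = Union[str, Tuple[str, int]]  # "self", ("thread", N) or "default"


@dataclass
class LiteralIndexOptions:
    # Hypothetical field names mirroring the options described above.
    verbose: bool = False            # print progress while building the index tables
    initial_threads: int = 1         # illustrative default, not documented
    new_literals: IndexMode = "default"
    stopgap_threshold: int = 50_000  # documented default


def indexing_threads(opts: LiteralIndexOptions) -> int:
    """Extra threads used to index new literals; 0 means run in the calling thread."""
    mode = opts.new_literals
    if mode == "self":
        return 0
    if isinstance(mode, tuple) and len(mode) == 2 and mode[0] == "thread":
        if mode[1] < 1:
            raise ValueError("thread(N) needs N >= 1")
        return mode[1]
    if mode == "default":
        # Assumption: one thread per core minus the caller, but at least one.
        return max(1, (os.cpu_count() or 1) - 1)
    raise ValueError(f"unknown indexing mode: {mode!r}")
```

Keeping "self" distinct from thread(1) matters: the first indexes synchronously in the caller, the second hands work to a separate thread.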
- rdf_find_literal(+Spec, -Literal) is nondet
- rdf_find_literals(+Spec, -Literals) is det
- Find literals in the RDF database matching Spec. Spec is defined as:

    Spec ::= and(Spec,Spec)
    Spec ::= or(Spec,Spec)
    Spec ::= not(Spec)
    Spec ::= sounds(Like)
    Spec ::= stem(Like)          % same as stem(Like, en)
    Spec ::= stem(Like, Lang)
    Spec ::= prefix(Prefix)
    Spec ::= between(Low, High)  % Numerical between
    Spec ::= ge(Low)             % Numerical greater-equal
    Spec ::= le(High)            % Numerical less-equal
    Spec ::= Token

sounds(Like) and stem(Like) both map to a disjunction of tokens. First we compile the spec to normal form: a disjunction of conjunctions on elementary tokens. Then we execute all the conjunctions and generate the union using ordered-set algorithms.
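The compilation to normal form can be sketched as follows. This is a Python illustration under stated assumptions, not the module's code: specs are nested tuples such as ("and", A, B), plain strings are tokens, and any other tuple (for example ("sounds", "smit")) is a leaf that a caller-supplied, hypothetical expand function turns into a list of tokens. How the real predicate evaluates not/1 is not described here; this sketch pushes negation down to tokens with De Morgan's laws.

```python
from itertools import product
from typing import Callable, FrozenSet, List, Tuple

Lit = Tuple[str, bool]  # (token, True) for a token, (token, False) for not(token)
Conj = FrozenSet[Lit]


def to_dnf(spec, expand: Callable[[tuple], List[str]], negate: bool = False) -> List[Conj]:
    """Compile a spec into a list of conjunctions, read as their disjunction."""
    if isinstance(spec, str):
        return [frozenset({(spec, not negate)})]
    op = spec[0]
    if op == "not":
        return to_dnf(spec[1], expand, not negate)
    if op in ("and", "or"):
        left = to_dnf(spec[1], expand, negate)
        right = to_dnf(spec[2], expand, negate)
        # De Morgan: under negation, "and" behaves like "or" and vice versa.
        if (op == "and") != negate:
            # Distribute: every left conjunction combined with every right one.
            return [a | b for a, b in product(left, right)]
        return left + right
    # Leaf such as ("sounds", ...) or ("stem", ...): a disjunction of tokens.
    tokens = expand(spec)
    if not negate:
        return [frozenset({(t, True)}) for t in tokens]
    # not(t1 or t2 ...) is (not t1) and (not t2) ...
    return [frozenset((t, False) for t in tokens)]
```

Distributing "and" over "or" can multiply the number of conjunctions, which is one reason to keep expansions such as sounds(Like) small.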
Stopgaps are ignored. If the final result is only a stopgap, the predicate fails.
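Executing the normal form with ordered-set algorithms might look like the Python sketch below, which uses merge-based intersection, subtraction and union on sorted posting lists, comparable to what SWI-Prolog's library(ordsets) offers as ord_intersection/3, ord_subtract/3 and ord_union/3. The names are hypothetical. Returning None stands in for failure, and skipping conjunctions that contain only stopgaps or only negations is an assumption about how "only a stopgap" is handled.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Conj = FrozenSet[Tuple[str, bool]]  # (token, positive?) pairs


def intersect_sorted(a: List[str], b: List[str]) -> List[str]:
    """Merge-style intersection of two sorted, duplicate-free lists."""
    i = j = 0
    out: List[str] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def subtract_sorted(a: List[str], b: List[str]) -> List[str]:
    """Elements of a that are not in b (both sorted, duplicate-free)."""
    i = j = 0
    out: List[str] = []
    while i < len(a):
        if j < len(b) and b[j] < a[i]:
            j += 1
        elif j < len(b) and b[j] == a[i]:
            i += 1
            j += 1
        else:
            out.append(a[i])
            i += 1
    return out


def union_sorted(a: List[str], b: List[str]) -> List[str]:
    """Merge two sorted, duplicate-free lists into one."""
    i = j = 0
    out: List[str] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out


def run_dnf(dnf: List[Conj], token_map: Dict[str, List[str]], stopgaps: Set[str]) -> Optional[List[str]]:
    """Evaluate a disjunction of conjunctions; None plays the role of failure."""
    result: List[str] = []
    usable = False
    for conj in dnf:
        pos = [t for t, positive in conj if positive and t not in stopgaps]
        neg = [t for t, positive in conj if not positive and t not in stopgaps]
        if not pos:
            continue  # only stopgaps (or only negations): nothing to enumerate
        usable = True
        # Intersect the shortest posting lists first to keep intermediates small.
        postings = sorted((token_map.get(t, []) for t in pos), key=len)
        hits = postings[0]
        for plist in postings[1:]:
            hits = intersect_sorted(hits, plist)
        for t in neg:
            hits = subtract_sorted(hits, token_map.get(t, []))
        result = union_sorted(result, hits)
    return result if usable else None
```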
- rdf_token_expansions(+Spec, -Extensions)
- Determine which extensions of a token contribute to finding literals.
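For a prefix(Prefix) spec, one way to picture token expansion is a range scan over a sorted vocabulary. This Python sketch assumes that a token "contributes" when it indexes at least one literal; the function name and the dictionary shape are hypothetical, and expansions for stem or sounds specs would go through their own maps instead.

```python
from bisect import bisect_left
from typing import Dict, List


def prefix_expansions(prefix: str, token_map: Dict[str, List[str]]) -> List[str]:
    """Tokens that extend prefix and occur in at least one literal."""
    vocabulary = sorted(token_map)  # a real index would keep this sorted already
    start = bisect_left(vocabulary, prefix)
    expansions: List[str] = []
    for token in vocabulary[start:]:
        if not token.startswith(prefix):
            break  # sorted order: no later token can share the prefix
        if token_map[token]:  # contributes: indexes at least one literal
            expansions.append(token)
    return expansions
```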
- Fully delete a literal index
- rdf_tokenize_literal(+Literal, -Tokens) is semidet
- Tokenize a literal. We make this hookable as tokenization is generally domain dependent.
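A hookable tokenizer can be sketched as a list of domain-specific hooks tried before a default. In this Python illustration the hook registry, the function names and the default behaviour (lower-cased words, integers and decimals kept as numbers) are all assumptions, not the package's actual tokenizer.

```python
import re
from typing import Callable, List, Optional, Union

Token = Union[str, int, float]
Hook = Callable[[str], Optional[List[Token]]]

# Hypothetical registry: a hook returns a token list to take over,
# or None to defer to the next hook or the default tokenizer.
TOKENIZER_HOOKS: List[Hook] = []

_TOKEN_RE = re.compile(r"[+-]?\d+(?:\.\d+)?|\w+")
_INT_RE = re.compile(r"[+-]?\d+")


def default_tokenize(text: str) -> List[Token]:
    """Assumed default: lower-cased words; integers and decimals become numbers."""
    tokens: List[Token] = []
    for match in _TOKEN_RE.finditer(text):
        s = match.group(0)
        if _INT_RE.fullmatch(s):
            tokens.append(int(s))
        elif "." in s:  # only the number alternative can contain a dot
            tokens.append(float(s))
        else:
            tokens.append(s.lower())
    return tokens


def tokenize_literal(text: str) -> List[Token]:
    """Try each registered hook in order, then fall back to the default."""
    for hook in TOKENIZER_HOOKS:
        result = hook(text)
        if result is not None:
            return result
    return default_tokenize(text)
```

Letting a hook return None, rather than raising, keeps the default available for literals that a domain tokenizer does not recognise.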
- rdf_stopgap_token(-Token) is nondet
- True when Token is a stopgap token. Currently, this implies one of:
  - exclude_from_index(token, Token) is true
  - Token is an atom of length 1
  - Token was added to the dynamic stopgap token set because it appeared in more than stopgap_threshold literals.
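The three stopgap conditions can be sketched as a single test. In this Python illustration a plain set stands in for exclude_from_index(token, Token), the dynamic set is derived from a token-to-literals map using the documented 50,000 default, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def dynamic_stopgaps(token_map: Dict[str, List[str]], threshold: int = 50_000) -> Set[str]:
    """Tokens occurring in more than `threshold` literals."""
    return {token for token, literals in token_map.items() if len(literals) > threshold}


def is_stopgap(token: str, excluded: Set[str], dynamic: Set[str]) -> bool:
    return (
        token in excluded     # stand-in for exclude_from_index(token, Token)
        or len(token) == 1    # a single-character token
        or token in dynamic   # crossed the stopgap threshold
    )
```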
- rdf_literal_index(+Type, -Index) is det
- True when Index is a literal map containing the index of Type.
Type is one of:
  - Tokens are basically words of literal values. The token map maps tokens to full literal texts.
  - Index of stemmed tokens. If the language is available, the tokens are stemmed using the matching Snowball stemmer. The stem map maps stemmed tokens to full tokens.
  - Phonetic index of tokens. The metaphone map maps phonetic keys to tokens.
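The three maps can be sketched as plain dictionaries. Python's standard library has neither a Snowball stemmer nor Metaphone, so this sketch swaps in a crude English suffix stripper and the classic Soundex code as stand-ins; the function names are hypothetical and the keys will differ from those of the real stem and metaphone indexes.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

_SUFFIXES = ("ing", "ed", "es", "s")  # stand-in for a Snowball stemmer


def crude_stem(token: str) -> str:
    for suffix in _SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


# Stand-in for Metaphone: classic American Soundex letter codes.
_SOUNDEX = {c: d for d, group in
            {"1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r"}.items()
            for c in group}


def soundex(word: str) -> str:
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = _SOUNDEX.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX.get(c, "")
        if code and code != prev:
            out.append(code)
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return ("".join(out) + "000")[:4]


def build_indexes(texts: List[str]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]], Dict[str, Set[str]]]:
    token_map: Dict[str, Set[str]] = defaultdict(set)  # token -> literal texts
    stem_map: Dict[str, Set[str]] = defaultdict(set)   # stem -> tokens
    sound_map: Dict[str, Set[str]] = defaultdict(set)  # phonetic key -> tokens
    for text in texts:
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            token_map[token].add(text)
    for token in token_map:
        stem_map[crude_stem(token)].add(token)
        key = soundex(token)
        if key:
            sound_map[key].add(token)
    return dict(token_map), dict(stem_map), dict(sound_map)


def sounds_like(word: str, token_map: Dict[str, Set[str]], sound_map: Dict[str, Set[str]]) -> Set[str]:
    """Literals containing a token with the same phonetic key as word."""
    hits: Set[str] = set()
    for token in sound_map.get(soundex(word), set()):
        hits |= token_map[token]
    return hits
```

A sounds(Like) lookup chains two maps: phonetic key to tokens, then tokens to literals. Storing tokens rather than literals in the stem and phonetic maps keeps those maps small.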