3 Flexible ordering and equivalence based on character tables
This package was developed as part of the GRASP project, where it is used for browsing lexical and ontology information. Such information is normally stored in `dictionary' order rather than in the more conventional alphabetical order based on character codes. To make the ordering programmable, the table package defines `order tables'. An order table has one entry for each character in the character set (256 for extended ASCII). It maps each character onto its `order number', and some characters onto special codes.
The default (exact) table maps all character codes onto themselves. The default case_insensitive table maps all uppercase characters onto their corresponding lowercase characters. The tables iso_latin_1 and iso_latin_1_case_insensitive map the ISO-Latin-1 letters with diacritics onto their plain counterparts.
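To make the idea of an order table concrete, the following Python sketch models a table as a 256-entry list indexed by character code. It is an illustration only, not the package's Prolog API: the function names are hypothetical, the case-insensitive table is assumed to cover only ASCII A-Z, and the ISO-Latin-1 mapping is approximated with Unicode decomposition, so the package's real tables may differ for some characters.

```python
import unicodedata

SIZE = 256  # one entry per character of extended ASCII / ISO-Latin-1

def exact_table():
    # The exact table maps every character code onto itself.
    return list(range(SIZE))

def case_insensitive_table():
    # Map upper-case letters onto lower case (assumption: ASCII A-Z only).
    table = exact_table()
    for c in range(ord('A'), ord('Z') + 1):
        table[c] = c + (ord('a') - ord('A'))
    return table

def iso_latin_1_table(case_insensitive=False):
    # Map Latin-1 letters with diacritics onto their plain base letter.
    # Unicode decomposition (NFD) is an approximation; letters without a
    # decomposition, such as 'ß' or 'Æ', are left unchanged.
    table = case_insensitive_table() if case_insensitive else exact_table()
    for c in range(0xC0, SIZE):
        base = unicodedata.normalize('NFD', chr(c))[0]
        if ord(base) < 128 and base.isalpha():
            table[c] = table[ord(base)]
    return table

def apply_table(table, text):
    # Translate a string into its sequence of order numbers.
    return [table[ord(ch)] for ch in text]
```

With the case-insensitive ISO-Latin-1 model, `apply_table(iso_latin_1_table(True), "Été")` and `apply_table(iso_latin_1_table(True), "ete")` yield the same sequence, so the two strings compare as equal.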
To support dictionary ordering, the following special categories are defined:
| Category | Meaning |
|---|---|
| ignore | Characters of the ignore set are simply discarded from the input. |
| break | Characters from the break set are treated as word breaks, and each non-empty sequence of them is considered equal. A word break precedes a normal character. |
| tag | Characters of type tag indicate the start of a `tag' that should not be considered in ordering, unless both strings are the same up to the tag. |
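One way to read these three categories is as rules for turning a string into a sort key. The Python sketch below is a hypothetical model, not the package's implementation: the marker strings, `make_table`, `sort_key` and `compare` are illustrative names, and it assumes a tag extends from the first tag character to the end of the string. The key is a pair (main part, tag part), so the tag only influences the result when the main parts are equal.

```python
IGNORE, BREAK, TAG = "ignore", "break", "tag"  # hypothetical markers for special entries
BREAK_KEY = -1  # lower than every character code, so a word break sorts first

def make_table(ignore="", brk="", tag=""):
    # Start from the exact table and mark the special characters.
    table = list(range(256))
    for ch in ignore:
        table[ord(ch)] = IGNORE
    for ch in brk:
        table[ord(ch)] = BREAK
    for ch in tag:
        table[ord(ch)] = TAG
    return table

def sort_key(table, text):
    # Returns (main, tag); tuples compare main first and the tag only on a tie.
    main, tag = [], []
    out = main
    for ch in text:
        entry = table[ord(ch)]
        if entry == IGNORE:
            continue                  # ignored characters are discarded
        if entry == TAG:
            out = tag                 # assumption: the tag runs to the end
            continue
        if entry == BREAK:
            if out and out[-1] == BREAK_KEY:
                continue              # a run of breaks counts as one break
            entry = BREAK_KEY
        out.append(entry)
    return (main, tag)

def compare(table, s1, s2):
    # Mirrors the <, = and > results described for compare_strings/4.
    k1, k2 = sort_key(table, s1), sort_key(table, s2)
    return '<' if k1 < k2 else '>' if k1 > k2 else '='
```

For example, with `t = make_table(ignore="-", brk=" ", tag="(")`, "ab" and "a-b" compare equal, "a b" sorts before "ab", and "bank(zzz)" sorts before "banks" because the tag is not considered once the main parts differ.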
The following predicates are defined to manage and use these tables:
- new_order_table(+Name, +Options)
- Create a new order table with the given name (an atom), or replace the existing table of that name. Options is a list of options:
  - case_insensitive: Map all upper- to lowercase characters.
  - iso_latin_1: Start with an ISO-Latin-1 table.
  - iso_latin_1_case_insensitive: Start with a case-insensitive ISO-Latin-1 table.
  - copy(+Table): Copy all entries from Table.
  - tag(+ListOfCodes): Add these characters to the set of `tag' characters.
  - ignore(+ListOfCodes): Add these characters to the set of `ignore' characters.
  - break(+ListOfCodes): Add these characters to the set of `break' characters.
  - +Code1 = +Code2: Map Code1 onto Code2.
- order_table_mapping(+Table, ?From, ?To)
- Read the current mapping. To is a character code or one of the atoms break, ignore or tag.
- compare_strings(+Table, +S1, +S2, -Result)
- Compare two strings using the named Table. S1 and S2 may be atoms, strings or code-lists. Result is one of the atoms <, = or >.
- prefix_string(+Table, +Prefix, +String)
- Succeeds if Prefix is a prefix of String using the named Table.
- prefix_string(+Table, +Prefix, -Rest, +String)
- Succeeds if Prefix is a prefix of String using the named Table, and Rest is unified with the remainder of String that is not matched. Note that because matching goes through an order table, the matched part of String need not be textually identical to Prefix, so simple concatenation using atom_concat/3 cannot be used to determine the unmatched part of the string.
- sub_string(+Table, +Sub, +String)
- Succeeds if Sub is a substring of String using the named Table.
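The remark about atom_concat/3 under prefix_string/4 can be illustrated with a small Python model of prefix and substring matching. This is a sketch under stated assumptions, not the package's code: `prefix_rest` and `has_substring` are hypothetical analogues of prefix_string/4 and sub_string/3, and they only handle tables without ignore, break or tag entries, so that each character of the pattern consumes exactly one character of the string.

```python
def case_insensitive_table():
    # Plain code-to-code table: upper-case ASCII letters map to lower case.
    table = list(range(256))
    for c in range(ord('A'), ord('Z') + 1):
        table[c] = c + 32
    return table

def prefix_rest(table, prefix, string):
    # Analogue of prefix_string/4: return the unmatched rest of string,
    # or None if prefix does not match under the table.
    if len(prefix) > len(string):
        return None
    for p, s in zip(prefix, string):
        if table[ord(p)] != table[ord(s)]:
            return None
    # The rest is taken from string itself, not computed from prefix's text.
    return string[len(prefix):]

def has_substring(table, sub, string):
    # Analogue of sub_string/3: compare mapped sequences at every offset.
    m_sub = [table[ord(c)] for c in sub]
    m_str = [table[ord(c)] for c in string]
    n = len(m_sub)
    return any(m_str[i:i + n] == m_sub for i in range(len(m_str) - n + 1))
```

Under the case-insensitive table, "HEL" is a prefix of "hello" with rest "lo", although the text "hello" does not start with "HEL". This is why atom_concat('HEL', Rest, hello) fails and cannot recover the unmatched part.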