:- use_module(library(unicode)). (can be autoloaded)
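The operations listed below are normally combined into a single option list. The following is a minimal sketch, not part of the original text. It assumes the library exports unicode_map(+In, -Out, +Options) and that the options are spelled as the lowercase atoms `stable`, `compat`, `compose` and `casefold`; the helper name same_text_loosely/2 is hypothetical.

```prolog
:- use_module(library(unicode)).

% same_text_loosely(+A, +B) is semidet.
%
% Hypothetical helper: succeeds if A and B map to the same key once
% compatibility formatting and case differences have been removed.
% The option atoms are assumed to correspond to the operations
% described in the list below.
same_text_loosely(A, B) :-
    Options = [stable, compat, compose, casefold],
    unicode_map(A, KeyA, Options),
    unicode_map(B, KeyB, Options),
    KeyA == KeyB.
```

A query such as `?- same_text_loosely("Hello", "HELLO").` would be one way to try it. Mapping both inputs with the same option list keeps the comparison symmetric.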
- Unicode Versioning Stability has to be respected.
- Compatibility decomposition (i.e. formatting information is lost).
- Return a result with composed characters.
- Return a result with decomposed characters.
- Strip "default ignorable characters"
- Return an error if the input contains unassigned code points.
- Indicates that NLF-sequences (LF, CRLF, CR, NEL) represent a line break and should be converted to the Unicode character for line separation (LS).
- Indicates that NLF-sequences represent a paragraph break and should be converted to the Unicode character for paragraph separation (PS).
- Indicates that the meaning of NLF-sequences is unknown.
- Strips and/or converts control characters. NLF-sequences are transformed into a space, unless one of the NLF2LS/PS/LF options is given. HorizontalTab (HT) and FormFeed (FF) are treated as an NLF-sequence in this case. All other control characters are simply removed.
- Performs Unicode case folding, so that strings can be compared case-insensitively.
- Inserts 0xFF bytes at the beginning of each sequence that represents a single grapheme cluster (see UAX#29).
- Lumps certain characters together (e.g. HYPHEN U+2010 and MINUS U+2212 to ASCII "-"). (See module header for details.) If NLF2LF is set, this includes a transformation of paragraph and line separators to ASCII line-feed (LF).
- Strips all character markings (non-spacing, spacing and enclosing), i.e. accents. NOTE: this option works only with