5.2.2 Representing text: strings, atoms and code lists
With the introduction of strings as a Prolog data type, there are three main ways to represent text: strings, atoms, and code lists. This section explains which to choose for which purpose. Both strings and atoms are atomic objects: you can only look inside them using dedicated predicates. Lists of character codes are compound data structures.
- Lists of character codes
- are what you need if you want to parse text using Prolog grammar rules (DCGs, see phrase/3). Most of the text-reading predicates (e.g., read_line_to_codes/2) return a list of character codes because most applications need to parse these lines before the data can be processed.
- Atoms
- are identifiers. They are typically used where identity comparison is the main operation and the text is neither composed nor taken apart. Examples are RDF resources (URIs that identify something), system identifiers (e.g., 'Boeing 747'), and individual words in a natural language processing system. Atoms are also used where other languages would use enumerated types, such as the names of the days of the week. Unlike enumerated types, Prolog atoms do not form a fixed set, and the same atom can represent different things in different contexts.
- Strings
- typically represent text that is processed as a unit most of the time, but which is not an identifier for something. Format specifications for format/3 are a good example. Another example is descriptive text provided in an application. Strings may be composed and decomposed using, e.g., string_concat/3 and sub_string/5, converted for parsing using string_codes/2, or created from codes generated by a generative grammar rule, also using string_codes/2.
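The following minimal sketch illustrates the three representations side by side. It assumes SWI-Prolog 7 or later with the default double_quotes flag, so that "..." denotes a string in ordinary code and a literal in a DCG body. The predicates key_value//2, key_codes//1, key_codes_rest//1, parse_line/3, weekend_day/1 and greeting/2 are hypothetical names introduced only for this illustration; blanks//0 and integer//1 come from library(dcg/basics).

```prolog
:- use_module(library(dcg/basics)).   % blanks//0, integer//1

% Code lists: parse a line such as "width = 42" with a DCG.
% The key is collected as codes and turned into an atom at the end.
key_value(Key, Value) -->
    key_codes(KeyCodes), blanks, "=", blanks, integer(Value),
    { atom_codes(Key, KeyCodes) }.

% One or more alphanumeric codes, matched greedily.
key_codes([C|Cs]) -->
    [C], { code_type(C, alpha) },
    key_codes_rest(Cs).

key_codes_rest([C|Cs]) -->
    [C], { code_type(C, alpha) }, !,
    key_codes_rest(Cs).
key_codes_rest([]) --> [].

% Strings are converted to codes for parsing using string_codes/2.
parse_line(Text, Key, Value) :-
    string_codes(Text, Codes),
    phrase(key_value(Key, Value), Codes).

% Atoms: identifiers used like an enumerated type; they are only
% compared, never taken apart.
weekend_day(saturday).
weekend_day(sunday).

% Strings: text handled as a unit, composed with string_concat/3.
greeting(Name, Greeting) :-
    string_concat("Hello, ", Name, Greeting).
```

With these definitions loaded, `parse_line("width = 42", Key, Value)` binds Key to the atom width and Value to the integer 42, and `greeting("world", G)` binds G to the string "Hello, world". The key is returned as an atom because it acts as an identifier, whereas the greeting stays a string because it is descriptive text.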