3.1.3 International source files
As discussed in section 2.19, SWI-Prolog supports international character handling. Its internal encoding is UNICODE. I/O streams convert to/from this internal format. This section discusses the options for source files not in US-ASCII.
SWI-Prolog can read files in any of the encodings described in section 2.19. Two encodings are of particular interest. The text encoding deals with the current locale, the default used by this computer for representing text files. The encodings utf8, unicode_le and unicode_be are UNICODE encodings: they can represent, in the same file, characters of virtually any known language. In addition, they do so unambiguously.
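As a minimal sketch of how an encoding can be selected explicitly, the code below passes the encoding/1 option to open/4 and to load_files/2. The predicate name read_utf8_line/2 and the file name legacy.pl are hypothetical and only serve as illustration.

```prolog
:- use_module(library(readutil)).

% read_utf8_line(+File, -Line)
% Read the first line of File as a string, decoding the file as
% UTF-8 regardless of the current locale.
% (Hypothetical helper name, not part of SWI-Prolog.)
read_utf8_line(File, Line) :-
    setup_call_cleanup(
        open(File, read, In, [encoding(utf8)]),
        read_line_to_string(In, Line),
        close(In)).

% Loading a source file that is known to be stored in ISO Latin 1.
% 'legacy.pl' is a placeholder file name.
% ?- load_files('legacy.pl', [encoding(iso_latin_1)]).
```

Without the encoding/1 option, the encoding is taken from the defaults described in section 2.19.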
If one wants to represent non US-ASCII text as Prolog terms in a source file, there are several options:
- Use escape sequences
This approach describes NON-ASCII as sequences of the form \octal\. The numerical argument is interpreted as a UNICODE character. (To my knowledge, the ISO escape sequence is limited to 3 octal digits, which means most characters cannot be represented.) The resulting Prolog file is strict 7-bit US-ASCII, but if there are many NON-ASCII characters it becomes hard to read.
- Use local conventions
Alternatively the file may be specified using local conventions, such as the EUC encoding for Japanese text. The disadvantage is portability. If the file is moved to another machine, this machine must use the same locale or the file is unreadable. There is no elegant solution if files from multiple locales must be combined in one application using this technique. In other words, it is fine for local projects in countries with uniform locale conventions.
- Using UTF-8 files
The best way to specify source files with many NON-ASCII characters is definitely the use of UTF-8 encoding. Prolog can be notified of this encoding in two ways: using a UTF-8 BOM (see section 2.19.1.1) or using the directive :- encoding(utf8).
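To illustrate the escape-sequence and UTF-8 options, here is a minimal sketch of a source file that is itself saved as UTF-8. The predicates greeting/2 and same_atom/0 are hypothetical. Octal 351 is character code 233 (U+00E9), so both clauses are assumed to denote the same four-character atom.

```prolog
:- encoding(utf8).

% With the directive above, non-ASCII text can be written literally.
greeting(french, 'café').

% The same atom written with an octal escape sequence. This clause
% contains only 7-bit US-ASCII characters.
greeting(french_escaped, 'caf\351\').

% Succeeds if both spellings denote the same atom.
same_atom :-
    greeting(french, Atom),
    greeting(french_escaped, Atom).
```

The literal form stays readable as the amount of non-ASCII text grows, which is the main argument for UTF-8 over escape sequences.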
Many of today's text editors, including PceEmacs, are capable of editing UTF-8 files. Projects that were started using local conventions can be re-coded using the Unix iconv tool or often using commands offered by the editor.
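Re-coding can also be sketched in Prolog itself. This is one possible approach rather than the tools mentioned above: the file is read in its original encoding and written out as UTF-8. The name recode_to_utf8/3 is hypothetical, and the source encoding must be one of the encodings known to the system (see section 2.19).

```prolog
% recode_to_utf8(+From, +FromEncoding, +To)
% Copy file From, stored in FromEncoding (e.g. iso_latin_1), to the
% file To, encoded as UTF-8.
% (Hypothetical helper, shown for illustration only.)
recode_to_utf8(From, FromEncoding, To) :-
    setup_call_cleanup(
        open(From, read, In, [encoding(FromEncoding)]),
        setup_call_cleanup(
            open(To, write, Out, [encoding(utf8)]),
            copy_stream_data(In, Out),
            close(Out)),
        close(In)).
```

For example, `?- recode_to_utf8('old.pl', iso_latin_1, 'new.pl').` would write a UTF-8 copy of a Latin 1 file, where both file names are placeholders. The nested setup_call_cleanup/3 calls close both streams even if copying raises an exception.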