doc_wiki.pl -- PlDoc wiki parser

This file defines the PlDoc wiki parser, which parses both comments and wiki text files. The original version of this SWI-Prolog wiki format was largely modeled after TWiki (http://twiki.org/). The current version borrows many aspects from markdown, in particular the Doxygen refinement thereof.

See also
- http://www.stack.nl/~dimitri/doxygen/manual/markdown.html
Source wiki_lines_to_dom(+Lines:lines, +Args:list(atom), -Term) is det
Translate a Wiki text into an HTML term suitable for html//1 from the html_write library.
Source wiki_codes_to_dom(+String, +Args, -DOM) is det
Translate plain text into a DOM term.
Arguments:
String- Plain text. Either a string or a list of codes.
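A minimal usage sketch, assuming the module is loaded as library(pldoc/doc_wiki); the exact shape of the resulting DOM term is an internal representation and is only bound here:

  :- use_module(library(pldoc/doc_wiki)).

  % Parse a small wiki fragment; the resulting DOM term can be handed to
  % html//1 from library(http/html_write).
  example_dom(DOM) :-
      string_codes("This is *bold* and =code=.", Codes),
      wiki_codes_to_dom(Codes, [], DOM).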
Source wiki_structure(+Lines:lines, +BaseIndent, -Blocks:list(block)) is det[private]
Get the structure in terms of block-level elements: paragraphs, lists and tables. This processing uses a mixture of layout and punctuation.
Source take_block(+Lines, +BaseIndent, ?Block, -RestLines) is semidet[private]
Take a block-structure from the input. Defined block elements are lists, tables, horizontal rules, section headers and paragraphs.
Source ruler(+Line) is semidet[private]
True if Line contains 3 ruler characters and otherwise only spaces.
Source list_item(+Lines, ?Type, ?Indent, -LI0, -LIT, -RestLines) is det[private]
Create a list-item. Naturally this should produce a single item, but DL lists produce two items, so we create the list of items as a difference list.
To be done
- Pass base-indent
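The difference-list pattern can be sketched as follows; the item terms shown are illustrative only, not the actual internal representation:

  % An ordinary bullet yields one element; a definition item yields two.
  % Returning List-Tail pairs lets the caller concatenate items cheaply.
  plain_item(Item, [li(Item)|T], T).
  def_item(Title, Descr, [dt(Title), dd(Descr)|T], T).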
Source rest_list_item(+Lines, +Type, +Indent, -RestItem, -RestLines) is det[private]
Extract the remainder (after the first line) of a list item.
Source take_blocks_at_indent(+Lines, +Indent, -Pars, -RestLines) is det[private]
Process paragraphs and verbatim blocks (==..==) in bullet-lists.
Source rest_list(+Lines, +Type, +Indent, -Items, -ItemTail, -RestLines) is det[private]
Source list_item_prefix(?Type, +Line, -Rest) is det[private]
Source split_dt(+LineAfterDollar, -DT, -Rest)[private]
First see whether the entire line is the item. This allows creating items that contain ":" by using "$ <tokens> :" followed by a newline.
Source ul_to_dl(+UL, -DL) is semidet[private]
Translate a UL list into a DL list if all entries are of the form "* <term> nl, <description>" and at least one <description> is non-empty, or all items are of the form [[PredicateIndicator]].
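For example, a bullet list written in a structured comment as below (a sketch; the option names are made up) is turned into a description list because every item starts with a term on a line of its own:

  %     * timeout(+Seconds)
  %       Maximum time to wait for a reply.
  %     * retries(+Count)
  %       Number of times the request is repeated.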
Source term_item(+LI, -DLItem, ?Tail) is semidet[private]
If LI is of the form <Term> followed by a newline, return it as a dt-dd tuple. The <dt> item contains the term
  \term(Text, Term, Bindings).
Source row(-Cells)// is det[private]
Source md_table_structure_line(+Chars)[private]
True if Chars represents Markdown table structure. We currently ignore the structure information.
Source rest_par(+Lines, -Par, +BaseIndent, +MaxI0, -MaxI, -RestLines) is det[private]
Take the rest of a paragraph. Paragraphs are ended by a blank line or the start of a list-item. The latter is a bit dubious. Why not a general block-level object? The current definition allows for writing lists without a blank line between the items.
Source section_header(+Lines, -Section, -RestLines) is semidet[private]
Get a section line from the input.
Source twiki_section_line(+Tokens, -Section) is semidet[private]
Extract a section using the Twiki conventions. The section may be preceded by [Word], in which case we generate an anchor name Word for the section.
Source md_section_line(+Tokens, -Section) is semidet[private]
Handle markdown section lines starting with #.
Source strip_ws_tokens(+Tokens, -Stripped)[private]
Strip leading and trailing whitespace from a token list. Note that the whitespace is already normalised.
Source strip_leading_ws(+Tokens, -Stripped) is det[private]
Strip leading whitespace from a token list.
Source tags(+Lines:lines, -Tags) is semidet[private]
If the first line is a @tag, read the remainder of the lines to a list of \tag(Name, Value) terms.
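For example, the tag section of a structured comment; @param and @throws are standard PlDoc tags, while the predicate itself is hypothetical:

  %!  connect(+Host:atom, -Stream) is det.
  %
  %   Open a connection to Host.
  %
  %   @param Host   Name of the host to connect to.
  %   @param Stream Stream connected to Host.
  %   @throws existence_error(host, Host) if Host cannot be resolved.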
Source collect_tags(+IndentedLines, -Tags) is semidet[private]
Create a list Order-tag(Tag,Tokens) for each @tag encountered. Order is the desired position as defined by tag_order/2.
To be done
- Tag content is often poorly aligned. We now find the alignment of subsequent lines and assume the first line is aligned with the remaining lines.
Source tag_name(+String, -Tag:atom, -Order:int) is semidet[private]
True if String denotes a known tag name; Order is its desired position as defined by tag_order/2.
Source renamed_tag(+DeprecatedTag:atom, -Tag:atom, -Warn) is semidet[private]
Declaration for deprecated tags.
Source tag_order(+Tag:atom, -Order:int) is semidet[private]
Declares both the known tags and their expected order. Currently the tags are forced into this order without warning. Future versions may issue a warning if the order is inconsistent.
Source combine_tags(+Tags:list(tag(Key,Value)), -Tags:list) is det[private]
Creates the final tag-list. Tags is a list of
  • \params(list(param(Name, Descr)))
  • \tag(Name, list(Descr))

Descr is a list of tokens.

Source wiki_faces(+Structure, +ArgNames, -HTML) is det[private]
Given the wiki structure, analyse the content of the paragraphs, list items and table cells and apply font faces and links.
Source structure_term(+Term, -Functor, -Content) is semidet[private]
structure_term(-Term, +Functor, +Content) is det[private]
(Un)pack a term describing structure, so we can process Content and re-pack the structure.
Source verbatim_term(?Term) is det[private]
True if Term must be passed verbatim.
Source matches(:Goal, -Input, -Last)//[private]
True when Goal runs successfully on the DCG input and Input is the list of matched tokens.
Source wiki_faces(-WithFaces, +ArgNames)// is nondet[private]
Source wiki_faces(-WithFaces, +ArgNames, +Options)// is nondet[private]
Apply font-changes and automatic links to running text. The faces are applied after discovering the structure (paragraphs, lists, tables, keywords).
Arguments:
Options- is a dict, minimally containing depth
 prolog:doc_wiki_face(-Out, +VarNames)// is semidet[multifile]
prolog:doc_wiki_face(-Out, +VarNames, +Options0)// is semidet[multifile]
Hook that can be used to provide additional processing for additional inline wiki constructs. The DCG list is a list of tokens. Defined tokens are:
w(Atom)
Recognised word (alphanumerical)
Atom
Single character atom representing punctuation marks or the atom ' ' (space), representing white-space.

The Out variable is input for the backends defined in doc_latex.pl and doc_html.pl. Roughly, these are terms similar to what html//1 from library(http/html_write) accepts.
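A hedged example of such a hook: it links the token sequence "bug" '#' <number> to an assumed issue tracker; the recognised pattern and the URL are made up for illustration:

  :- multifile prolog:doc_wiki_face//2.

  %  Tokens follow the conventions above: words appear as w(Atom) and
  %  punctuation as plain atoms.  The produced term is an html//1 term.
  prolog:doc_wiki_face(a(href(HREF), ['bug#', Id]), _VarNames) -->
      [ w(bug), '#', w(Id) ],
      { atom_number(Id, _),
        atom_concat('https://example.org/issues/', Id, HREF)
      }.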

 wiki_face_simple(-Out, +ArgNames, +Options)[private]
Skip simple (non-markup) wiki.
Source code_words(-Words)//[private]
True when Words is the content as it appears in `code`, where `` is mapped to `.
Source eq_code_words(-Words)//[private]
Stuff that can be between single =. This is limited to
  • Start and end must be a word
  • In between, the following punctuation characters are allowed: .-:/, notably to deal with file names and identifiers in various external languages.
Source code_face(+Text, +Term, +Vars, -Code) is det[private]
Deal with `... code ...` sequences. Text is the matched text, Term is the parsed Prolog term and Code is the resulting intermediate code.
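A hedged illustration of such a span inside a structured comment:

  %   Use `maplist(succ, Lows, Highs)` to increment every element; the
  %   backquoted text is, where possible, parsed as a Prolog term.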
 emphasis_seq(-Out, +ArgNames, +Options) is semidet[private]
Recognise emphasis sequences
Source emphasis_term(+Emphasis, +Tokens, -Term) is det[private]
Source emphasis_before(-Before)// is semidet[private]
Source emphasis_start(-Emphasis)// is semidet[private]
Source emphasis_end(+Emphasis)// is semidet[private]
Primitives for Doxygen emphasis handling.
 arg_list(-Atoms) is nondet[private]
Atoms is a token-list for a Prolog argument list. An argument-list is a sequence of tokens '(' ... ')'.
bug
- the current implementation does not deal correctly with brackets that are embedded in quoted strings.
Source term_face(+Text, +Term, +Vars, -Face) is semidet[private]
Process embedded Prolog-terms. Currently processes Alias(Arg) terms that refer to files. Future versions will also provide pretty-printing of Prolog terms.
Source image_label(-Label)//[private]
Match File[;param=value[,param=value]*]
 file_options(-Options) is det[private]
Extracts additional processing options for files. The format is ;name="value",name2=value2,... Spaces are not allowed.
Source wiki_link(-Link, +Options)// is semidet[private]
True if we can find a link to a file or URL. Links are described as one of:
filename
A filename defined using autolink_file/2 or autolink_extension/2
<url-protocol>://<rest-url>
A fully qualified URL
'<' URL '>'
Be more relaxed on the URL specification.
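A hedged illustration combining these forms in running documentation text; whether a plain filename becomes a link depends on autolink_file/2 and autolink_extension/2:

  %   See README.md, https://www.swi-prolog.org/ and
  %   <https://www.swi-prolog.org/pldoc/> for more information.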
 prolog:url_expansion_hook(+Term, -HREF, -Label) is semidet[multifile]
This hook is called after recognising <Alias:Rest>, where Term is of the form Alias(Rest). If it succeeds, it must bind HREF to an atom or string representing the link target and Label to an html//1 expression for the label.
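A hedged example of this hook; the issue alias and target URL are made up for illustration:

  :- multifile prolog:url_expansion_hook/3.

  %  Expand <issue:123> style links into a link to an assumed tracker.
  prolog:url_expansion_hook(issue(Id), HREF, ['#', Id]) :-
      atom_concat('https://example.org/issues/', Id, HREF).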
Source file_name(-Name:atom, -Ext:atom)// is semidet[private]
Matches a filename. A filename is defined as a sequence <segment>{/<segment>}.<ext>.
Source resolve_file(+Name, -Options, ?RestOptions) is det[private]
Find the actual file based on the pldoc_file global variable. If present and the file is resolvable, add an option absolute_path(Path) that reflects the current location of the file.
Source arity(-Arity:int)// is semidet[private]
True if the next token can be interpreted as an arity, i.e., a non-negative integer of at most 20. Although Prolog allows for higher arities, we assume 20 is a fair maximum for user-created predicates that are documented.
Source symbol_string(-String)// is nondet[private]
Accept a sequence of Prolog symbol characters, starting with the shortest (empty) match.
Source prolog_symbol_char(?Char)[private]
True if Char is classified by Prolog as a symbol character.
Source autolink_extension(?Ext, ?Type) is nondet
True if Ext is a filename extension that creates automatic links in the documentation.
Source autolink_file(?File, -Type) is nondet
Files to which we automatically create links, regardless of the extension.
Source section_comment_header(+Lines, -Header, -RestLines) is semidet
Processes /** <section> comments. Header is a term \section(Type, Title), where Title is an atom holding the section title and Type is an atom holding the text between <>.
Arguments:
Lines- List of Indent-Codes.
Header- DOM term of the format \section(Type, Title), where Type is an atom from <type> and Title is a string holding the type.
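For example, the common module header form of such a comment (the title text is illustrative):

  /** <module> Utilities for parsing wiki text

  Longer description of the module follows here.
  */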
Source normalise_white_space(-Text)// is det
Text is input after deleting leading and trailing white space and mapping all internal white space to a single space.
Source tokenize_lines(+Lines:lines, -TokenLines) is det[private]
Convert Indent-Codes into Indent-Tokens
Source line_tokens(-Tokens:list)// is det[private]
Create a list of tokens, where each token is either a ' ' to denote spaces, a term w(Word) denoting a word or an atom denoting a punctuation character. Underscores (_) appearing inside an alphanumerical string are considered part of the word. E.g., "hello_world_" tokenizes into [w(hello_world), '_'].
Source verbatim(+Lines, +EnvIndent, -Pre, -RestLines) is det[private]
Extract a verbatim environment. The returned Pre is of the format pre(Attributes, String). The indentation of the leading fence is subtracted from the indentation of the verbatim lines. Two types of fences are supported: the traditional == and the Doxygen ~~~ (minimum 3 ~ characters), optionally followed by {.ext} to indicate the language.

Verbatim environment is delimited as

  ...,
  verbatim(Lines, Pre, Rest)
  ...,

In addition, a verbatim environment may simply be indented. The restrictions are described in the documentation.
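A hedged sketch of the two fence styles inside a structured comment:

  %   Traditional fences:
  %
  %     ==
  %     hello :- writeln(hello).
  %     ==
  %
  %   Doxygen-style fences with an optional language spec:
  %
  %   ~~~{.prolog}
  %   hello :- writeln(hello).
  %   ~~~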

Source tilde_fence_ext(-Ext)// is semidet[private]
Detect a language spec such as {.prolog} (Doxygen) or {prolog} (GitHub) following the fence.
Source indented_verbatim_body(+Lines, +Indent, -CodeLines, -RestLines)[private]
Takes more verbatim lines. The input ends with the first line that is indented less than Indent. There cannot be more than one consecutive empty line in the verbatim body.
Source valid_verbatim_opening(+Line) is semidet[private]
Tests that line does not look like a list item or table.
Source lines_code_text(+Lines, +Indent, -Codes) is det[private]
Extract the actual code content from a list of line structures.
Source pre_indent(+Indent)// is det[private]
Insert Indent leading spaces. Note we cannot use tabs as these are not expanded by the HTML <pre> element.
Source summary_from_lines(+Lines:lines, -Summary:list(codes)) is det
Produce a summary for Lines. Similar to JavaDoc, the summary is defined as the first sentence of the documentation. In addition, a sentence is also ended by an empty line or the end of the comment.
Source skip_empty_lines(+LinesIn, -LinesOut) is det[private]
Remove empty lines from the start of the input. Note that this is used both to process character and token data.
Source indented_lines(+Text:list(codes), +Prefixes:list(codes), -Lines:list) is det
Extract a list of lines without leading blanks or characters from Prefix from Text. Each line is a term Indent-Codes, where Indent specifies the line_position of the real text of the line.
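A minimal sketch of calling this predicate on a block of %-comment text; the helper name is hypothetical:

  %  Lines is a list of Indent-Codes pairs for the text after the prefix.
  comment_lines(Comment, Lines) :-
      string_codes(Comment, Codes),
      string_codes("%", Prefix),
      indented_lines(Codes, [Prefix], Lines).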
Source end_of_comment//[private]
Succeeds if we hit the end of the comment.
bug
- %*/ will be seen as the end of the comment.
Source take_prefix(+Prefixes:list(codes), +Indent0:int, -Indent:int)// is det[private]
Get the leading characters from the input and compute the line-position at the end of the leading characters.
Source string_update_linepos(+Codes, +Pos0, -Pos) is det[private]
Update line-position after adding Codes at Pos0.
Source update_linepos(+Code, +Pos0, -Pos) is det[private]
Update line-position after adding Code.
To be done
- Currently assumes tab-width of 8.
Source take_line(-Line:codes)// is det[private]
Take a line from the input. Line does not include the terminating \r or \n character(s), nor trailing whitespace.
Source normalise_indentation(+LinesIn, -LinesOut) is det[private]
Re-normalise the indentation, such that the left-most line is at zero. Note that we skip empty lines in the computation.
Source strip_leading_par(+Dom0, -Dom) is det
Remove the leading paragraph for environments where a paragraph is not required.
Source ws// is det[private]
Eagerly skip layout characters
 non_ws(-Text, ?Tail) is det[private]
True if the difference list Text-Tail is the sequence of non-white-space characters.
Source nl//[private]
Get end-of-line
Source peek(H)//[private]
True if next token is H without eating it.
Source tokens(-Tokens:list)// is nondet[private]
Source tokens(+Max, -Tokens:list)// is nondet[private]
Defensively take tokens from the input. Backtracking takes more tokens. Do not include structure terms.
Source tokens_no_whitespace(-Tokens:list(atom))// is nondet[private]
Defensively take tokens from the input. Backtracking takes more tokens. Tokens cannot include whitespace. Word tokens are returned as their represented words.
Source limit(+Count, :Rule)//[private]
As limit/2, but for grammar rules.