doc_wiki.pl -- PlDoc wiki parser
This file defines the PlDoc wiki parser, which parses both comments and wiki text files. The original version of this SWI-Prolog wiki format was largely modeled after Twiki (http://twiki.org/). The current version is extended to take many aspects from markdown, in particular the doxygen refinement thereof.
- wiki_lines_to_dom(+Lines:lines, +Args:list(atom), -Term) is det
- Translate a Wiki text into an HTML term suitable for html//1 from the html_write library.
- wiki_codes_to_dom(+String, +Args, -DOM) is det
- Translate a plain text into a DOM term.
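A minimal usage sketch, assuming the parser is loaded as library(pldoc/doc_wiki) and that the input is passed as a list of character codes. The helper name demo_dom/1 and the sample text are illustrative and not part of the library:

```prolog
:- use_module(library(pldoc/doc_wiki)).

% demo_dom(-DOM): parse a short piece of wiki text into a DOM term.
% The back-quoted literal is a code list in SWI-Prolog 7 and later.
% The empty list means no argument names are known for this text.
demo_dom(DOM) :-
    wiki_codes_to_dom(`Hello *world*`, [], DOM).
```

Calling `?- demo_dom(DOM).` binds DOM to a term that can then be handed to html//1 for rendering.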
- wiki_structure(+Lines:lines, +BaseIndent, -Blocks:list(block)) is det[private]
- Get the structure in terms of block-level elements: paragraphs, lists and tables. This processing uses a mixture of layout and punctuation.
- take_block(+Lines, +BaseIndent, ?Block, -RestLines) is semidet[private]
- Take a block-structure from the input. Defined block elements are lists, table, hrule, section header and paragraph.
- ruler(+Line) is semidet[private]
- True if Line contains 3 ruler chars and otherwise spaces.
- list_item(+Lines, ?Type, ?Indent, -LI0, -LIT, -RestLines) is det[private]
- Create a list-item. Naturally this should produce a single item, but DL lists produce two items, so we create the list of items as a difference list.
- rest_list_item(+Lines, +Type, +Indent, -RestItem, -RestLines) is det[private]
- Extract the remainder (after the first line) of a list item.
- take_blocks_at_indent(+Lines, +Indent, -Pars, -RestLines) is det[private]
- Process paragraphs and verbatim blocks (==..==) in bullet-lists.
- rest_list(+Lines, +Type, +Indent, -Items, -ItemTail, -RestLines) is det[private]
- list_item_prefix(?Type, +Line, -Rest) is det[private]
- split_dt(+LineAfterDollar, -DT, -Rest)[private]
- First see whether the entire line is the item. This allows creating items holding : by using $ <tokens> :\n
- ul_to_dl(+UL, -DL) is semidet[private]
- Translate an UL list into a DL list if all entries are of the form "* <term> nl, <description>" and at least one <description> is non-empty, or all items are of the form [[PredicateIndicator]].
- term_item(+LI, -DLItem, ?Tail) is semidet[private]
- If LI is of the form <Term> followed by a newline, return it as a dt-dd tuple. The <dt> item contains a term \term(Text, Term, Bindings).
- row(-Cells)// is det[private]
- md_table_structure_line(+Chars)[private]
- True if Chars represents Markdown table structure. We currently ignore the structure information.
- rest_par(+Lines, -Par, +BaseIndent, +MaxI0, -MaxI, -RestLines) is det[private]
- Take the rest of a paragraph. Paragraphs are ended by a blank line or the start of a list-item. The latter is a bit dubious. Why not a general block-level object? The current definition allows for writing lists without a blank line between the items.
- section_header(+Lines, -Section, -RestLines) is semidet[private]
- Get a section line from the input.
- twiki_section_line(+Tokens, -Section) is semidet[private]
- Extract a section using the Twiki conventions. The section may be preceded by [Word], in which case we generate an anchor name Word for the section.
- md_section_line(+Tokens, -Section) is semidet[private]
- Handle markdown section lines starting with #
- strip_ws_tokens(+Tokens, -Stripped)[private]
- Strip leading and trailing whitespace from a token list. Note that the whitespace is already normalised.
- strip_leading_ws(+Tokens, -Stripped) is det[private]
- Strip leading whitespace from a token list.
- tags(+Lines:lines, -Tags) is semidet[private]
- If the first line is a @tag, read the remainder of the lines to a list of \tag(Name, Value) terms.
- collect_tags(+IndentedLines, -Tags) is semidet[private]
- Create a list Order-tag(Tag,Tokens) for each @tag encountered. Order is the desired position as defined by tag_order/2.
- tag_name(+String, -Tag:atom, -Order:int) is semidet[private]
- If String denotes a known tag-name,
- renamed_tag(+DeprecatedTag:atom, -Tag:atom, -Warn) is semidet[private]
- Declaration for deprecated tags.
- tag_order(+Tag:atom, -Order:int) is semidet[private]
- Declares both the known tags and their expected order. Currently the tags are forced into this order without warning. Future versions may issue a warning if the order is inconsistent.
- combine_tags(+Tags:list(tag(Key,Value)), -Tags:list) is det[private]
- Creates the final tag-list. Tags is a list of \params(list(param(Name, Descr))) and \tag(Name, list(Descr)) terms. Descr is a list of tokens.
- wiki_faces(+Structure, +ArgNames, -HTML) is det[private]
- Given the wiki structure, analyse the content of the paragraphs, list items and table cells and apply font faces and links.
- structure_term(+Term, -Functor, -Content) is semidet[private]
- structure_term(-Term, +Functor, +Content) is det[private]
- (Un)pack a term describing structure, so we can process Content and re-pack the structure.
- verbatim_term(?Term) is det[private]
- True if Term must be passed verbatim.
- matches(:Goal, -Input, -Last)//[private]
- True when Goal runs successfully on the DCG input and Input is the list of matched tokens.
- wiki_faces(-WithFaces, +ArgNames)// is nondet[private]
- wiki_faces(-WithFaces, +ArgNames, +Options)// is nondet[private]
- Apply font-changes and automatic links to running text. The faces are applied after discovering the structure (paragraphs, lists, tables, keywords).
- prolog:doc_wiki_face(-Out, +VarNames)// is semidet[multifile]
- prolog:doc_wiki_face(-Out, +VarNames, +Options0)// is semidet[multifile]
- Hook that can be used to provide additional processing for additional inline wiki constructs. The DCG list is a list of tokens. Defined tokens are:
- w(Atom)
- Recognised word (alphanumerical)
- Atom
- Single character atom representing punctuation marks or the atom ' ' (space), representing white-space.
The Out variable is input for the backends defined in doc_latex.pl and doc_html.pl. Roughly, these are terms similar to what html//1 from library(http/html_write) accepts.
- wiki_face_simple(-Out, +ArgNames, +Options)[private]
- Skip simple (non-markup) wiki.
- code_words(-Words)//[private]
- True when Words is the content as it appears in `code`, where `` is mapped to `.
- eq_code_words(-Words)//[private]
- Stuff that can be between single =. This is limited to:
- Start and end must be a word
- In between may be the following punctuation chars: .-:/, notably dealing with file names and identifiers in various external languages.
- code_face(+Text, +Term, +Vars, -Code) is det[private]
- Deal with `... code ...` sequences. Text is the matched text, Term is the parsed Prolog term and Code is the resulting intermediate code.
- emphasis_seq(-Out, +ArgNames, +Options) is semidet[private]
- Recognise emphasis sequences
- emphasis_term(+Emphasis, +Tokens, -Term) is det[private]
- emphasis_before(-Before)// is semidet[private]
- emphasis_start(-Emphasis)// is semidet[private]
- emphasis_end(+Emphasis)// is semidet[private]
- Primitives for Doxygen emphasis handling.
- arg_list(-Atoms) is nondet[private]
- Atoms is a token-list for a Prolog argument list. An argument-list is a sequence of tokens '(' ... ')'.
- term_face(+Text, +Term, +Vars, -Face) is semidet[private]
- Process embedded Prolog-terms. Currently processes Alias(Arg) terms that refer to files. Future versions will also provide pretty-printing of Prolog terms.
- image_label(-Label)//[private]
- Match File[;param=value[,param=value]*]
- file_options(-Options) is det[private]
- Extracts additional processing options for files. The format is ;name="value",name2=value2,... Spaces are not allowed.
- wiki_link(-Link, +Options)// is semidet[private]
- True if we can find a link to a file or URL. Links are described as one of:
- filename
- A filename defined using autolink_file/2 or autolink_extension/2
- <url-protocol>://<rest-url>
- A fully qualified URL
- '<' URL '>'
- Be more relaxed on the URL specification.
- prolog:url_expansion_hook(+Term, -HREF, -Label) is semidet[multifile]
- This hook is called after recognising <Alias:Rest>, where Term is of the form Alias(Rest). If it succeeds, it must bind HREF to an atom or string representing the link target and Label to an html//1 expression for the label.
- file_name(-Name:atom, -Ext:atom)// is semidet[private]
- Matches a filename. A filename is defined as a sequence <segment>{/<segment>}.<ext>.
- resolve_file(+Name, -Options, ?RestOptions) is det[private]
- Find the actual file based on the pldoc_file global variable. If present and the file is resolvable, add an option absolute_path(Path) that reflects the current location of the file.
- arity(-Arity:int)// is semidet[private]
- True if the next token can be interpreted as an arity. That is, it denotes a non-negative integer of at most 20. Although Prolog allows for higher arities, we assume 20 is a fair maximum for user-created predicates that are documented.
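The range check can be sketched as a small grammar rule. This is an illustration only: the name sketch_arity//1 is hypothetical, and the sketch assumes that numbers arrive as word tokens such as w('3'), which may differ from how the real tokenizer represents them.

```prolog
% sketch_arity(-Arity)//: succeed if the next token is a word token
% that denotes an integer between 0 and 20 (inclusive).
% Assumption: numbers arrive as w(Atom) tokens, e.g. w('3').
sketch_arity(Arity) -->
    [w(Word)],
    { atom(Word),
      atom_number(Word, Arity),   % fails silently for non-numeric atoms
      integer(Arity),             % rejects floats such as '3.0'
      between(0, 20, Arity)
    }.
```

For instance, `?- phrase(sketch_arity(A), [w('3')]).` succeeds, while a token list starting with w('21') or w(foo) is rejected.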
- symbol_string(-String)// is nondet[private]
- Accept a sequence of Prolog symbol characters, starting with the shortest (empty) match.
- prolog_symbol_char(?Char)[private]
- True if Char is classified by Prolog as a symbol char.
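The "shortest match first" behaviour follows from clause order. The sketch below is not the library's code: the sketch_ names are hypothetical, and it assumes that punctuation tokens are single-character atoms and uses the built-in char_type/2 with the prolog_symbol class.

```prolog
% sketch_symbol_string(-Chars)//: match a run of symbol characters.
% The empty clause comes first, so the shortest (empty) match is tried
% first and backtracking extends it one token at a time.
sketch_symbol_string([]) -->
    [].
sketch_symbol_string([H|T]) -->
    [H],
    { sketch_symbol_char(H) },
    sketch_symbol_string(T).

% Assumption: punctuation tokens are single-character atoms.
sketch_symbol_char(Char) :-
    atom(Char),
    atom_length(Char, 1),
    char_type(Char, prolog_symbol).
```

Enumerating solutions with `phrase(sketch_symbol_string(S), ['-','>',a], _)` yields progressively longer prefixes, which lets a caller stop at the first one that makes the surrounding rule succeed.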
- autolink_extension(?Ext, ?Type) is nondet
- True if Ext is a filename extension that creates automatic links in the documentation.
- autolink_file(?File, -Type) is nondet
- Files to which we automatically create links, regardless of the extension.
- section_comment_header(+Lines, -Header, -RestLines) is semidet
- Processes /** <section> comments. Header is a term \section(Type, Title), where Title is an atom holding the section title and Type is an atom holding the text between <>.
- normalise_white_space(-Text)// is det
- Text is input after deleting leading and trailing white space and mapping all internal white space to a single space.
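A short usage sketch, assuming the rule is exported from library(pldoc/doc_wiki) and operates on a list of character codes. The helper demo_normalise/1 is illustrative only:

```prolog
:- use_module(library(pldoc/doc_wiki)).

% demo_normalise(-Atom): run the grammar rule over a code list and
% convert the resulting codes to an atom for easy inspection.
demo_normalise(Atom) :-
    phrase(normalise_white_space(Codes), `  hello   world  `),
    atom_codes(Atom, Codes).
```

Following the description above, the leading and trailing blanks should disappear and the run of blanks between the two words should collapse into one space.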
- tokenize_lines(+Lines:lines, -TokenLines) is det[private]
- Convert Indent-Codes into Indent-Tokens
- line_tokens(-Tokens:list)// is det[private]
- Create a list of tokens, where each token is either a ' ' to denote spaces, a term w(Word) denoting a word or an atom denoting a punctuation character. Underscores (_) appearing inside an alphanumerical string are considered part of the word. E.g., "hello_world_" tokenizes into [w(hello_world), '_'].
- verbatim(+Lines, +EnvIndent, -Pre, -RestLines) is det[private]
- Extract a verbatim environment. The returned Pre is of the format pre(Attributes, String). The indentation of the leading fence is subtracted from the indentation of the verbatim lines. Two types of fences are supported: the traditional == and the Doxygen ~~~ (minimum 3 ~ characters), optionally followed by {.ext} to indicate the language.
Verbatim environment is delimited as
..., verbatim(Lines, Pre, Rest) ...,
In addition, a verbatim environment may simply be indented. The restrictions are described in the documentation.
- tilde_fence_ext(-Ext)// is semidet[private]
- Detect `{.prolog} (Doxygen) or `{prolog} (GitHub)
- indented_verbatim_body(+Lines, +Indent, -CodeLines, -RestLines)[private]
- Takes more verbatim lines. The input ends with the first line that is indented less than Indent. There cannot be more than one consecutive empty line in the verbatim body.
- valid_verbatim_opening(+Line) is semidet[private]
- Tests that Line does not look like a list item or table.
- lines_code_text(+Lines, +Indent, -Codes) is det[private]
- Extract the actual code content from a list of line structures.
- pre_indent(+Indent)// is det[private]
- Insert Indent leading spaces. Note we cannot use tabs as these are not expanded by the HTML <pre> element.
- summary_from_lines(+Lines:lines, -Summary:list(codes)) is det
- Produce a summary for Lines. Similar to JavaDoc, the summary is defined as the first sentence of the documentation. In addition, a sentence is also ended by an empty line or the end of the comment.
- skip_empty_lines(+LinesIn, -LinesOut) is det[private]
- Remove empty lines from the start of the input. Note that this is used both to process character and token data.
- indented_lines(+Text:list(codes), +Prefixes:list(codes), -Lines:list) is det
- Extract a list of lines from Text, without leading blanks or characters from Prefixes. Each line is a term Indent-Codes, where Indent specifies the line_position of the real text of the line.
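A minimal usage sketch, assuming the predicate is exported from library(pldoc/doc_wiki) and that Text is a list of character codes. The helper demo_lines/1 and the sample text are illustrative only:

```prolog
:- use_module(library(pldoc/doc_wiki)).

% demo_lines(-Lines): split a two-line code list into Indent-Codes pairs.
% The empty prefix list means that only leading blanks are stripped.
demo_lines(Lines) :-
    indented_lines(`  first line\n    second line`, [], Lines).
```

Each element of Lines is an Indent-Codes pair, so the relative indentation of the two lines remains available to the later structure-detection steps.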
- end_of_comment//[private]
- Succeeds if we hit the end of the comment.
- take_prefix(+Prefixes:list(codes), +Indent0:int, -Indent:int)// is det[private]
- Get the leading characters from the input and compute the line-position at the end of the leading characters.
- string_update_linepos(+Codes, +Pos0, -Pos) is det[private]
- Update line-position after adding Codes at Pos0.
- update_linepos(+Code, +Pos0, -Pos) is det[private]
- Update line-position after adding Code.
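The two predicates above can be pictured as a per-code step plus a fold over the code list. The sketch below is not the library's implementation: the sketch_ names are hypothetical, and it assumes tab stops every 8 columns and that every other code advances the position by one.

```prolog
:- use_module(library(apply)).

% sketch_update_linepos(+Code, +Pos0, -Pos)
% Assumption: a tab jumps to the next multiple of 8; any other code
% advances the position by one column.
sketch_update_linepos(0'\t, Pos0, Pos) :-
    !,
    Pos is (Pos0 // 8 + 1) * 8.
sketch_update_linepos(_, Pos0, Pos) :-
    Pos is Pos0 + 1.

% sketch_string_update_linepos(+Codes, +Pos0, -Pos)
% Thread the position through all codes from left to right.
sketch_string_update_linepos(Codes, Pos0, Pos) :-
    foldl(sketch_update_linepos, Codes, Pos0, Pos).
```

With these assumptions, the codes of "ab", a tab and "c" starting at position 0 move the position to 1, 2, 8 and finally 9.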
- take_line(-Line:codes)// is det[private]
- Take a line from the input. Line does not include the terminating \r or \n character(s), nor trailing whitespace.
- normalise_indentation(+LinesIn, -LinesOut) is det[private]
- Re-normalise the indentation, such that the left-most line is at zero. Note that we skip empty lines in the computation.
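One way to picture this step, as a sketch under stated assumptions rather than the library's code: lines are Indent-Codes pairs, an empty line is assumed to have Codes == [], and the predicate names sketch_normalise_indentation/2 and shift_left/3 are hypothetical.

```prolog
:- use_module(library(lists)).
:- use_module(library(apply)).

% sketch_normalise_indentation(+Lines0, -Lines)
% Shift all lines left by the smallest indent found on a non-empty line.
sketch_normalise_indentation(Lines0, Lines) :-
    findall(Indent,
            ( member(Indent-Codes, Lines0), Codes \== [] ),
            Indents),
    (   min_list(Indents, Min),       % fails if all lines are empty
        Min > 0
    ->  maplist(shift_left(Min), Lines0, Lines)
    ;   Lines = Lines0                % nothing to shift
    ).

% Shift one line left by Min columns; empty lines are clamped at 0.
shift_left(Min, Indent0-Codes, Indent-Codes) :-
    Indent is max(0, Indent0 - Min).
```

Empty lines do not take part in computing the minimum, so a blank line at column 0 does not prevent the surrounding text from being shifted.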
- strip_leading_par(+Dom0, -Dom) is det
- Remove the leading paragraph for environments where a paragraph is not required.
- ws// is det[private]
- Eagerly skip layout characters
- non_ws(-Text, ?Tail) is det[private]
- True if the difference list Text-Tail is the sequence of non-white-space characters.
- nl//[private]
- Get end-of-line
- peek(H)//[private]
- True if next token is H without eating it.
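Look-ahead without consumption can be expressed by making the two hidden DCG arguments explicit and leaving them identical. The name sketch_peek//1 is hypothetical; this illustrates the technique and is not necessarily how the library defines it.

```prolog
% sketch_peek(?H)//: unify H with the next token, leaving the input
% untouched. Written as a plain predicate so that the input list and
% the remaining list are visibly the same term.
sketch_peek(H, List, List) :-
    List = [H|_].
```

For example, `?- phrase(sketch_peek(X), [a,b], Rest).` binds X to the first token while Rest is still the complete list.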
- tokens(-Tokens:list)// is nondet[private]
- tokens(+Max, -Tokens:list)// is nondet[private]
- Defensively take tokens from the input. Backtracking takes more tokens. Do not include structure terms.
- tokens_no_whitespace(-Tokens:list(atom))// is nondet[private]
- Defensively take tokens from the input. Backtracking takes more tokens. Tokens cannot include whitespace. Word tokens are returned as their represented words.
- limit(+Count, :Rule)//[private]
- As limit/2, but for grammar rules.