
uri.pl -- Process URIs
This library provides high-performance C-based primitives for manipulating URIs. We chose a C-based implementation because it performs much better on raw character manipulation, and URI handling primitives are used in time-critical parts of RDF processing. This implementation is based on RFC-3986:
http://labs.apache.org/webarch/uri/rfc/rfc3986.html
The URI processing in this library is rather liberal. That is, we split URIs into components according to the rules, but we do not check that each component is itself valid. Percent-decoding for IRIs is also liberal: it first tries UTF-8, then ISO-Latin-1, and finally accepts %-characters verbatim.
Earlier experience has shown that strict enforcement of the URI syntax raises many errors on input that many other web-document processing tools accept.
uri_components(+URI, -Components) is det
uri_components(-URI, +Components) is det
- Break a URI into its five basic components according to the RFC-3986 regular expression:
    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
     12            3  4          5       6  7        8 9
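As an illustration of the two modes, the following is a minimal sketch rather than part of the library. It assumes the components term has the shape uri_components(Scheme, Authority, Path, Search, Fragment), and the helper names show_components/1 and build_uri/1 are hypothetical.

```prolog
:- use_module(library(uri)).

% Hypothetical helper: split a URI and print each component.
% Assumption: components missing from the URI are left unbound,
% so they print as fresh variables.
show_components(URI) :-
    uri_components(URI,
                   uri_components(Scheme, Authority, Path, Search, Fragment)),
    format("scheme=~w authority=~w path=~w search=~w fragment=~w~n",
           [Scheme, Authority, Path, Search, Fragment]).

% Hypothetical helper: the reverse mode, building a URI from components.
build_uri(URI) :-
    uri_components(URI,
                   uri_components(http, 'www.example.com', '/a/b', 'q=1', frag)).
```

Calling show_components('http://www.example.com/a/b?q=1#frag') should report http, www.example.com, /a/b, q=1 and frag for the five components in that order.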
uri_data(?Field, +Components, ?Data) is semidet
- Provide access to the uri_components structure. Defined field names are scheme, authority, path, search and fragment.
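Field access can be sketched as follows. This is an illustrative example only: the helper names uri_field/3 and present_fields/2 are hypothetical, and the second helper assumes that absent components are left unbound by uri_components/2.

```prolog
:- use_module(library(uri)).

% Hypothetical helper: extract one named field from a URI.
% Fails if Field is not one of the defined field names.
uri_field(URI, Field, Value) :-
    uri_components(URI, Components),
    uri_data(Field, Components, Value).

% Hypothetical helper: collect Field-Value pairs for the fields
% that are actually present, enumerating Field on backtracking.
present_fields(URI, Pairs) :-
    uri_components(URI, Components),
    findall(Field-Value,
            ( uri_data(Field, Components, Value),
              nonvar(Value)
            ),
            Pairs).
```

For example, the query uri_field('http://www.example.com/a/b?q=1', path, P) binds P to '/a/b'. Going through uri_data/3 instead of matching the components term directly keeps calling code independent of the argument order of the structure.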
Undocumented predicates
The following predicates are exported, but are either undocumented or incorrectly documented.
iri_normalized(Arg1, Arg2)
uri_normalized(Arg1, Arg2)
uri_normalized_iri(Arg1, Arg2)
uri_data(Arg1, Arg2, Arg3, Arg4)
uri_iri(Arg1, Arg2)
uri_file_name(Arg1, Arg2)
uri_authority_data(Arg1, Arg2, Arg3)
uri_encoded(Arg1, Arg2, Arg3)
uri_authority_components(Arg1, Arg2)
uri_is_global(Arg1)
uri_resolve(Arg1, Arg2, Arg3)
uri_query_components(Arg1, Arg2)
iri_normalized(Arg1, Arg2, Arg3)
uri_normalized_iri(Arg1, Arg2, Arg3)
uri_normalized(Arg1, Arg2, Arg3)