/*  Part of ClioPatria SeRQL and SPARQL server

    Author:        Michiel Hildebrand
    Author:        Jan Wielemaker
    E-mail:        michielh@few.vu.nl
    WWW:           http://www.swi-prolog.org
    Copyright (C): 2016, VU University Amsterdam

    This program is free software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License
    as published by the Free Software Foundation; either version 2
    of the License, or (at your option) any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public
    License along with this library; if not, write to the Free Software
    Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

    As a special exception, if you link this library with other files,
    compiled with a Free Software compiler, to produce an executable, this
    library does not by itself cause the resulting executable to be covered
    by the GNU General Public License. This exception does not however
    invalidate any other reasons why the executable file might be covered by
    the GNU General Public License.
*/

:- module(api_lod,
	  [ lod_api/2			% +Options, +Request
	  ]).

:- use_module(library(http/http_dispatch)).
:- use_module(library(http/http_json)).
:- use_module(library(http/http_host)).
:- use_module(library(http/http_request_value)).
:- use_module(library(http/http_cors)).
:- use_module(library(semweb/rdf_db)).
:- use_module(library(semweb/rdf_json)).
:- use_module(library(semweb/rdf_describe)).
:- use_module(library(settings)).
:- use_module(library(option)).
:- use_module(library(rdf_write)).
:- use_module(library(semweb/rdf_turtle_write)).
:- use_module(library(uri)).
:- use_module(library(debug)).
:- use_module(library(apply)).
:- use_module(library(dcg/basics)).
:- use_module(library(base64)).
:- use_module(library(utf8)).

LOD - Linked Open Data server

Linked (Open) Data turns RDF URIs (identifiers) into URLs (locators). Requesting the data behind the URL returns a description of the resource. So, if we see a resource http://example.com/employe/bill, we can do an HTTP GET request and expect to receive a description of bill. This module adds LOD facilities to ClioPatria.

Running the LOD server

There are several ways to run the LOD server.

  1. The simplest way to realise LOD is to run ClioPatria on the host that the authority component of the URL points to (see uri_components/2 for decomposing URIs). This implies you must be able to create a DNS binding for the host and be able to run ClioPatria there.
  2. Sometimes the above does not work, because the port is already assigned to another machine, you are not allowed to run ClioPatria on the target host, the target is behind a firewall, etc. In that case, notably if the host runs Apache, you can exploit the Apache module mod_proxy and proxy the connections to a location where ClioPatria runs. If you ensure that the path on Apache is the same as the path on ClioPatria, the following Apache configuration rule solves the problem:
    ProxyPass /rdf/ http://cliopatria-host:3020/rdf/
  3. Both methods above require no further configuration. Unfortunately, they require a registered domain, control over DNS and administrative rights over certain machines. A solution that doesn't require this is to use www.purl.org. This allows you to redirect URLs within the purl domain to any location you control. The redirection method can be defined with purl. In the semantic web community, we typically use See other (303). The catch is that if the address arrives at ClioPatria, we no longer know where it came from. This is not a problem in (1), as there was no redirect. It is also not a problem in (2), because Apache adds a header x-forwarded-host. With a purl redirect, unfortunately, there is no way to tell that you are activated through a redirect, let alone where the redirect came from.

    To deal with this situation, we use the redirected_from option of lod_api/2. For example, if http://www.purl.org/vocabularies/myvoc/ is redirected to /myvoc/ on ClioPatria, we use:

    :- http_handler('/myvoc/',
                    lod_api([ redirected_from('http://www.purl.org/vocabularies/myvoc/')
                            ]),
                    [ prefix ]).

By default, there is no HTTP handler pointing to lod_api/2. The example above describes how to deal with redirected URIs. The cases (1) and (2) must also be implemented by registering a handler. This can be as blunt as registering a handler for the root of the server, but typically one would use one or more handlers that deal with sub-trees that act as Linked Data repositories. Handler declarations should use absolute addresses to guarantee a match with the RDF URIs, even if the server is relocated by means of the http:prefix setting. For example:

:- http_handler('/rdf/', lod_api([]), [prefix]).
See also
- http://linkeddata.org/
:- setting(lod:redirect, boolean, false,
	   'If true, redirect from accept-header to extension').
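As a minimal sketch of how this setting might be switched on at runtime, the fragment below uses set_setting/2 from library(settings). The helper name enable_lod_redirect/0 is hypothetical, and the sketch assumes the module that declares lod:redirect has already been loaded.

```prolog
:- use_module(library(settings)).

% enable_lod_redirect is a hypothetical helper, not part of this module.
% It assumes the lod:redirect setting has been declared before it runs.
enable_lod_redirect :-
    set_setting(lod:redirect, true).
```

With the setting enabled, a request whose Accept header selects an RDF format is answered with a See other (303) to the URI extended with a format-specific extension, as implemented by redirect/3.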
 lod_api(+Options, +Request)
Reply to a Linked Data request. The handler is capable of three RDF output formats (RDF/XML, JSON and Turtle). It decides on the desired format based on the HTTP Accept header field. If no acceptable format is found, it replies with a human-readable description of the resource using the ClioPatria RDF browser page as defined by list_resource//2.

Options:

redirected_from(+URL)
This option must be provided when using a purl.org or similar redirect. See overall documentation of this library.
bounded_description(+Type)
Description style to use. See rdf_bounded_description/4. The default is cbd (Concise Bounded Description)
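The two options can be combined in a handler declaration. The fragment below is a sketch only: the paths and the purl URL are made up for illustration, scbd is assumed to be the name of the symmetric variant accepted by library(semweb/rdf_describe), and loading this module as api(lod) is an assumption about the ClioPatria file search path.

```prolog
:- use_module(library(http/http_dispatch)).
:- use_module(api(lod)).        % assumption: ClioPatria alias for this module

% Serve /rdf/ with a symmetric bounded description (assumed type: scbd).
:- http_handler('/rdf/',
                lod_api([ bounded_description(scbd) ]),
                [ prefix ]).

% Hypothetical vocabulary that is reached through a purl.org redirect.
:- http_handler('/myvoc/',
                lod_api([ redirected_from('http://www.purl.org/vocabularies/myvoc/'),
                          bounded_description(cbd)
                        ]),
                [ prefix ]).
```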
lod_api(_Options, Request) :-
	\+ memberchk(path_info(_), Request), !,
	accepts(Request, AcceptList),
	preferred_format(AcceptList, Format),
	(   Format == html
	->  http_link_to_id(home, [], Redirect)
	;   http_link_to_id(well_known_void, [], Redirect)
	),
	http_redirect(see_other, Redirect, Request).
lod_api(_Options, Request) :-
	memberchk(path_info('/.well-known/void'), Request), !,
	http_link_to_id(well_known_void, [], Redirect),
	http_redirect(see_other, Redirect, Request).
lod_api(Options, Request) :-
	lod_uri(Request, URI, Options),
	debug(lod, 'LOD URI: ~q', [URI]),
	accepts(Request, AcceptList),
	triple_filter(Request, Filter),
	cors_enable,
	lod_request(URI, AcceptList, Request, Filter, Options).

accepts(Request, AcceptList) :-
	(   memberchk(accept(AcceptHeader), Request)
	->  (   atom(AcceptHeader)	% compatibility
	    ->	http_parse_header_value(accept, AcceptHeader, AcceptList)
	    ;	AcceptList = AcceptHeader
	    )
	;   AcceptList = []
	).
 triple_filter(+Request, -Filter) is det
Extract Triple-Filter from Request. Ignores the filter if it is invalid.
triple_filter(Request, Filter) :-
	catch(phrase(triple_filter(Request), Filter), E,
	      (print_message(warning, E),fail)), !.
triple_filter(_, []).
 triple_filter(+Text)//
Translate an RDF triple pattern into a list of rdf(S,P,O) terms.
triple_filter([]) -->
	[].
triple_filter([triple_filter(Filter)|T]) --> !,
	one_triple_filter(Filter),
	triple_filter(T).
triple_filter([_|T]) -->
	triple_filter(T).

one_triple_filter(Encoded) -->
	{ string_codes(Encoded, EncCodes),
	  phrase(base64(UTF8Bytes), EncCodes),
	  phrase(utf8_codes(PlainCodes), UTF8Bytes),
	  string_codes(Filter, PlainCodes),
	  split_string(Filter, "\r\n", "\r\n", Filters),
	  maplist(map_triple_filter, Filters, Triples)
	},
	string(Triples).

map_triple_filter(String, rdf(S,P,O)) :-
	split_string(String, "\s\t", "\s\t", [SS,SP,SO]),
	triple_term(SS, S),
	triple_term(SP, P),
	triple_term(SO, O).

triple_term("?", _) :- !.
triple_term(S, N) :-
	string_codes(S, Codes),
	phrase(sparql_grammar:graph_term(N), Codes).
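The code above implies that a Triple-Filter value is the base64 encoding of UTF-8 text holding one whitespace-separated triple pattern per line, with `?` as wildcard. The sketch below shows how a client might build and check such a value. The predicate names encode_triple_filter/2 and decode_triple_filter/2 and the example IRI are hypothetical; only base64//1 and utf8_codes//1 come from the libraries this module already uses.

```prolog
:- use_module(library(base64)).
:- use_module(library(utf8)).

% encode_triple_filter(+Lines, -Encoded)
% Lines is a list of atoms, one triple pattern per atom.
% (Hypothetical helper, not part of this module.)
encode_triple_filter(Lines, Encoded) :-
    atomic_list_concat(Lines, '\n', Text),
    atom_codes(Text, Codes),
    phrase(utf8_codes(Codes), Bytes),      % Unicode codes -> UTF-8 bytes
    phrase(base64(Bytes), EncCodes),       % bytes -> base64 characters
    atom_codes(Encoded, EncCodes).

% decode_triple_filter(+Encoded, -Text)
% Mirrors the decoding steps of one_triple_filter//1.
decode_triple_filter(Encoded, Text) :-
    atom_codes(Encoded, EncCodes),
    phrase(base64(Bytes), EncCodes),
    phrase(utf8_codes(Codes), Bytes),
    atom_codes(Text, Codes).
```

For instance, `encode_triple_filter(['? <http://example.org/p> ?'], E)` yields a value that decodes back to the same pattern, which the handler would then parse into `rdf(_, 'http://example.org/p', _)`.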
 lod_request(+URI, +AcceptList, +Request, +Filter, +Options)
Handle an LOD request.
lod_request(URI, AcceptList, Request, Filter, Options) :-
	lod_resource(URI), !,
	preferred_format(AcceptList, Format),
	debug(lod, 'LOD Format: ~q', [Format]),
	(   cliopatria:redirect_uri(Format, URI, SeeOther)
	->  http_redirect(see_other, SeeOther, Request)
	;   setting(lod:redirect, true),
	    redirect(URI, AcceptList, SeeOther)
	->  http_redirect(see_other, SeeOther, Request)
	;   lod_describe(Format, URI, Request, Filter, Options)
	).
lod_request(URL, _AcceptList, Request, Filter, Options) :-
	format_request(URL, URI, Format), !,
	lod_describe(Format, URI, Request, Filter, Options).
lod_request(URI, _AcceptList, _Request, _Filter, _) :-
	throw(http_reply(not_found(URI))).
 lod_uri(+Request, -URI, +Options)
URI is the originally requested URI. This predicate deals with redirections if the HTTP handler was registered using the option redirected_from(URL). Otherwise it resolves the correct global URI using http_current_host/4.
lod_uri(Request, URI, Options) :-
	memberchk(redirected_from(Org), Options),
	memberchk(request_uri(ReqURI), Request),
	handler_location(Request, Location),
	atom_concat(Location, Rest, ReqURI),
	atom_concat(Org, Rest, URI).
lod_uri(Request, URI, _) :-
	memberchk(request_uri(ReqURI), Request),
	http_current_host(Request, Host, Port,
			  [ global(true)
			  ]),
	(   Port == 80
	->  atomic_list_concat(['http://', Host, ReqURI], URI)
	;   atomic_list_concat(['http://', Host, :, Port, ReqURI], URI)
	).
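The first clause of lod_uri/3 amounts to replacing the handler location at the front of the request URI with the redirected_from URL. A stand-alone sketch of that rewriting step follows; original_uri/4 is a hypothetical name used only for illustration.

```prolog
% original_uri(+HandlerLocation, +RedirectedFrom, +RequestURI, -URI)
% Strip the local handler location from RequestURI and prepend the
% URL that the request was originally redirected from.
% (Hypothetical helper, not part of this module.)
original_uri(HandlerLocation, RedirectedFrom, RequestURI, URI) :-
    atom_concat(HandlerLocation, Rest, RequestURI),
    atom_concat(RedirectedFrom, Rest, URI).
```

With the handler from the purl example, a request for `/myvoc/Person` is mapped back to `http://www.purl.org/vocabularies/myvoc/Person`, which is the URI that is looked up in the RDF store.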
 handler_location(+Request, -Location) is det
Location is the requested location on the server. This includes the handler location, normally concatenated with the path_info.
handler_location(Request, Location) :-
	memberchk(path(Path), Request),
	(   memberchk(path_info(Rest), Request),
	    atom_concat(Location, Rest, Path)
	->  true
	;   Location = Path
	).
 redirect(+URI, +AcceptList, -RedirectURL)
Succeeds if URI is in the store and a RedirectURL is found for it.
redirect(URI, AcceptList, To) :-
	lod_resource(URI),
	preferred_format(AcceptList, Format),
	(   cliopatria:redirect_uri(Format, URI, To)
	->  true
	;   uri_components(URI, URIComponents),
	    uri_data(path, URIComponents, Path0),
	    format_suffix(Format, Suffix),
	    file_name_extension(Path0, Suffix, Path),
	    uri_data(path, URIComponents, Path, ToComponents),
	    uri_components(To, ToComponents)
	).
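The default branch of redirect/3 only touches the path component of the URI. The sketch below isolates that step so it can be tried without a running server; add_format_suffix/3 is a hypothetical name, while uri_components/2, uri_data/3 and uri_data/4 are the library(uri) predicates used above.

```prolog
:- use_module(library(uri)).

% add_format_suffix(+URI, +Suffix, -To)
% To is URI with Suffix added as file name extension to its path.
% (Hypothetical helper, not part of this module.)
add_format_suffix(URI, Suffix, To) :-
    uri_components(URI, Components0),
    uri_data(path, Components0, Path0),
    file_name_extension(Path0, Suffix, Path),
    uri_data(path, Components0, Path, Components),  % replace the path only
    uri_components(To, Components).
```

For example, `add_format_suffix('http://example.com/employe/bill', rdf, To)` binds To to `http://example.com/employe/bill.rdf`.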
 preferred_format(+AcceptList, -Format) is det
Format is the highest ranked mimetype found in the AcceptList of the request that we can support. Expects an AcceptList sorted by rank.
preferred_format(AcceptList, Format) :-
	member(media(MimeType,_,_,_), AcceptList),
	ground(MimeType),
	mimetype_format(MimeType, Format), !.
preferred_format(_, html).
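The selection logic can be exercised in isolation. The sketch below copies it under hypothetical demo_ names with a reduced MIME type table. The media/4 terms are written by hand and assume the shape `media(Type/SubType, Params, Quality, Exts)`, with wildcards such as `*/*` represented by unbound variables, which is why the ground/1 test skips them.

```prolog
:- use_module(library(lists)).

% Reduced, hypothetical copy of the MIME type table.
demo_mimetype_format(application/'rdf+xml', xmlrdf).
demo_mimetype_format(application/json,      json).
demo_mimetype_format(text/turtle,           turtle).
demo_mimetype_format(text/html,             html).

% Take the first ground media type that maps to a known format;
% fall back to html when nothing matches.
demo_preferred_format(AcceptList, Format) :-
    member(media(MimeType, _, _, _), AcceptList),
    ground(MimeType),
    demo_mimetype_format(MimeType, Format),
    !.
demo_preferred_format(_, html).
```

Given the list `[media(image/png,[],1.0,[]), media(text/turtle,[],0.8,[])]`, the unsupported image/png entry is skipped and the format is turtle. A list holding only a wildcard entry, or an empty list, falls through to html.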
 format_request(+URL, -URI, -Format) is semidet
True if URL contains a suffix that corresponds to a supported output format, and the global URI occurs in the database.
format_request(URL, URI, Format) :-
	uri_components(URL, URLComponents),
	uri_data(path, URLComponents, Path),
	file_name_extension(Base, Ext, Path),
	(   format_suffix(Format, Ext),
	    mimetype_format(_, Format)
	->  true
	),
	uri_data(path, URLComponents, Base, PlainComponents),
	uri_components(URI, PlainComponents),
	lod_resource(URI).
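format_request/3 is the inverse of the suffix redirect: it strips a known extension and checks that the remaining URI is in the store. The sketch below shows only the stripping step, without the format table or the database check; strip_format_suffix/3 is a hypothetical name.

```prolog
:- use_module(library(uri)).

% strip_format_suffix(+URL, -URI, -Ext)
% URI is URL with the file name extension Ext removed from its path.
% Fails if the path has no extension.
% (Hypothetical helper, not part of this module.)
strip_format_suffix(URL, URI, Ext) :-
    uri_components(URL, Components0),
    uri_data(path, Components0, Path),
    file_name_extension(Base, Ext, Path),
    Ext \== '',
    uri_data(path, Components0, Base, Components),
    uri_components(URI, Components).
```

For example, `strip_format_suffix('http://example.com/employe/bill.ttl', URI, Ext)` gives `URI = 'http://example.com/employe/bill'` and `Ext = ttl`. The real predicate additionally requires Ext to be a registered format suffix and URI to satisfy lod_resource/1.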
 lod_describe(+Format, +URI, +Request, +Filter, +Options) is det
Write an HTTP document describing URI in Format to the current output. Format is defined by mimetype_format/2.
lod_describe(html, URI, Request, _, _) :- !,
	(   rdf_graph(URI)
	->  http_link_to_id(list_graph, [graph=URI], Redirect)
	;   http_link_to_id(list_resource, [r=URI], Redirect)
	),
	http_redirect(see_other, Redirect, Request).
lod_describe(Format, URI, _Request, Filter, Options) :-
	lod_description(URI, RDF, Filter, Options),
	send_graph(Format, RDF).

send_graph(xmlrdf, RDF) :-
	format('Content-type: application/rdf+xml; charset=UTF-8~n~n'),
	rdf_write_xml(current_output, RDF).
send_graph(json, RDF) :-
	graph_json(RDF, JSON),
	reply_json(JSON).
send_graph(turtle, RDF) :-
	format('Content-type: text/turtle; charset=UTF-8~n~n'),
	rdf_save_turtle(stream(current_output),
			[ expand(triple_in(RDF)),
			  only_known_prefixes(true),
			  silent(true)
			]).
 triple_in(+RDF, ?S, ?P, ?O, ?G) is nondet
Lookup a triple in the graph RDF, represented as a list of rdf(S,P,O).
To be done
- Describe required indexing from rdf_save_turtle/2 and implement that if the graph is big.
:- public triple_in/5.			% called from send_graph/2.

triple_in(RDF, S,P,O,_G) :-
	member(rdf(S,P,O), RDF).
 lod_description(+URI, -RDF, +Filter, +Options) is det
RDF is a graph represented as a list of rdf(S,P,O) that describes URI.

This predicate is hooked by cliopatria:lod_description/2. The default is implemented by resource_CBD/3.

See also
- SPARQL DESCRIBE
lod_description(URI, RDF, _, _) :-
	cliopatria:lod_description(URI, RDF), !.
lod_description(URI, RDF, Filter, Options) :-
	option(bounded_description(Type), Options, cbd),
	echo_filter(Filter),
	rdf_bounded_description(rdf, Type, Filter, URI, RDF).

echo_filter([]) :- !.
echo_filter(Filters) :-
	copy_term(Filters, Filters1),
	term_variables(Filters1, Vars),
	maplist(=(?), Vars),
	filters_to_ntriples(Filters1, NTriples),
	split_string(NTriples, "\n", "\n.\s", Strings0),
	maplist(insert_q, Strings0, Strings),
	atomics_to_string(Strings, "\n", String),
	base64(String, Encoded),
	format('Triple-Filter: ~w\r\n', [Encoded]).

insert_q(String, QString) :-
	split_string(String, " ", "", [S,P,O|M]),
	map_q(S, QS),
	map_q(P, QP),
	map_q(O, QO),
	atomics_to_string([QS,QP,QO|M], " ", QString).

map_q("<?>", "?") :- !.
map_q(S, S).

filters_to_ntriples(Filters, String) :-
	with_output_to(
	    string(String),
	    rdf_save_ntriples(stream(current_output),
			      [ expand(api_lod:triple_in(Filters))])).
 mimetype_format(?MimeType, ?Format) is nondet
Conversion between mimetypes and formats.
mimetype_format(application/'rdf+xml',	xmlrdf).
mimetype_format(application/json,	json).
mimetype_format(application/'x-turtle',	turtle).
mimetype_format(text/turtle,		turtle).
mimetype_format(text/html,		html).
 format_suffix(?Format, ?Suffix) is nondet
Suffix is the file name extension used for Format.
format_suffix(xmlrdf, rdf).
format_suffix(json,   json).
format_suffix(html,   html).
format_suffix(turtle, ttl).
 lod_resource(+Resource) is semidet
True if Resource is an existing resource for the LOD server. Typically, this means it appears as a subject, but when considering symmetric bounded descriptions, it should certainly also hold for resources that only appear as object.
lod_resource(Resource) :-
	(   rdf(Resource, _, _)
	;   rdf(_, Resource, _)
	;   rdf(_, _, Resource)
	;   rdf_graph(Resource)
	), !.


		 /*******************************
		 *	       HOOKS		*
		 *******************************/

:- multifile
	cliopatria:redirect_uri/3,
	cliopatria:lod_description/2.
 cliopatria:redirect_uri(+Format, +URI, -RedirectURL)
Compose a RedirectURL based on the output Format and the URI that is in our RDF database. For example, this could map the URI http://example.com/employe/bill into Bill's homepage at http://example.com/~bill if Format is html. The default is to add a format-specific extension to the path component of URI, returning e.g., http://example.com/employe/bill.rdf if the requested format is RDF.
Arguments:
Format - is one of xmlrdf, turtle, json or html.
See also
- This hook is used by redirect/3.
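The homepage mapping mentioned above could be written as the following hook clause. This is a sketch under the assumption that all employee URIs share the prefix http://example.com/employe/ and that homepages live under http://example.com/~. Both prefixes are taken from the example in the text, not from any real deployment.

```prolog
:- multifile cliopatria:redirect_uri/3.

% Redirect HTML requests for an employee URI to the matching homepage.
% Other formats fail here and fall back to the default behaviour.
cliopatria:redirect_uri(html, URI, Homepage) :-
    atom_concat('http://example.com/employe/', Name, URI),
    atom_concat('http://example.com/~', Name, Homepage).
```

Because the clause only succeeds for html, requests for RDF/XML, JSON or Turtle still get a description of the resource, or the extension-based redirect if lod:redirect is true.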
 cliopatria:lod_description(+URI, -RDF:list(rdf(s,p,o)))
RDF is a list of triples describing URI. The default is to use the Concise Bounded Description as implemented by resource_CBD/3.
See also
- This hook is used by lod_description/4
- library(semweb/rdf_describe) provides several definitions of bounded descriptions.