Online paper appendix: data replicability

This page is still under development. Its goal is to allow anybody to replicate all data discussed in our TPDL 2011 paper, Interactive Vocabulary Alignment.

Abstract: In many heritage institutes, objects are routinely described using terms from predefined vocabularies. When object collections need to be merged or linked, the question arises how those vocabularies relate. In practice it is often unclear for data providers how well alignment tools will perform on their specific vocabularies. This creates a bottleneck to align vocabularies, as data providers want to have tight control over the quality of their data. We will discuss the key limitations of current tools in more detail and propose an alternative approach. We will show how this approach has been used in two alignment use cases, and demonstrate how it is currently supported by our Amalgame alignment platform.

Data used

All four vocabularies that are aligned in the use cases are available as linked open data. Their current versions can be downloaded from CKAN. For use case 1, see:

For use case 2, see

For easy loading of all four vocabularies in the exact versions we used during the experiments described in the paper, you can use this snapshot of our triple store with all vocabularies pre-loaded.

Software used

All software used for the use cases described in the paper is open source, available online and runs, at the time of writing, on Linux, Mac OS and Windows. All software used is also under git version control, so you will need git to install the software. Git also allows you to rerun our experiments on exactly the same software versions as were used to generate the data below. The alignment software is Amalgame, which is a software package for ClioPatria, the Semantic Web application platform of SWI-Prolog.

To replicate the data, you thus need a recent installation of SWI-Prolog. You can either download the latest development version binary, or download a snapshot of the source code of the exact same version (V5.11.28) as we used during the experiments. An even better alternative is to clone the development repository (see instructions) and use git to revert to the version we used (git checkout V5.11.28).
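As a quick sanity check after installation, the following shell sketch compares the installed SWI-Prolog version with the one used in the experiments. It assumes that the executable is called swipl and that swipl --version prints a line of the form "SWI-Prolog version X.Y.Z for <architecture>". The function names are illustrative and are not part of any tool mentioned on this page.

```shell
#!/bin/sh
# Hypothetical helper names; only `swipl --version` is an existing command.

# Extract "X.Y.Z" from a line such as
# "SWI-Prolog version 5.11.28 for x86_64-linux" (format is an assumption).
extract_swipl_version() {
    printf '%s\n' "$1" | sed -n 's/^SWI-Prolog version \([0-9][0-9.]*\).*$/\1/p'
}

# Succeed only if the installed version equals the expected one
# (defaults to 5.11.28, the version named in the text above).
check_swipl_version() {
    expected="${1:-5.11.28}"
    actual="$(extract_swipl_version "$(swipl --version 2>/dev/null)")"
    if [ "$actual" = "$expected" ]; then
        echo "SWI-Prolog $actual matches the expected version"
    else
        echo "found SWI-Prolog '${actual:-none}', expected $expected" >&2
        return 1
    fi
}

# Example: check_swipl_version 5.11.28
```

If the check fails, a git checkout of V5.11.28 followed by a rebuild, as described above, should bring the installation in line.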

Once you have Prolog running, you need to install ClioPatria. You get the latest version by default, but can quickly revert to the ClioPatria version V2.9.0-106-gecbd221 that we used (just run: git checkout ecbd221).

Once you have ClioPatria running, installing Amalgame is easy. Just run cpack_install(amalgame) at the interactive Prolog prompt of a running ClioPatria server. This will again get you the most recent version (in folder cpack/amalgame/). You can revert to the "tpdl2011" version by running (in the amalgame folder): git checkout tpdl2011. You now have an exact replica of the software that we used during our experiments. Below follows a complete table of all software loaded and their versions:

GIT module    Version
ClioPatria    V2.9.0-106-gecbd221
amalgame      tpdl2011
foaf          epoch-6-ge0db719
opmv          e5a0697
skos          0ed03c1
void          81ef5d1
yui3          508fc5b
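
The version pinning described above can be sketched as a small shell helper that checks out a given version in an existing git clone and reports the commit that ends up checked out. It uses only standard git commands. The function name pin_checkout and the example paths (packages under cpack/<name>, as stated above for amalgame) are assumptions made for illustration.

```shell
#!/bin/sh
# pin_checkout DIR REF: check out REF (a tag or commit id) in the git clone at DIR.
# Hypothetical helper; not part of ClioPatria or Amalgame.
pin_checkout() {
    dir="$1"
    ref="$2"
    # Refuse to continue if REF does not name a commit in this clone.
    if ! git -C "$dir" rev-parse --verify --quiet "$ref^{commit}" >/dev/null; then
        echo "$dir: unknown version $ref" >&2
        return 1
    fi
    git -C "$dir" checkout --quiet "$ref" || return 1
    # Print the directory and the abbreviated commit that is now checked out.
    echo "$dir $(git -C "$dir" rev-parse --short HEAD)"
}

# Example, run from the ClioPatria checkout (paths are assumptions):
#   pin_checkout . ecbd221
#   pin_checkout cpack/amalgame tpdl2011
#   pin_checkout cpack/skos 0ed03c1
```

Checking out a tag or commit id leaves the clone in a detached HEAD state, which is fine for replaying the experiments but not for further development.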

Data created in use case 1

The GTAA/Cornetto alignment in Use Case 1 was created during an interactive session. As with solving math problems, there is no easy substitute for the insights gained when you actually try to create the alignment yourself. With the help of the issues discussed in the paper, we hope you will be able to use Amalgame to create your own alignments, alignments that fulfill the needs of your applications.

If you prefer, you can also replay the strategy that was the result of the interactive session described in the paper. To do so, you can use Amalgame's import feature to import the strategy graph in RDF/Turtle. Alternatively, you might want to view the strategy graph in SVG. Note that arrows denote dependencies, so the direction of the data flow is from bottom to top. The green nodes in the graph are final results and are resolvable: clicking will download the associated Turtle file with the actual mapping data.

All resulting data is licensed under a Creative Commons Attribution 3.0 Unported License. The eight mapping files that make up the entire alignment discussed in the paper are also available in RDF/Turtle format, and are described using the VOiD vocabulary in the void.ttl file. The execution trace of the strategy has been recorded and encoded using the Open Provenance Model Vocabulary (OPMV) in the opmv.ttl file.

Data created in use case 2

The WordNet 3.0 to 2.0 mapping created in Use Case 2 has also been created interactively, and we strongly invite you to give it a try yourself. Again, you can replay the strategy discussed in the paper by importing the strategy graph in RDF/Turtle, or view the strategy graph in SVG.

To speed up the label matching, we have removed all redundant rdfs:label triples from the WordNet synsets. If you use the triple store snapshot described above, this has already been done. If you use the public downloads, you can remove the label triples by executing the following commands in the Prolog shell:

rdf_retractall(S, rdfs:label, L, 'http://purl.org/vocabularies/princeton/wn30/wordnet-synset.ttl.gz').
rdf_retractall(S, rdfs:label, L, 'http://www.w3.org/2006/03/wn/wn20/instances/wordnet-synset.rdf').

All resulting data is licensed under a Creative Commons Attribution 3.0 Unported License. The seven mapping files that make up the entire alignment discussed in the paper are also available in RDF/Turtle format, and are described using the VOiD vocabulary in the void.ttl file. The execution trace of the strategy has been recorded and encoded using the Open Provenance Model Vocabulary (OPMV) in the opmv.ttl file.

Acknowledgments

We thank W. van Hage, A. Isaac, C. Reverté Reverté, A. Tordai and J. Wielemaker for their feedback and help in the development of Amalgame. M. van Assem produced the RDF conversions for WordNet 2.0 and 3.0. Part of the work presented here was funded by the EU through the PrestoPrime and EuropeanaConnect projects.