Wordnet 3.0 in RDF

This RDF version of Wordnet 3.0 uses the following namespace:

http://purl.org/vocabularies/princeton/wn30/

Browsers requesting URLs in this namespace will be redirected by purl.org to an HTML rendering of the requested resource. See wn30:synset-chair-noun-1 for an example. Semantic Web applications that use the HTTP Accept request header to explicitly request application/rdf+xml will be redirected to an RDF/XML rendering of the symmetric concise bounded description of the resource. You can also override the request headers of your browser by adding a .rdf or .ttl suffix to the URL. See wn30:synset-chair-noun-1.rdf for an example.
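The two ways of asking for RDF described above can be sketched in Python. This is a minimal illustration using only the standard library: the helper names rdf_request and with_suffix are made up for this example, the full URL is simply the namespace plus the local name synset-chair-noun-1, and nothing is sent over the network unless you pass the request to urllib.request.urlopen yourself.

```python
from urllib.request import Request

NAMESPACE = "http://purl.org/vocabularies/princeton/wn30/"


def rdf_request(local_name, media_type="application/rdf+xml"):
    """Build (but do not send) a request that asks for RDF via the Accept header."""
    return Request(NAMESPACE + local_name, headers={"Accept": media_type})


def with_suffix(local_name, suffix=".rdf"):
    """Return the URL that selects a serialization by suffix instead of by header."""
    return NAMESPACE + local_name + suffix


req = rdf_request("synset-chair-noun-1")
print(req.full_url, req.get_header("Accept"))
print(with_suffix("synset-chair-noun-1", ".ttl"))
```

The header-based form is what a Semantic Web client would normally use; the suffix form is handy when you cannot control the headers, for example in a browser address bar.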

Alternatively, you can browse or download the latest version of all source files directly from our git repository.

We're in the LOD cloud!

Jacco, September 25, 2010
More good news: Wordnet 3.0 is now also present in the famous LOD cloud graph! Many thanks to Anja and Richard!

Schema files being updated

Jacco, 21 May 2010

Mark, Antoine and I are busy cleaning up the schema files. This is not ready yet, but you can follow our progress on github.

Second beta release of Wordnet 3.0/2.0 mappings

Jacco, June 10, 2010

I've published my mappings from Wordnet 3.0 synsets to Wordnet 2.0 synsets in the wn20mappings folder.

Note: The mappings in this folder are not based on any Princeton source file. All erroneous mappings are my responsibility, not Princeton's.

Mapping statistics and origin:

The mappings in this folder have been created in multiple steps. The result of each step is reflected in a separate file.

In the RDF version we have 117,657 Wordnet 3.0 synsets to be mapped.

Step one: detecting synsets with identical label and gloss (103,339)
I've detected 103,339 Wordnet 3.0 synsets with a unique one-to-one mapping to a Wordnet 2.0 synset on the basis of having both an identical label and an identical gloss. I assume these synsets correspond. Note that this first step already covers around 88% of all synsets.
Results are in file: glossmatches-m.ttl
An additional twelve Wordnet 3.0 synsets were found that each map to two Wordnet 2.0 synsets on the basis of identical gloss and label, as well as two Wordnet 3.0 synsets that both map to the same Wordnet 2.0 synset.
Results are in file: glossmatches-p.ttl
Since the mappings in the file above are ambiguous, we will ignore them in the following steps.
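As a rough illustration of step one, the sketch below indexes both versions by (label, gloss) and separates one-to-one matches from ambiguous ones. It is not the code that produced the files above: the function names, the synset identifiers and the toy glosses are invented for this example, and each synset is assumed to have exactly one label.

```python
from collections import defaultdict


def index_by_key(synsets):
    """Group synset ids by their (label, gloss) pair."""
    index = defaultdict(list)
    for synset_id, key in synsets.items():
        index[key].append(synset_id)
    return index


def match_identical(wn30, wn20):
    """Return (one-to-one mappings, ambiguous mappings) on identical label and gloss."""
    index30, index20 = index_by_key(wn30), index_by_key(wn20)
    unique, ambiguous = {}, {}
    for key, ids30 in index30.items():
        ids20 = index20.get(key, [])
        if not ids20:
            continue  # no 2.0 synset shares this label and gloss
        if len(ids30) == 1 and len(ids20) == 1:
            unique[ids30[0]] = ids20[0]
        else:
            # one-to-many or many-to-one: keep apart, as in glossmatches-p.ttl
            for synset_id in ids30:
                ambiguous[synset_id] = list(ids20)
    return unique, ambiguous


# Toy data, not taken from Wordnet.
wn30 = {
    "wn30:a": ("chair", "a seat for one person"),
    "wn30:b": ("bank", "sloping land beside water"),
    "wn30:c": ("blog", "a shared online journal"),
}
wn20 = {
    "wn20:a": ("chair", "a seat for one person"),
    "wn20:b1": ("bank", "sloping land beside water"),
    "wn20:b2": ("bank", "sloping land beside water"),
}
unique, ambiguous = match_identical(wn30, wn20)
print(unique)
print(ambiguous)
```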
Step two: detecting synsets with identical label and strong family resemblances.
For all the 3.0 synsets not having a one-to-one mapping already, I've looked at 2.0 synsets that have identical labels and:
  1. Both have a broader and a narrower synset in the hyponym hierarchy that were already matched in an earlier step.
    Results are in file: label-childparent-matches.ttl (1,272/1,550).
  2. Only have a broader (based on hyponym, meronym or instance) match.
    Results are in file: label-parent-matches.ttl (3,396/3,682).
    Results are in file: label-meronym-matches.ttl (1,403/1,561).
    Results are in file: label-instance-matches.ttl (507/486).
  3. Only have a narrower (hyponym axis) match. Results are in file: label-child-matches.ttl (309/141).
  4. If none of the above applies, but a label occurs only once in wn30 and also only once in wn20 (within the same part of speech), we consider the corresponding synsets to match as well.
    Results are in file: label-unique-matches.ttl (1562/1200).
  5. If none of the above applies, but the labels match and the glosses are very similar, we consider the corresponding synsets to match as well.
    Results are in file: label-neargloss-matches.ttl (823/666).
  6. Before saving the above results, we removed the synsets for which this step resulted in ambiguous alignments, and saved these ambiguous mappings in a separate file. Results are in file:
    ambiguous-label-pc-matches.ttl (253/279).
Step three: rerun step two
We rerun step two multiple times to take advantage of the newly generated mappings, repeating until no new mappings are found (this was the case after three repetitions). The second number in the statistics above shows the number at which each count stabilizes.
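The rerun-until-stable idea of steps two and three can be sketched as a simple fixed-point loop. The example below is a deliberately reduced assumption, not the actual procedure: it implements only one rule (identical label plus an already matched broader synset), it does no ambiguity bookkeeping beyond requiring a single candidate, and all names and data are hypothetical.

```python
def propagate(seed, labels30, labels20, parent30, parent20):
    """Repeat a label-plus-matched-parent pass until a pass adds nothing new."""
    mapping = dict(seed)
    passes = 0
    while True:
        passes += 1
        new = {}
        used20 = set(mapping.values())
        for s30, label in labels30.items():
            if s30 in mapping:
                continue
            p30 = parent30.get(s30)
            if p30 not in mapping:
                continue  # broader synset not matched (yet)
            candidates = [
                s20 for s20, label20 in labels20.items()
                if label20 == label
                and s20 not in used20
                and parent20.get(s20) == mapping[p30]
            ]
            if len(candidates) == 1:
                new[s30] = candidates[0]
        if not new:
            return mapping, passes
        mapping.update(new)


# Toy hierarchy: root > mid > leaf in both versions; only the root is seeded.
labels30 = {"a3": "root", "b3": "mid", "c3": "leaf"}
labels20 = {"a2": "root", "b2": "mid", "c2": "leaf"}
parent30 = {"b3": "a3", "c3": "b3"}
parent20 = {"b2": "a2", "c2": "b2"}

mapping, passes = propagate({"a3": "a2"}, labels30, labels20, parent30, parent20)
print(mapping, passes)
```

In this toy run the first pass matches the middle synset, the second pass uses that new match to reach the leaf, and the third pass finds nothing and stops, which is why rerunning pays off.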

Analysis of recall

This leaves us with 117657 - 103339 - 1550 - 3682 - 1561 - 486 - 141 - 1200 - 666 - 165 = 4869 unmapped synsets. These are in the file:
to_be_mapped.ttl (4869)
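The subtraction above can be checked with a short script. The sketch below uses only the figures quoted on this page, with illustrative variable names. As listed, the figures leave a remainder of 4,867, two fewer than the 4,869 reported for to_be_mapped.ttl, so one of the quoted counts is probably slightly off; the 165 term is copied from the sum as written.

```python
total_wn30 = 117657
step_one = 103339

# Terms subtracted in the recall analysis, in the order given there.
subtracted = [step_one, 1550, 3682, 1561, 486, 141, 1200, 666, 165]

remaining = total_wn30 - sum(subtracted)
step_one_coverage = round(100 * step_one / total_wn30, 1)

print("unmapped according to the listed figures:", remaining)
print("share covered by step one (%):", step_one_coverage)
```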

A quick manual inspection showed that many of these unmapped synsets are new senses of existing words. Improvements to the mappings will be posted on this site.

Alpha release of Wordnet 3.0 in RDF

Mark & Jacco, May 10, 2010

Today we publish the RDF version of Princeton Wordnet 3.0 as Linked Open Data in the following namespace:

http://purl.org/vocabularies/princeton/wn30/

Browsers requesting URLs in this namespace will be redirected by purl.org to an HTML rendering of the requested resource. Semantic Web applications requesting application/rdf+xml will be redirected to an RDF/XML rendering of the resource.

About Wordnet 3.0 in RDF

The RDF files have been generated in a way similar to the method described by Mark van Assem et al. in RDF/OWL Representation of WordNet on the W3C site. In particular, the 3.0 version has a similar division between the basic and the full version. All software is available as open source from github.

Acknowledgments

This RDF version is based on Wordnet 3.0 as distributed by Princeton. The conversion software was written by Mark van Assem with additions from Jacco van Ossenbruggen. Part of the work presented here was funded by the EU through the PrestoPrime and EuropeanaConnect projects.