lemonUby - a large, interlinked, syntactically-rich lexical resource for ontologies

Tracking #: 515-1714

Judith Eckle-Kohler

Responsible editor: 
Guest editors Multilingual LOD 2012 MSS

Submission type: 
Dataset Description
We introduce lemonUby, a new lexical resource integrated in the Semantic Web which is the result of converting data extracted from the existing large-scale linked lexical resource UBY to the lemon lexicon model. The following data from UBY were converted: WordNet, FrameNet, VerbNet, English and German Wiktionary, the English and German entries of OmegaWiki, as well as links between pairs of these lexicons at the word sense level (links between VerbNet and FrameNet, VerbNet and WordNet, WordNet and FrameNet, WordNet and Wiktionary, WordNet and German OmegaWiki). We linked lemonUby to other lexical resources and linguistic terminology repositories in the Linguistic Linked Open Data cloud and outline possible applications of this new dataset.

Minor Revision

Solicited Reviews:
Review #1
By Vojtěch Svátek submitted on 30/Nov/2013
Minor Revision
Review Comment:

The paper has been substantially rewritten and is now much more readable. Moreover, a section on possible applications of lemonUby has been added, which can serve as simple guidance for its usage.

I still miss figures on the precision of the heuristic (lemma-based) mapping in Section 5.2, based on a manually evaluated random sample. Table 2 also measures recall only from the side of lemonUby, not from the side of WN2.0 and Wiktionary.

Minor comment: You say at the end of Sect. 4: "“Equivalent” translations are mapped as a datatype property on the sense…". Just to be sure: you mean translation into other human languages, and the value of the datatype property is merely a string containing the lexical item in the other language? Perhaps you should also name the property.

In summary, evaluating the paper along the 3 main axes for dataset papers: (1) the dataset quality is essentially derived from that of the underlying resource, UBY; however, the quality of the heuristically built external mappings still deserves at least lightweight validation, as suggested above; (2) the usefulness is, in my opinion, sufficiently advocated by the newly added section; (3) the clarity and completeness of the dataset description have also improved and look satisfactory to me, given the space constraints of this kind of paper.

Review #2
By Tadej Štajner submitted on 10/Jan/2014
Review Comment:

The points I raised in my previous review have been addressed sufficiently. The motivation is clearer, and the authors have also included a dedicated 'Applications' section.

(1) Quality of the dataset.
The dataset is a well-documented integration of existing high-quality datasets.

(2) Usefulness (or potential usefulness) of the dataset.
The introduction justifies the usefulness of the dataset as an important piece of the knowledge extraction and lexicalization workflow. This is further emphasized by the fact that it is under a permissive licence and integrated into the LLOD, a position with high potential for further use cases.

(3) Clarity and completeness of the descriptions
The paper describes a quite complex ecosystem of ontologies, standards and datasets, and explains the relationships between them, using them to motivate and position the contributions of the paper. The structure of the paper is also improved: it separately describes each of the datasets, as well as the mappings at both the data category level and the instance level.

Minor points:
'lemon'/'Lemon' capitalization is not used consistently. If it is meant to be strictly lower case, the Section 2 title should also be lowercased.
Section 6: when mentioning the SPARQL endpoint, a footnote URL or a section reference would help.