lemonUby - a large, interlinked, syntactically-rich resource for ontologies

Tracking #: 404-1516

Authors: 
Judith Eckle-Kohler
John McCrae
Christian Chiarcos

Responsible editor: 
Guest editors Multilingual LOD 2012 MSS

Submission type: 
Dataset Description
Abstract: 
We introduce a new lexical resource integrated into the Semantic Web, called lemonUby, which is the result of a large-scale population of the lemon lexicon model. This was achieved by converting a number of UBY lexica standardized according to UBY-LMF to the lemon format: English WordNet, FrameNet, VerbNet, Wiktionary and OmegaWiki, and German Wiktionary and OmegaWiki. lemonUby is significantly linked – both at the sense level within its component resources and to other lexical resources and terminology repositories in the Linguistic Linked Open Data cloud.
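To make the conversion target concrete, here is a minimal sketch of how a single lexical entry might be serialized as lemon triples in N-Triples. It is not the authors' XSLT pipeline. The base URI, the input layout, the identifiers, and the helper name `lemon_entry_triples` are hypothetical. Only the class and property names (`Lexicon`, `entry`, `language`, `LexicalEntry`, `canonicalForm`, `writtenRep`, `sense`, `LexicalSense`, `reference`) come from the lemon vocabulary at `http://lemon-model.net/lemon#`.

```python
# Minimal sketch: serialize one lexical entry as lemon triples (N-Triples).
# BASE, the identifiers and the input layout are hypothetical; the lemon
# class/property names come from http://lemon-model.net/lemon#.
from typing import Dict, List, Optional

LEMON = "http://lemon-model.net/lemon#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/lemonUby/"  # hypothetical base URI


def iri(value: str) -> str:
    return f"<{value}>"


def literal(text: str, lang: str) -> str:
    # Escape backslashes and double quotes, then attach a language tag.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def lemon_entry_triples(lexicon_id: str, entry_id: str, lemma: str, lang: str,
                        senses: Dict[str, Optional[str]]) -> List[str]:
    """Return N-Triples lines for a lexicon, one entry, its canonical form
    and its senses. `senses` maps a sense id to an optional ontology
    reference IRI (lemon:reference), or None."""
    lexicon = iri(BASE + lexicon_id)
    entry = iri(BASE + entry_id)
    form = iri(BASE + entry_id + "_form")
    triples = [
        (lexicon, iri(RDF_TYPE), iri(LEMON + "Lexicon")),
        (lexicon, iri(LEMON + "language"), f'"{lang}"'),
        (lexicon, iri(LEMON + "entry"), entry),
        (entry, iri(RDF_TYPE), iri(LEMON + "LexicalEntry")),
        (entry, iri(LEMON + "canonicalForm"), form),
        (form, iri(LEMON + "writtenRep"), literal(lemma, lang)),
    ]
    for sense_id, reference in senses.items():
        sense = iri(BASE + sense_id)
        triples.append((entry, iri(LEMON + "sense"), sense))
        triples.append((sense, iri(RDF_TYPE), iri(LEMON + "LexicalSense")))
        if reference is not None:
            triples.append((sense, iri(LEMON + "reference"), iri(reference)))
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    for line in lemon_entry_triples("WN_en", "WN_LexicalEntry_1", "bank", "en",
                                    {"WN_Sense_1": None}):
        print(line)
```

Emitting plain N-Triples lines keeps the sketch dependency-free. A real converter would more likely use an RDF library and follow the URI scheme actually published for lemonUby.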
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Tadej Štajner submitted on 15/Feb/2013
Suggestion:
Minor Revision
Review Comment:

The paper describes a novel dataset, named lemonUby, which integrates the UBY-LMF data model with the Lemon ontology. This conceptual mapping establishes a common representation for a large set of language resources: both the Linked Open Data datasets, which are mainly encoded in Lemon, and terminology resources encoded with LMF data categories. The authors establish the mapping at the class level by encoding several common lexicons in two languages. Besides providing the conceptual mapping, the dataset also contains several cross-dataset and cross-lingual instance mappings. Demonstrating that the approach can be used for multiple languages is also an important contribution.

The mapping transformation from UBY to Lemon is publicly available, thus making the dataset reproducible. The licensing is also clearly stated.

To motivate potential users of the dataset, I recommend that the authors give some examples of use cases that this dataset enables: for instance, using the same codebase to treat various lexica as interchangeable modules, or providing a common ground for cross-lexicon alignment.

An important piece of information for users of this dataset would be the extent of the cross-dataset mappings: how dense the links between the lexica are, and what obvious gaps in connectivity future researchers could fill. In line with my previous comment, this would also motivate potential users to adopt lemonUby as a starting point for generating cross-dataset and cross-lingual connections.
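One way to make "how dense the links are" measurable is a per-pair coverage figure: the fraction of senses in a source lexicon that have at least one link into a target lexicon. The sketch below is a hypothetical illustration of such a metric, not something reported in the paper. The input format (a set of sense ids per lexicon and a list of sense-to-sense links tagged with their lexicons) is an assumption.

```python
# Hypothetical sketch of a link-coverage metric between lexica.
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def link_coverage(
    senses: Dict[str, Set[str]],
    links: Iterable[Tuple[str, str, str, str]],
) -> Dict[Tuple[str, str], float]:
    """senses: lexicon name -> set of sense ids in that lexicon.
    links: (source_lexicon, source_sense, target_lexicon, target_sense).
    Returns, for every ordered pair of distinct lexica, the fraction of
    source senses that have at least one link into the target lexicon."""
    linked: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for src_lex, src_sense, tgt_lex, _tgt_sense in links:
        # Ignore links whose source sense is not known for that lexicon.
        if src_sense in senses.get(src_lex, set()):
            linked[(src_lex, tgt_lex)].add(src_sense)
    coverage: Dict[Tuple[str, str], float] = {}
    for src_lex, src_senses in senses.items():
        for tgt_lex in senses:
            if tgt_lex == src_lex or not src_senses:
                continue
            coverage[(src_lex, tgt_lex)] = len(linked[(src_lex, tgt_lex)]) / len(src_senses)
    return coverage
```

Pairs with a coverage of 0.0 would point directly at the "obvious gaps" mentioned above. Counting distinct linked source senses, rather than raw links, keeps one heavily linked sense from inflating the figure.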

I also recommend outlining a strategy for future evolution of the dataset. As it stands now, the current goals are aligned towards representing annotated corpora with this model, but it would also be beneficial to provide directions for improvement of the dataset itself.

Minor points:
The abstract and conclusion state that the dataset is 'significantly linked'. If this characteristic has a specific definition (when is linkage 'significant'?), I recommend defining it explicitly.

Having a link to the dataset on the first page would make the paper more reader-friendly.

=== Additions ===
The dataset preserves the quality of the source datasets and lexica, owing to well-defined transformations based on the data model (available as XSLT files). Inspection of the available data revealed no errors in the mappings. The authors also discuss the limitations of their approach and justify where and why additional mappings were not possible.

The dataset has potential uses, but these should be justified more explicitly, possibly with examples. Within linked data, one example use case could be bootstrapping the generation of new cross-lingual and cross-dataset lexicon instance links from lemonUby. The paper should also describe the contribution of the dataset from the perspective of terminology management.

The description of the dataset and of the process used to construct it are both clear and reproducible. The transformation files are also available.

Review #2
By Vojtěch Svátek submitted on 19/Feb/2013
Suggestion:
Minor Revision
Review Comment:

The topic is appropriate for a 'linked dataset' paper. The authors seem to have done solid implementation work in converting a large existing lexical dataset to RDF and interlinking it with resources already present in the LLOD.

There is, however, still space for improvement. Most notably, I miss:
- A concrete proposal (if not tangible experience) regarding the usefulness of the created dataset for a practical application, with added value compared to pre-existing datasets. There is only one sentence, on the first page, about lexicalizing relational knowledge using verbs.
- Most of the paper is devoted to the process of creating the resource. While this is in a way desirable, since it gives credibility to the artifact, the description of lemonUby 'as is' is perhaps too parsimonious: just the second paragraph of Section 4 and the brief statistics in Table 1.
- For a semantic web journal, the paper is in places too abundant in details that only lexicographers could appreciate. On the other hand, I miss concrete examples (real data fragments) illustrating the challenging aspects of the UBY-lemon mapping.
- As the process of building lemonUBY involved heuristic linking, I would expect some kind of precision analysis, even if on a small sample only.
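A precision analysis on a small sample, as requested in the last point, could be as simple as manually judging n randomly drawn heuristic links and reporting the observed precision with a confidence interval. The sketch below draws a reproducible sample and applies the Wilson score interval. The function names, the sample size, and the choice of interval are illustrative assumptions, not results or methods from the paper.

```python
# Illustrative sketch: sample heuristic links for manual judgement and
# report precision with a Wilson score confidence interval.
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def draw_sample(links: Sequence[T], n: int, seed: int = 0) -> List[T]:
    """Draw n links uniformly without replacement, reproducibly via the seed."""
    return random.Random(seed).sample(list(links), n)


def wilson_interval(correct: int, n: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Return (precision, lower, upper); z = 1.96 gives roughly a 95% interval."""
    if n <= 0 or not 0 <= correct <= n:
        raise ValueError("need n > 0 and 0 <= correct <= n")
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)
```

For example, 90 correct links out of 100 judged gives a precision of 0.9 with an interval of roughly 0.83 to 0.94. The Wilson interval is used here instead of the normal approximation because it stays within [0, 1] and behaves better for small samples and precisions near 1.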

In the title I somewhat miss the word 'lexical', such as '...syntactically rich lexical resource for ontologies'.

In the list of contributions at the end of Section 1, I don't understand what (i) refers to.

Typographically, the final published version should be improved. There are some weird sentences, missing/spare blanks, commas, mixing of capitalization (UBY vs. Uby), etc.

However, in principle the text is comprehensible.


Comments

Hi all,
The usefulness of the RDF conversion is one of the most interesting topics, in my opinion. For this paper the opportunity is two-fold:
1. As Fabian mentioned, the dataset is well established, and it should be easy to pick one of its uses and mention it in the paper.
2. Even more interesting: since we now have both a non-RDF and an RDF version, could you elaborate on the actual benefits (and also disadvantages) of the RDF conversion? One of the most striking points here, in my opinion, is the off-the-shelf tools you can use with RDF. E.g., how do you query LMF? Are there standard query languages for LMF?

This just came to mind; I am sure you will also have your own points to add on the usefulness of your data.
-- Sebastian

Hi Sebastian,

>> One of the most striking points here, in my opinion, are the off-the-shelf tools you can use with RDF.
I agree; the additional benefits of using existing RDF-based tools should be investigated.

>> E.g. how do you query LMF? Are there standard query languages for LMF?
The non-RDF version of UBY is implemented in Java and uses object-relational mapping via the Hibernate framework: any instance of UBY-LMF is mapped either to an SQL database or to an XML file. Querying the database with Hibernate is therefore straightforward, although not efficient.

Best
Judith