Review Comment:
The paper describes a novel dataset, named lemonUby, that integrates the UBY-LMF data model with the Lemon ontology. This conceptual mapping establishes a common representation for a large set of language resources: both the Linked Open Data datasets, which are mainly Lemon-encoded, and terminology resources encoded with LMF data categories. The authors establish the mapping on a class level by encoding several common lexicons in two languages. Besides providing the conceptual mapping, the dataset also contains several cross-dataset and cross-lingual instance mappings. Demonstrating that the approach can be used for multiple languages is also an important contribution.
The mapping transformation from UBY to Lemon is publicly available, thus making the dataset reproducible. The licensing is also clearly stated.
To motivate users of the dataset, I recommend that the authors give some examples of use cases that this dataset enables: for instance, using the same codebase to treat different lexica as interchangeable modules, or providing a common ground for cross-lexicon alignment.
An important piece of information for users of this dataset would be the extent of the cross-dataset mappings: how dense the links between the lexica are, and which obvious gaps in the connections future researchers could fill. In line with my previous comment, this would also motivate potential users to use lemonUby as a starting point for generating cross-dataset and cross-lingual connections.
I also recommend outlining a strategy for future evolution of the dataset. As it stands now, the current goals are aligned towards representing annotated corpora with this model, but it would also be beneficial to provide directions for improvement of the dataset itself.
Minor points:
The abstract and conclusion state that the dataset is 'significantly linked'. If this characteristic has a specific definition (when is linkage 'significant'?), I recommend defining it explicitly.
Having a link to the dataset on the first page would make the paper more reader-friendly.
=== Additions ===
The dataset preserves the quality of the source datasets and lexica, owing to the well-defined transformations between the data models (available as XSLT files). Inspection of the available data revealed no errors in the mappings. The authors also discuss the limitations of their approach and justify where and why additional mappings were not possible.
The dataset has potential for use, but its usefulness should be justified more explicitly, possibly with examples. Within linked data, one example use case could be generating new cross-lingual and cross-dataset lexicon instance links using lemonUby for bootstrapping. The paper should also describe the contribution of the dataset from the perspective of terminology management.
The description of the dataset and the process used to construct it are both clearly specified and reproducible. The transformation files are also available.
Comments
Regarding the criterion "usefulness"
Hi all,
usefulness of the RDF conversion is one of the most interesting topics in my opinion. For this paper the opportunity is two-fold:
1. as Fabian mentioned, the data set is well established and it should be easy to pick one of the uses and mention it in the paper
2. even more interesting: since we now have both a non-RDF and an RDF version, could you elaborate on the actual benefits (and also disadvantages) of the RDF conversion? One of the most striking points here, in my opinion, is the off-the-shelf tools you can use with RDF. E.g. how do you query LMF? Are there standard query languages for LMF?
This just came to my mind, I am sure you will also have your own bullet points to add for the usefulness of your data.
-- Sebastian
Re: Regarding the criterion "usefulness"
Hi Sebastian,
>> One of the most striking points here, in my opinion, is the off-the-shelf tools you can use with RDF.
I agree, and it should be investigated what the additional benefits of using existing RDF-based tools are.
>> E.g. how do you query LMF? Are there standard query languages for LMF?
The non-RDF version of UBY is a Java implementation that uses object-relational mapping by means of the Hibernate framework, i.e., any instance of UBY-LMF is mapped either to an SQL database or to an XML file. Querying the database is therefore straightforward (although not efficient) using Hibernate.
Best
Judith