Review Comment:
This manuscript was submitted as 'Ontology Description' and should be reviewed along the following dimensions: (1) Quality and relevance of the described ontology (convincing evidence must be provided). (2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology.
The manuscript describes a multilingual morpheme ontology called MMoOn Core (MC), intended by its authors to support the modelling and publication of morphological language data, and in particular morpheme inventories, as linked data datasets. MC is meant to be used both for NLP-oriented resources and for linguistic datasets; it is language independent, and its approach to creating language-specific datasets is based on a specially defined layered architecture.
The submission starts by motivating the need for such a vocabulary/ontology and by relating MC to other vocabularies/ontologies in the domain of linguistic linked data, in particular ontolex-lemon and ligt; this part of the submission also features a subsection on already existing morphological resources. This is followed by a domain analysis and then by a detailed description of the MC ontology: first its main classes, then its properties; the use of some of these is illustrated by an example. Next, the authors lay out how language-specific resources can be integrated into their proposed architecture. Other aspects of the ontology’s design are also touched upon, and the relationship between ontolex-lemon and MC is discussed at length. The final section gives a number of possible use cases for MC.
The paper gives a convincing argument for the need for an LLD ontology/vocabulary of the kind described in this paper. However, I would have liked to see some discussion of, or at least reference to, previous work on publishing morphological data as computational resources in non-Semantic Web contexts, since this work has a bearing on the current case. In particular, a comparison with the approach taken by LMF for both intensional and extensional morphological data would be useful here, especially since LMF is a format-neutral model (insofar as it is described in UML and not in any specific serialisation format such as RDF) and was extremely influential on lemon (although, alas, not from the point of view of morphology).
Another thing which I would like to see better described and justified is the distinction between the classes Morph and Morpheme in MC. Morphemes are defined, in very many texts, as the smallest meaning-bearing elements in a language, whereas morphs are defined as essentially strings of phonemes that can represent one or more morphemes (in the MC case morphs can only represent one morpheme); indeed, morphs get their meanings in specific cases from these morphemes. In addition, affixes are usually described as kinds of morphemes and not morphs (in MC, Affix is a subclass of Morph, which means that in MC being an affix is not part of the meaning of a Morpheme). In the MC model the distinction between morpheme and morph is somewhat blurred, and both morphemes and morphs can have meanings via the hasMeaning property. This means that in the player example given in the paper both the ‘er’ morph and its morpheme have the AgentNominalizer meaning. Since there doesn’t seem to be any constraint on the use of hasMeaning for morphs and their corresponding morphemes, this could lead to differing interpretations of the model, which might make datasets that use it less interoperable than they would otherwise be. I’m also puzzled as to how LexicalEntry and WordForm can both be subclasses of Word when they are two different kinds of conceptual entity (this is bad form in ontology modelling). Indeed, in ontolex-lemon and LMF, Word is a subclass of LexicalEntry, so the MC hierarchy would, for instance, make interconnecting MC with ontolex-lemon as proposed in Section 9 much more challenging. This choice needs to be motivated. Moreover, the difference between LexicalEntry and Lexeme in Figure 2 should also be explained and made clear to users of MC, since the two terms are often used interchangeably.
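To make the interoperability concern about hasMeaning concrete, here is a minimal sketch in plain Python. It is my own illustration and is not taken from the paper or from any MC dataset: the identifiers morph_er and morpheme_er and the linking property corresponds_to are hypothetical, and only hasMeaning and AgentNominalizer come from the manuscript. Two datasets describe the same ‘er’ suffix but attach the meaning at different levels, so the same lookup returns different kinds of entity.

```python
# Triples are (subject, predicate, object) tuples.
# "morph_er", "morpheme_er" and "corresponds_to" are hypothetical names
# introduced for this illustration; they are not taken from MC itself.

# Dataset A: the meaning is attached to the morph only.
dataset_a = {
    ("morph_er", "corresponds_to", "morpheme_er"),
    ("morph_er", "hasMeaning", "AgentNominalizer"),
}

# Dataset B: the meaning is attached to the morpheme only.
dataset_b = {
    ("morph_er", "corresponds_to", "morpheme_er"),
    ("morpheme_er", "hasMeaning", "AgentNominalizer"),
}


def subjects_with_meaning(triples, meaning):
    """Return every subject that carries the given meaning via hasMeaning."""
    return {s for (s, p, o) in triples if p == "hasMeaning" and o == meaning}


# The same query yields a morph in one dataset and a morpheme in the other.
result_a = subjects_with_meaning(dataset_a, "AgentNominalizer")
result_b = subjects_with_meaning(dataset_b, "AgentNominalizer")
print(result_a)
print(result_b)
print(result_a == result_b)
```

Under these assumptions, a client that expects meanings on morphemes finds nothing useful in dataset A, and one that expects them on morphs finds nothing useful in dataset B. A stated constraint or recommendation in the paper would remove this ambiguity.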
Overall, I think some of the individual ontological decisions taken in the model, at least those pertaining to the main classes and properties, should be better explained, especially those which seem to differ from how numerous other sources (especially in the domain of morphology) define them. An expansion of Section 5, including descriptions of some of the other classes in the ontology (e.g., Lexeme, LexicalEntry, and those whose definition might not be immediately obvious to non-linguists) and better descriptions of the classes already featured (especially MorphologicalRelationship), with some more illustrative examples, would make this paper much more useful; something similar should also be done with the main properties in the model. It would also make the article much better suited to the ontology description submission track.
In addition, there are numerous English errors and general typos throughout the paper. For instance, the first line reads “Morphological language Data (MLD) has ever since played a crucial role across various interdisciplinary research fields”...ever since what? The first paragraph also mentions “large text amounts” instead of “large amounts of text”. The paper would definitely benefit from being proofread by a native speaker of English.