Review Comment:
The presented article aims to provide a survey of models for the representation of several kinds of language resources, going under the common umbrella name of “Linguistic Linked Data”. It touches on several related topics, such as the FAIR guidelines (and the compliance of these models with FAIR requirements, or how they support this compliance in datasets modeled after them), the relationship with Digital Humanities, metadata targeted at linguistic assets, and projects impacting the development of LLD vocabularies and models.
The article is nicely written and well organized and, to my knowledge, covers the state of the art on the matter widely, providing a great introduction to Linked Open Data in Linguistics for new, inexperienced readers and a good compendium for experts in the field. The theme is also gaining momentum and is surely of interest to the broader audience of the Semantic Web community.
Personally, I find the sections on FAIR unnecessary and quite detached from the rest. The FAIR principles surely have great merit in that they extended the sensitivity to openness, reusability, discoverability and interoperability to communities such as digital humanities, digital archives, etc., that have not often been close to the stream of innovation brought by the Semantic Web. Concretely, however, they added nothing to what SW standards, L(O)D policies and, on a more popular, disseminative level, TBL’s 5-star path to open data together did to guide towards an enlightened publication of data. In short, we could say that there is a thesis to be proven, i.e. that the SW stack of protocols and languages fully supports compliance with FAIR principles. However, this is in no way specific to data for linguistics, nor should it be proven in this survey. I understand, though, that given the current global scenario and the separate identities (though with overlap) and diverse acknowledgements that FAIR principles and SW technologies have, it might be useful to restate the obvious. I thus limit myself to pointing out this redundancy and leave to the authors the choice of whether to reduce the content related to FAIR.
For an article whose title starts with “When Linguistics Meets Web Technologies”, I would have expected more emphasis on the technological stack that can enable the proliferation and use of LLOD, such as editing tools. This is not only a technical aspect: as the authors themselves point out in one paragraph, there are various stages in the diffusion of a given standard, starting from convergence towards common models, then production of data (in some cases flooding the market with something that is not felt to be necessary, until its wide availability kickstarts its acknowledgement and fosters the need for it) and finally real adoption. We are now in the “data production” phase, which does not strongly require editing systems, as it mostly involves conversion of existing resources; however, it is through platforms that allow for a thorough analysis and development of resources that we can:
* discover resources that are not properly developed. If we just “fire off resources on the web” and consider the task done by setting up a SPARQL endpoint, we might miss many issues in the data and, as long as the models behind them are still young, potential issues in the models that govern them as well. Such issues are more easily spotted when the resources have to be properly loaded and read by a system that conforms to the same standards they conform to.
* develop a new generation of resources that really exploits the full power given by these new standards, instead of adapting poorer information coming from legacy resources to modern vocabularies (or suites of vocabularies) such as OntoLex Lemon and LexInfo.
My personal experience with VocBench 3 [1], which offers support for OntoLex, comes from the PMKI project [2], which, among other things, included bringing a few resources to the modern OntoLex Lemon model. In that context, many resources developed in other projects (GWA WordNet, IATE converted from the LIDER project, etc.) turned out to contain several (in some cases major) conversion bugs that made them unusable by OntoLex-compliant tools. We had similar experiences when we simply tried to host these resources through OntoLex-compliant publication tools, which led back to fix-and-retry iterations. While this is perfectly normal (it is part of the lifecycle of a resource, and all findings have been reported to the maintainers of the original resources or of the converters that ported them to OntoLex), it is in this “normality” that the role of development and publication platforms emerges.
I would thus suggest dedicating a small section to this aspect, mentioning existing systems (not many currently, if we consider OntoLex) such as the already mentioned Lexvo, VocBench and others enabling editing and/or publication of LLD resources.
[1] Armando Stellato, Manuel Fiorelli, Andrea Turbati, Tiziano Lorenzetti, Willem Gemert, Denis Dechandon, Christine Laaboudi-Spoiden, Anikó Gerencsér, Anne Waniart, Eugeniu Costetchi and Johannes Keizer. VocBench 3: A collaborative Semantic Web editor for ontologies, thesauri and lexicons. Semantic Web, 1-27, 05, 2020. doi:10.3233/SW-200370
[2] https://ec.europa.eu/isa2/actions/overcoming-language-barriers_en
Besides the few considerations above (which are not prescriptive), I think the article is already in a very good state and is almost ready for publication. I leave here a few more notes that touch on specific points of the work and can be easily dealt with.
TECHNICAL NOTES:
References have the form pXcYrZ, meaning page X, column Y, row Z.
p3c1r41: since registries are mentioned, maybe mention “reachability” / “discoverability” (or “findability” as it is called in the mentioned FAIR principles), as the qualities currently listed refer only to the use of domain vocabularies rather than the use of registries.
p3c2r26: The description of the advantages of OWL for LLD models seems to evoke some mumbo-jumbo (i.e. unexplained) capabilities of OWL which I don’t think it possesses. While it is true that the shared semantics of OWL allows for a better axiomatization of terms (to paraphrase: adding further characteristics of them that are true in all possible interpretations of the logical term), the authors seem to stress the ability to disambiguate the meaning of the terms, that is, their interpretation, which is something that a logical modeling language does not do. The authors seem to hint at this aspect (talking in terms of limitations) in the sentence within round brackets in row 36. This, however, should not be a detail but rather the whole point being made.
p4c1r3: I think saying “OWL and PROV-O” is confusing. The described characteristics belong to PROV-O alone. Possibly what the authors mean is that OWL has the advantage of being a general-purpose KR language, thus binding all those models (modeled in turn after it) under one common umbrella. This is something that is missing from other specific (and earlier) initiatives and could be stated in a separate paragraph. Putting OWL and PROV-O together in that sentence is too much of a simplification. Furthermore, I may be missing something in PROV-O, but the way the authors say that it allows one to specify “whether we are describing an hypothesis or not” might suggest that PROV-O injects some modal support into OWL (i.e. anything described in OWL can then be framed into a “hypothesis” dimension, separated from explicit facts), which is not the case. PROV-O simply offers the possibility to describe events, processes... and hypotheses.
p10c1r42. “The latter has been described above”. Lime has actually only been “previously introduced”, while its description follows in the dedicated section 4.3.2.
p11c1. It might be worth highlighting that LexInfo 3.0 is the first version that is compliant with OntoLex-Lemon.
MINOR REMARKS and TYPOs:
Abstract
Complement --> complements
p2c1r29: Not sure whether “build upon and complement” is meant to attach to the auxiliary “will”, so not sure if the auxiliary needs to be repeated for them, if they require the “s” for the third person, or if the expression can be left as is
p12c2r38. Converge in also in --> converge also in
p28c1r32: “so this is allows a compact”: either “this is a compact” or “this allows for a compact”
Fig. 4 exceeds the first column, overlapping with the second column
Comments
errata corrige to my review
LexVo (mentioned as editing tool) is actually LexO (currently LexO-Lite)
I would add also Evoke (paper currently under review here: http://semantic-web-journal.net/content/evoke-exploring-and-extending-le...)
We will take the errata into consideration in our revised version