Review Comment:
Martin Brümmer: WOLD, WALS and IDS. RDF conversion and interoperability of linguistic datasets of the MPI EVA Leipzig
The paper describes the RDF conversions of three linguistic datasets of the MPI Leipzig. Despite numerous language issues (the language needs substantial improvement), it is an interesting and insightful data set description. As it deals with a prominent set of resources from typology, it is certainly worth including in the special issue, once the issues mentioned below have been addressed.
In section 2, the author criticizes the use of Literals for linguistic features. It may be worth checking whether the recently released TDS ontology (http://languagelink.let.uu.nl/tds/ontology/LinguisticOntology.owl, see, e.g., Saulwick et al. 2005; link and publication under CC-BY were announced last month by Menzo Windhouwer in personal email communication) provides a formal representation of the necessary information and, if so, adding a reference to it. Sebastian Nordhoff and colleagues from the MPI have initiated integration efforts for TDS, MPI data sets and other resources, and they may be consulted on this.
Same section: "There was no Linked Data version or SPARQL endpoint available at the time of writing." If I recall correctly, the MLODE (workshop) organizers intended to provide an endpoint. The data is available, so at the very least the *efforts* to link the data sets should be mentioned. (AFAIK, these are underway; for the state of development as of late 2012, see Sect. 4 in Chiarcos, Moran et al. 2013. For more recent information, please double-check with Sebastian Nordhoff.)
As for sections 5.2 and 5.3, I wonder whether it would be possible for the final paper to use Glottolog language ids instead of, or in addition to, ISO 639 codes. As these were developed by the MPI itself, they should cover a greater portion of the languages than the ISO list. Also, they were specifically designed to address the owl:sameAs issue mentioned in 5.2. If time permits and if the Glottolog development has progressed far enough, the author might consider extending the experiments accordingly. This is, however, only a suggestion.
Most importantly, the language needs to be improved, e.g., the very first sentence of the abstract could be restructured such that the resources are introduced earlier: "This paper describes the RDF conversion of three linguistic datasets of the Department of Linguistics at the Max Planck Institute for Evolutionary Anthtopology, their internal structure, as well as the semantic content." It's "Anthropology", of course (occurs multiple times), etc. In the following, I only mention language issues where they affect the understandability of the text.
page 1: "problems unique to Linguistic Linked Open Data (LLOD) will occur and tried to be solved" => "... are tried ..." (or, better, use active sentences). Use a consistent spelling of "code-a-thon". Check hyphenation (and language settings): "devel-oped"?
page 2: "code[2]. Instead, existing" => Instead of what?
page 2: "The features themselves are modeled as a property" => multiple properties, I guess
page 4, 5 (and elsewhere): Typographical issues, e.g., boundaries on p. 4, line breaks on page 5, etc.
page 7: "The most basic concept in the domain of Linguistic Linked Open Data is the concept of language." => "One fundamental concept ..."
page 7: "as mentioned in ??." => unresolved cross-reference
page 7: "research and interoperability" => "research, and interoperability"
page 7: "Linguistic field research may disagree" => researchers, not research
page 8: "The points made in section ?? can not yet seen as proven," => unresolved cross-reference; also "cannot yet be seen"
Minor comments:
- First paragraph should explicitly introduce the abbreviations used in the paper title.
- No references or links are given for OLiA and ISOcat. In the final paper, these could be cross-references within the special issue, I presume, but this needs to be double-checked by the editors.
References:
Saulwick, A., M. Windhouwer, A. Dimitriadis, R. Goedemans (2005), Distributed tasking in ontology mediated integration of typological databases for linguistic research, In: Proc. 17th Conf. on Advanced Information Systems Engineering (CAiSE'05), Porto.
Chiarcos, C., S. Moran, P. Mendes, S. Nordhoff, R. Littauer (to appear 2013), Building a Linked Open Data Cloud of Linguistic Resources: Motivations and Developments, In: Iryna Gurevych and Jungi Kim (eds.) The People's Web Meets NLP: Collaboratively Constructed Language Resources, Springer. [manuscript can be requested from the authors, responsible author for the corresponding subsection is Sebastian Nordhoff]