Review Comment:
This survey provides an overview of the current state regarding the treatment of multilinguality in the Linguistic Linked Open Data Cloud. It focuses in particular on foundations and recent developments with respect to the following levels of description: lexical semantics, pragmatics, lexicography, etymology and diachronicity, translation and terminology,
Overall, this is a very diligently conducted systematic review providing a synthesis of the state of the art in dealing with multiple languages on the LLOD. The review has been very thoroughly conducted following the PRISMA approach and includes a vast number of references and pointers to relevant work. This review has the potential to become the reference for researchers wanting to get an overview of recent work on multilinguality in the context of the LLOD.
I have two major comments on the article and a number of minor / stylistic points:
Major points
It would be good if the authors could add a glossary to the paper explaining the linguistic terms that are used in the text and might not be known to a general reader of the journal. Examples are discourse, etymology, diachronicity, typologies, inflectional morphology, etc. This could be done as part of an appendix.
In Sections 4.1 - 4.7 it would be good to close each section with a summary of the key standards and methods that are available and a summary of the problems / questions that are still open, so that the reader has a clear take-home message for each section. As the sections stand, it remains unclear which problems are solved and which ones are still open or unaddressed at the respective level of description. The authors could add something like "In summary, X and Y have been addressed successfully / there is substantial work on X and Y, but Z is still open / insufficiently addressed."
A general comment: in multiple places, including the abstract, the article mentions that "speakers" are affected by the discussed challenges. I doubt whether it is helpful to talk about "speakers" in general, as the common speaker might not even be aware of LLOD or of the fact that their own language is under-resourced. The authors could be more precise about who exactly is affected by the challenges they discuss.
Minor points
In general, the authors use references as syntactic elements of sentences, which is bad style; I point to the relevant places below.
Page 2 “Introduction”
The 1st paragraph of the introduction is quite unfocused and not easy to digest / read. Many things that are not really related to each other are mixed together. The authors talk about the fact that language shapes interaction and that languages also “conceptualize” the world (this is wrong IMHO, as only agents, not languages, can conceptualize something). They talk about “language pluralism”, “pressure by major language”, “linguistic relativity”, etc.
The 1st paragraph should be rewritten to have a clearer focus and motivation for the work presented. As it stands, the first paragraph is an enumeration of several aspects of languages that do not form a coherent view or perspective. It is btw. puzzling to use “Furthermore” in “Furthermore digital language data”, as there is no obvious connection to the previous sentence, which talks about pluralism and cultural heritage. The authors should write a more focused set of introductory sentences that does not need to refer to such a breadth of concepts / aspects of language.
Page 3 bottom / Top of page 4
The sentences are quite long and should be shortened. In general, the article tends to use long sentences, often spanning 3-4 lines. Shorter sentences of 2-3 lines should be favoured to facilitate reading and understanding.
“will to link” sounds weird, “desire to link” sounds better to me.
The following sentence / claim is puzzling:
“As a result of these trends *comma missing* we find ourselves today in a situation where the semantic layer is no longer the only bridge between languages. Translations are, in principle, possible via the linguistic layer, …”
This is not clear as translation is inherently a semantic task; I simply do not understand what the authors are implying here. This needs to be clarified.
Page 5
I cannot really follow why static vs. dynamic is a reasonable distinguishing criterion between language resources on the one hand and services and tools on the other. Services and tools typically compute an output from an input, but that does not necessarily make them dynamic, as the computing function can stay the same over time. Language resources might also evolve over time, with new texts being added, etc.
The definition of “knowledge-based structures” is not a good one IMHO. It talks mainly about what knowledge-based structures are *not* (natural language words) but fails to give a good positive definition or examples.
Section 2.3
“It is a truism” => this is not a scientific statement, nothing in science is a truism.
“there may be several possible kinds of connection” => connections
2.3.1.
"Ultimately, it has to bottom out in the association of an entity of some kind with a universally accepted language label."
=> Not clear what the authors mean here; to which entities does “of some kind” refer? What is a universally accepted language label?
2.3.2.
mini-language corpora; it is quite a stretch to call the input to a service a “mini-language corpus”. Corpora represent a collection of texts or other linguistic materials that are assembled for some purpose. They are intentionally created artifacts that have a purpose, and deliberate choices are made regarding what to include in the corpus and what not. Calling the text input to a service a “corpus” is in this sense a stretch and unnecessary IMHO.
2.3.3.
I have not heard before that a “proposition” can be seen as a knowledge structure. In which sense of “proposition” is this the case?
What do the authors mean by: “However, that connection is less direct than for a string”? Please elaborate.
In general, conceptual structures are linked to language not only to make the concept understandable, but also to ground the concept in some symbol system that already has a meaning. Otherwise it would be difficult to define / express the meaning of the concept without making reference to an existing system of reference (language). This aspect could also be highlighted.
Section 2.4.
Who is “they” in “but they emphasize in addition two key points, …”
Page 13
DBpedia is written as DBPedia on the same page, please be consistent. DBpedia is the right spelling btw.
syntactic information is provided by [57]. => reference is used as sentence element
“Several future, additional features that should be addressed” => ungrammatical start of sentence, not clear
Page 14
Sentence: “This general line of research from work on ontology-based parsing...”: the verb seems to be missing from this sentence.
Page 15
“the area is generally suffering from” => suffers from
What is PDTB? This is not explained / described.
"Bosque-Gil et al. discusses" => discuss
What is bidix? This is not explained / described.
Page 16
Top: “propose in [109]” => reference used as sentence element
“such as cuneiform signs in LLOD should be considered” => no comma
“Diatopic-diachronic as well as diatopcy-synchronic representations of languages is one description” => “are” ???
Page 17
“Phonetics studies… Phonology studies”. Repetition at the beginning of both sentences. This is suboptimal from a stylistic point of view.
role as in [23] => reference as sentence element
the method proposed by [122] => reference as sentence element
which LIDIOMS [126] introduces by means of ontolex and vartrans => which are introduced in LIDIOMS by …
Page 18
as LLOD described in Lewis => as LLOD as described by Lewis ???
Furthermore, in terminology and translation *comma* varying degrees
in order to allow *for* a cross-resource analysis
Page 20
demonstrated the applicability of *the* Multilingual Morpheme Ontology
Page 21
presented in [159] => reference used as sentence element
DBpedia project consists => The DBpedia project consists …
In [160] => reference used as sentence element
In [162] => reference used as sentence element
Page 22
interlinking high-quality government data via *no the* RDF and SPARQL
in [174] => same as above
Such moderated repositories enables => enable
Add an enumeration of the different annotation scheme levels in the paragraph on “Linguistic Data Categories”
Page 23
Autom ted Similarity Judgement Program => blank instead of “a”
GLOTTOLOG is written both in uppercase and as “Glottolog” on the same page; please be consistent throughout the article
Page 26
PHOIBLE … 2.000 language. Start a new sentence: “However, …”
communal base => common base?
Page 27
"the use of lOD for research …. require" => requires
From the perspective of conceptualisation, another issues arise => other issues arise ?
The TIAD task has being beneficial => been beneficial ?
Section 6.6.
There are a number of works in the scientific literature that clearly illustrate**. => no "s"
“which contrasts with the still low adoption” => “contrasts” is not the right word I think.