Review Comment:
The paper is not broad enough to be called a survey article, as there are many aspects of this topic and many solutions that are not included. It is, however, a perfectly fine regular article proposing a means to deal with limitations in the OA model (and to some extent, NIF). I found it interesting and thoughtful, and I like the proposal of the two-layer model (although it should be made clearer in the introduction that this is a major focus of the paper). I feel the paper should be published, but suggest major revisions because of some fundamental problems that should be addressed first.
(1) In terms of being useful as an introductory text, there is quite a bit of assumed knowledge here. The general form of statements in OA and RDF should be explained. Also, some terms are introduced and not really defined, especially “aboutness”, which appears repeatedly with no explanation of what it means beyond that it is “the (broad) semantics of the connection”.
(2) A particular concern is a seeming lack of understanding of some of the proposed schemes for annotation and a resulting lack of awareness of their relationships. For example, in section 2.3, the authors claim that LAF “employs the representation language of XML” and go on to assume that LAF is not a graph-based model. The distinction between a data model and a serialization is apparently not understood; LAF is a graph-based model which CAN BE serialized in XML, among many other formats, and it is isomorphic to other graph-based models such as RDF. Interestingly, within the same paragraph the authors refer to a paper (Cassidy, 2010) that makes this very point. Overall, the authors seem to be relatively unaware of a lot of work that is going on in the Computational Linguistics world (as opposed to the bioinformatics/BioNLP world) that embraces RDF-like models, and in particular the increasingly widespread use of JSON-LD (another way to serialize RDF) to represent linguistic annotations in projects such as the LAPPS Grid and Cassidy’s Alveo project, as well as the interest in adopting it in major annotation projects such as DKPro and CLARIN.
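To illustrate the distinction I have in mind in point (2), here is a minimal sketch in Python. It is not LAF's actual schema or API: the `Edge` tuple, the node and label names, and both serializers are hypothetical, chosen only to show that a single graph can be written out as XML or as JSON and read back as the same graph.

```python
import json
import xml.etree.ElementTree as ET
from typing import NamedTuple


class Edge(NamedTuple):
    """One labeled edge of a hypothetical annotation graph."""
    source: str
    label: str
    target: str


# The data model: a set of labeled edges (all names are illustrative).
GRAPH = frozenset({
    Edge("span1", "begins_at", "0"),
    Edge("span1", "ends_at", "11"),
    Edge("anno1", "annotates", "span1"),
    Edge("anno1", "category", "NP"),
})


def to_xml(graph):
    """Serialize the graph as XML, one <edge> element per edge."""
    root = ET.Element("graph")
    for e in sorted(graph):
        ET.SubElement(root, "edge", source=e.source, label=e.label, target=e.target)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the graph from the XML produced by to_xml."""
    root = ET.fromstring(text)
    return frozenset(
        Edge(el.get("source"), el.get("label"), el.get("target"))
        for el in root.iter("edge")
    )


def to_json(graph):
    """Serialize the same graph as a JSON list of edge objects."""
    return json.dumps([e._asdict() for e in sorted(graph)])


def from_json(text):
    """Rebuild the graph from the JSON produced by to_json."""
    return frozenset(Edge(**d) for d in json.loads(text))


# Both serializations round-trip to the identical graph.
assert from_xml(to_xml(GRAPH)) == from_json(to_json(GRAPH)) == GRAPH
```

The choice of XML or JSON here changes only the bytes on disk; the graph, which is the model, is the same in both cases.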
(3) The point made about the problem of the OA model is an important one, and the preferred approach boils down (more or less) to the ability to provide different named relations (properties) instead of everything being cast as the target or body of an annotation, i.e., to the inability of the OA model to handle what the authors call “multifaceted annotation”. It feels a bit like the paper is making more of this difference than is warranted, but at the same time, the fact that many people don’t seem to see this difference means it is probably worth discussion.
(4) There are some pieces of the puzzle that this paper attempts to address that I had hoped to hear more about but didn’t feel that the paper quite reached. The authors say on page 8 that “vocabularies with richer semantics have to be developed”. In the NLP world, this is a problem that several projects are addressing, but we need not only vocabularies, but also a model of which things are the so-called objects and which are the (named) relations. This may seem simple for cases such as the paper’s NP example or the association of a text span with an object in a database, but once one dives deeper there are tricky cases and sometimes no right or wrong solutions, but simply the need to make a consistent choice that everyone can live with. For example, is “part of speech” a property of a word (or token, which of course might not be a word) or is it an object in its own right? Decisions about this seemingly trivial distinction ultimately affect the ways in which the information is processed, so once you make a decision as to which you prefer, your software is wedded to a basic model that might be hard to adapt. Things become even more slippery for “relational” annotations such as coreference and temporal annotation that relate two words/tokens/text spans/entities, in terms of what is reified as an object and what is a named property.
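The part-of-speech question in point (4) can be made concrete with a small sketch in Python. Everything here is hypothetical (the token, the property names such as `has_pos` and `describes`, and the tagger name); the sketch only shows how the two modeling choices lead to different lookup code, under the assumption that annotations are stored as plain subject-property-object triples.

```python
# Choice A: part of speech is a named property of the token.
AS_PROPERTY = {
    ("token1", "has_text", "runs"),
    ("token1", "has_pos", "VBZ"),
}

# Choice B: the part-of-speech assignment is reified as its own object,
# so further statements (here a hypothetical "assigned_by") can attach to it.
AS_OBJECT = {
    ("token1", "has_text", "runs"),
    ("pos1", "is_a", "PartOfSpeech"),
    ("pos1", "describes", "token1"),
    ("pos1", "has_value", "VBZ"),
    ("pos1", "assigned_by", "tagger_x"),
}


def pos_when_property(triples, token):
    """One step: follow the has_pos edge from the token."""
    return {o for s, p, o in triples if s == token and p == "has_pos"}


def pos_when_object(triples, token):
    """Two steps: find PartOfSpeech objects describing the token, then read their values."""
    pos_nodes = {
        s for s, p, o in triples
        if p == "describes" and o == token and (s, "is_a", "PartOfSpeech") in triples
    }
    return {o for s, p, o in triples if s in pos_nodes and p == "has_value"}


# Each lookup works only against the model it was written for.
assert pos_when_property(AS_PROPERTY, "token1") == {"VBZ"}
assert pos_when_object(AS_OBJECT, "token1") == {"VBZ"}
assert pos_when_property(AS_OBJECT, "token1") == set()
```

The last assertion is the point: code written for one choice silently finds nothing in data that follows the other, which is why the decision needs to be explicit and consistent.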
(5) On page 6 the authors say “the type of annotation shown in Figure 6 is called a named entity grounding or normalization, which often means linking named entities in text with corresponding database entries”. This is made clear with the protein example, but could this not be the same for NP? If not, why not? If so, please make it clear.
(6) The English needs some work: notably, there are many places where articles are dropped (e.g., “OA model”), and there is some awkward phrasing. Some suggestions:
p.2 : “these two knowledge resources” — referent (in the preceding paragraph) was not clear to me at first
: “The practice of creating links between these two Webs is often referred to as annotation; in which some portion …” replace semi-colon with comma
: “as has been done in the CRAFT corpus of scientific literature, [20,4,49,21], as well as automatically, such as in the CALBC project for the scientific literature” > “as in the CRAFT corpus of scientific literature, [20,4,49,21], as well as automatically, as in the CALBC project for the scientific literature” (or use “for example” instead of “as in”)
p.3 : reference for LAF is “ISO 24612, 2012” not “ISO 2008”. The best paper citation is Ide, N., Suderman, K. (2014). The Linguistic Annotation Framework: A Standard for Annotation Interchange and Merging. Language Resources and Evaluation, 48:3, pp. 395-418.
p. 8 : “relevant resources should be able to associated without limitation” > “relevant resources should be able to be associated without limitation”
: “to implement the step 5” > “to implement step 5”
: “necessry” > “necessary”
: what does the “ex” prefix in “textspan:a-synuclein ex:refers_to uniprot:P37840 ex:anno3 .” mean or refer to? This is the kind of detail that would need explanation for those unfamiliar with RDF etc.
p.9 : “However, we are also free to group relevant statements into same graphs” > do you mean to say “However, we are also free to group relevant statements into common graphs” or “the same graphs”?
: TM annotation — what is this? I do not find it defined earlier in the paper.
: “Even with such a kind of annotation” > “Even with annotations like”
p. 10 : “does not make much sense if open world assumption is applied” > “does not make much sense if the open world assumption is applied”
: “Thanks to the separation, queries over the annotation, particularly those for annotation content become tidier” > insert comma after the word “content”
: “when searching for specific content of annotation” > “when searching for the specific content of an annotation”
: “In this section, we demonstrate it using the annotation example in Figure 5” > what is the “it” referring to?
p. 11 : “To make it compatible with OA model, however, we need a little bit of cost.” > “To make it compatible with the OA model, however, we incur some cost”
: “ little bit of modification is required to OA model” > “ some modification of the OA model is required”
p. 11 : “we surveyed existing approaches to annotation representation in semantic web” > “we surveyed existing approaches to annotation representation in the semantic web” Also maybe say “we surveyed some existing approaches”