Terminology and Ontology Development for Semantic Annotation: A Use Case on Sepsis and Adverse Events

Tracking #: 3118-4332

Melissa Yan
Lise Tuset Gustad
Lise Husby Høvik
Øystein Nytrø

Responsible editor: Guest Editors SW Meets Health Data Management 2022

Submission type: Full Paper
Annotations enrich text corpora and provide necessary labels for natural language processing studies. To reason and infer underlying implicit knowledge captured by labels, an ontology is needed to provide a semantically annotated corpus with structured domain knowledge. Utilizing a corpus of adverse event documents annotated for sepsis-related signs and symptoms as a use case, this paper details how a terminology and corresponding ontology were developed. The Annotated Adverse Event NOte TErminology (AAENOTE) represents annotated documents and assists annotators in annotating text. In contrast, the complementary Catheter Infection Indications Ontology (CIIO) is intended for clinician use and captures domain knowledge needed to reason and infer implicit information from data. The approach taken makes ontology development understandable and accessible to domain experts without formal ontology training. AAENOTE, CIIO, and their corresponding SPARQL queries used to answer competency questions are available at https://github.com/melissayan/aaenote_and_ciio.

Major Revision

Solicited Reviews:
Review #1
By Andre Lamurias submitted on 01/Jul/2022
Minor Revision
Review Comment:

This paper presents a terminology and ontology developed for sepsis and adverse events. The terminology AAENOTE was developed based on annotation guidelines used to annotate a synthetic corpus of Adverse Events texts. The ontology CIIO was developed based on the terminology. This paper expands a conference paper that had only the initial developments of the ontology. Although there is some overlap with that paper (mostly the annotation process), I think the submission has enough new material. This paper explains in a clear way each step of the process, from annotating documents to creating the terminology and ontology with feedback from clinicians. These new resources address the limitations of existing ontologies, which could not cover all the concepts necessary to describe AE documents and reason about catheter infections. This paper is relevant to the annotation of new documents and to the development of NLP tools for EHRs.

The data files are stored in a GitHub repository in OWL format, along with the SPARQL queries and clinical knowledge used in the manuscript. The annotation guidelines for each session are also available on a separate website (not essential for reproducing the results).

My concern is that the need for the terminology is not fully justified; it seems to be only an intermediate step before creating the ontology. I understand that the terminology is to be used to annotate and represent documents, but it is not clear why an ontology could not be developed for that same purpose, or why a terminology is the answer to the limitations specified in the related work section.

Some expressions, for example "clinician identified sentences", could be simplified, or at least hyphenated (clinician-identified sentences).
Although word sense disambiguation is mentioned in the related work, coreference resolution (and perhaps entity resolution) should also be mentioned, since it would address the issues discussed in Sections 10.1 and 10.2. See, for example: Ng, Vincent. "Machine learning for entity coreference resolution: A retrospective look at two decades of research." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 31. No. 1. 2017.

Review #2
Anonymous submitted on 07/Jul/2022
Major Revision
Review Comment:

Paper summary
The paper describes a methodology for terminology and ontology development of annotated catheter-related and infection-related signs in Adverse Event (AE) documents which can be used in identifying events and reasoning about sepsis in an AE corpus.

Overall comments
The paper describes the development process for constructing a terminology that can represent an annotated corpus, as well as the development process for the terminology’s corresponding ontology, which represents domain knowledge and allows reasoning. The authors used a use case from the clinical domain, but the approach can be generalized to create terminologies from an annotation guideline for semantically annotated data.

The motivation for this research is clear, and the problem being addressed seems to be useful. I believe that the approach proposed for creating the terminology and ontology is of value for any domain, so I would suggest that the paper take a more general focus on how this could be done (explaining exactly the process involving annotators, sessions, revisions, etc.), and then have a specific section explaining how this was achieved in the clinical domain for that particular use case. As written now, the approach appears useful only for that particular example.
In addition, it was not clear to me why the developed ontology is of value for the sepsis example, and why, for example, UMLS MetaMap is not suitable for this. I think it would be useful if the authors compared their approach to MetaMap for that particular use case and identified the pros and cons of each approach. More generally, a discussion and comparison of this approach with other ones (when each fails, when each succeeds, etc.) would be useful. Furthermore, I believe the evaluation also needs to be redesigned. There is no metric for the evaluation. What is the metric that the authors evaluate upon? It is not clear to me what the evaluation approach is or what the results are. I think the authors should have used a new dataset and evaluated whether the knowledge from the proposed ontology can be applied to and expanded on the new dataset.
In addition, in its current form, the paper is somewhat hard to read. I would suggest that the authors put the figures and tables in close proximity to where they are mentioned in the text, because in most cases the referenced items appear 2 or 3 pages later, so it is really hard to read the text and understand what the figure depicts at the same time. However, the authors have done an excellent job in submitting and organizing all the information needed in the provided resources.

Finally, I think that the paper would fit better in the “Descriptions of ontologies – short papers describing ontology modeling and creation efforts” category than in the full paper track.

Review #3
Anonymous submitted on 10/Jul/2022
Review Comment:

One of my concerns is the relation of this article to the authors' previous work [1][2]. It seems that many contributing parts of this work are already published in [1][2]. The authors have to clarify how this work differs from [1][2]. The article includes only one sentence describing these differences.

A second major drawback of this work is the lack of a well-defined ontology development process. There are a large number of well-known methods for developing ontologies, e.g., [3 – also check related work][4][5]. The authors need to describe how their process relates to, or adapts, these existing methods. At the least, the adopted process needs to be described clearly by defining and describing its steps.

Furthermore, the structure and the presentation of the article need considerable improvements (details are presented below).

Finally, since the major subject of this work is the definition of an ontology, I was expecting a more detailed and formal presentation of the ontology, for example a partial ontology hierarchy and a class-properties graph. Note that details regarding the ontology are available online at https://folk.ntnu.no/melissay/ontology/aaenote/index-en.html. However, details from there have to be included in the article.

As mentioned above, the presentation and the structure of the article need major improvements. In what follows, I list some of the issues.

The authors need to describe the scenario/setting and several basic concepts. A section similar to the “II. Background” section of [2] needs to be included. Furthermore, the challenges of this work have to be described in the introduction. In the article, the challenges are instead mentioned in Section 3: “This lack of documentation makes it challenging to perform retrospective and real-time systematic…”.

Section 2. The second paragraph (“In the 2010 i2b2/VA workshop on NLP challenge…”) has to be removed from this section. This paragraph is not related to this section.

Section 3.1 describes related work; it needs to be moved into Section 2.
There is a major problem with the figure placement: the figures appear 2-3 pages after they are referenced, e.g., Figure 2 (2 pages after), Figure 3 (3 pages after), Figure 4 (3.5 pages after), etc.

Section 5.1. The second paragraph (“As shown in Fig. 1(b), documents were annotated by 8 annotators…”) is not related to this section.

[1] M.Y. Yan, L.H. Høvik, L.T. Gustad and Ø. Nytrø, Understanding and Reasoning About Early Signs of Sepsis: From Annotation Guideline to Ontology, in: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021, pp. 1906–1911.
[2] M.Y. Yan, L.H. Høvik, A. Pedersen, L.T. Gustad and Ø. Nytrø, Preliminary Processing and Analysis of an Adverse Event Dataset for Detecting Sepsis-Related Events, in: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2021, pp. 1605–1610.
[3] York Sure, Steffen Staab, Rudi Studer: On-To-Knowledge Methodology (OTKM). Handbook on Ontologies 2004: 117-132
[4] Mari Carmen Suárez-Figueroa, Asunción Gómez-Pérez, Mariano Fernández-López: The NeOn Methodology for Ontology Engineering. Ontology Engineering in a Networked World 2012: 9-34
[5] Helena Sofia Pinto, Steffen Staab, Christoph Tempich: DILIGENT: Towards a fine-grained methodology for Distributed, Loosely-controlled and evolving Engineering of oNTologies. ECAI 2004: 393-397