Querying Biomedical Linked Data with Natural Language Questions

Tracking #: 1096-2308

Authors: 
Thierry Hamon
Natalia Grabar
Fleur Mougin

Responsible editor: 
Guest Editors, Question Answering over Linked Data

Submission type: 
Full Paper
Abstract: 
A recent and intensive research in the biomedical area has enabled the accumulation and dissemination of biomedical knowledge through various knowledge bases that are increasingly available on the Web. Exploiting this knowledge requires creating links between these bases and using them jointly. Linked Data, SPARQL language and interfaces in Natural Language question-answering provide interesting solutions for querying such knowledge bases. However, while using biomedical Linked Data is crucial, life-science researchers may have difficulties using SPARQL language. Interfaces based on Natural Language question-answering are recognized to be suitable for querying knowledge bases. In this paper, we propose a method for translating natural language questions into SPARQL queries. We use Natural Language Processing tools, semantic resources and the RDF triples description. We designed a four-step method which linguistically and semantically annotates the question, performs an abstraction of the question, then builds a representation of the SPARQL query and finally generates the query. The method is designed on 50 questions over three biomedical knowledge bases used in task 2 of the QALD-4 challenge framework and evaluated on 27 new questions. It achieves good performance with 0.78 F-measure on the test set. The method for translating questions into SPARQL queries is implemented as a Perl module and is available at http://search.cpan.org/~thhamon/RDF-NLP-SPARQLQuery/.
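As a rough illustration of the four-step pipeline described in the abstract (annotation, abstraction, query construction, query generation), the following Perl sketch shows how such a translation could be organized. It is a toy example only: the subroutine names, data structures, the hard-coded question pattern and the placeholder "ex:" namespace are assumptions of this sketch and do not reflect the actual RDF::NLP::SPARQLQuery API or the QALD-4 vocabularies.

#!/usr/bin/perl
# Toy sketch of a question-to-SPARQL pipeline; illustrative only.
use strict;
use warnings;

# Step 1: linguistic and semantic annotation of the question.
sub annotate {
    my ($question) = @_;
    my @entities;
    push @entities, { term => 'side effects', role => 'predicate' }
        if $question =~ /side effects?/i;
    push @entities, { term => lc $1, role => 'object' }
        if $question =~ /\bof\s+(\w+)/i;
    return { text => $question, entities => \@entities };
}

# Step 2: question abstraction (result form, question topic, arguments).
sub abstract_question {
    my ($annotation) = @_;
    my ($predicate, $object) = map { $_->{term} } @{ $annotation->{entities} };
    return { result_form => 'SELECT', predicate => $predicate, object => $object };
}

# Step 3: query construction (graph of RDF triples with variables).
sub build_graph {
    my ($abstraction) = @_;
    # Toy mapping from the annotated predicate term to a KB property
    # (placeholder "ex:" namespace; a real system would use the KB's URIs).
    my %property_for = ( 'side effects' => 'ex:sideEffect' );
    my $property = $property_for{ $abstraction->{predicate} } // '?p';
    return [ [ 'ex:' . $abstraction->{object}, $property, '?v0' ] ];
}

# Step 4: query generation (serialize the graph as a SPARQL query).
sub generate_sparql {
    my ($abstraction, $graph) = @_;
    my $where = join " .\n  ", map { join ' ', @$_ } @$graph;
    return "$abstraction->{result_form} ?v0 WHERE {\n  $where\n}";
}

my $question    = 'What are the side effects of aspirin?';
my $annotation  = annotate($question);
my $abstraction = abstract_question($annotation);
print generate_sparql($abstraction, build_graph($abstraction)), "\n";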
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Mariana Neves submitted on 14/Aug/2015
Suggestion:
Minor Revision
Review Comment:

The manuscript presents an approach for converting natural language questions to SPARQL queries using natural language processing techniques. The authors use a dataset of biomedical questions made available during the QALD challenge for developing and evaluating the system, which obtained good results and ranked second in the challenge. The system is available for download.

Major review:

- The motivation of the paper is not clear. What are the shortcomings of the current solutions?

- I have the feeling that too much detail is sometimes given on the external tools, e.g., Yapex, but not on the regular expressions and patterns you created. More details could be added to the manuscript or to supplementary material.
- Page 7:
"Words expressing the negation (e.g. no in Figure 4e) are identified through regular expressions.";
"We also collect and identify quantification expressions in the questions processed."
"The post-processing first aims at selecting those entities and semantic types that may be useful for the next steps"
"define rewriting rules and to adjust (modification or deletion) semantic types associated with a given entity according to its context."
"supplementary disambiguation of the annotations is also performed"

- Maybe each of the main steps in Figure 1 could be briefly described (one sentence). For instance, the difference between query construction and query generation was not clear to me. Further, I sometimes find it difficult to map the steps in Figures 5 and 7 to their description in the text.

- Adaptability vs. over-fitting of the approach: 44 rules derived from 50 questions seems like a lot to me. How does this number compare to previous approaches? Also, I wonder how the query created for the question related to the numbers of chromosomes could be used in any other situation. In the discussion, you state that the system "is also portable to new datasets" and I wonder whether this is indeed true.

- Finally, given that one system got almost 1.0 for all measures, I miss more discussion and comparison to this system (is it cited in the related work?), and I wonder whether this is a solved problem, or whether the test dataset is too small to draw any conclusion on that.

Minor review:

- Page 2: Please include a reference to this passage: "However, typical users of this knowledge, such as physicians, life-science researchers or even patients, cannot manage the syntactic and semantic requirements of the SPARQL language neither can they manage the structure of various knowledge bases."

- Page 2: The types of linked data are not clear in this passage, could some examples be given on that? "Another important distinction is related to the types of linked data which are processed (typically, general [5][15][31] or specialized [1] languages KBs)"

- Page 2: The name of the toolkit could be included in the text: "a multilingual toolkit available for 36 languages"

- Page 3: last paragraph of the Related Work section. It is not clear whether it refers to the last system which was described.

- Page 2/3, Related Work: How do you compare and explain that one of the systems uses only "four modular query patterns" or "twelve patterns" while the last system relies on "112 basic query patterns"?

- Page 3, 2.2: In the following passage, it sounds to me like QA over text and not over linked data: "The evaluation is carried out on 100 questions and the corresponding queries are tested on clinical documents."

- Related work: I am aware of another system (LODQA) on this field: http://www.lodqa.org/docs/references/

- Page 3: This sentence could be rephrased, it is not clear: "Several research questions can be related to the querying of linked data through the natural language interfaces."

- Page 5: Are the POS tags ever used in the approach?

- I find Figure 3 unnecessary; it is only the output of the tool used for POS tagging. Maybe you could move it to supplementary material.

- Figure 4: couldn't it also include the information of which tool detected each mention? For instance, was "no" detected by NegEx?

- Page 7: typo "theyr"

- Page 7: I could not understand which taxonomy of the expected question types has been used here. For instance, the type "sideEffects".

- Figures 6 and 8: I wonder whether these two figures could be merged, as the manuscript is too long. I also suggest showing the original natural language question, as well as explaining some acronyms/abbreviations/variables in the caption, e.g., "QT" and "?v0", and making the left and right columns of the table the same size.

- Page 11: I think that the description of the semantic resources could be placed before the methods.

- Figure 9 could also be moved to supplementary material, it is huge and the manuscript is too long.

- Page 14: I do not find it nice to start a sentence with a number: "50 questions from the training test gather training and test sets of the QALD-4 challenge."

- Figure 10: I would include these examples in the text (as a list?) instead of a figure.

- Figure 11 is great, but I wonder whether I should be able to distinguish the 8 different steps. I can't; the colours are too similar.

- Page 6: "descrive" -> describe

Review #2
By Anca Marginean submitted on 31/Aug/2015
Suggestion:
Minor Revision
Review Comment:

The new version addresses the reviewers' comments only to a limited extent.

The authors solved the problems with the example from Figure 3 and the corresponding explanations.

Yet, the originality of the work is still not clearly argued.
In the new version, the authors extended the Discussion section, but the comparison is focused mainly on performance metrics and less on the constituents (methods, algorithms, patterns, rules) of the compared systems.

The proposed method to generate SPARQL queries from natural language questions is said to be fully automated, as long as the rewriting rules are defined (section 7.4).
Is there a relation between the rewriting rules and the data schema? Can the rules be automatically derived from the schema, or at least part of them (for example for the properties which have String as values)? What about the patterns used for the extraction of terms from section 4.1? Are they domain-independent or do they need to be manually defined for each targeted domain?

Formal descriptions for the four steps of the used method are still missing. Consequently, some steps of the method seem unclear.
For example, in section 4.2 it is stated that "larger terms which do not include other semantic entities are kept". The decision to select the term "leads to", which includes "lead", which is annotated as an entity (drug), does not seem to be supported by the "larger terms" rule. The same goes for the elimination of "side effects of drugs" and "effects of drugs", unless the rule is that only those larger terms are kept which do not include other semantic entities.
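To make the ambiguity concrete, the behaviour the reviewer observes looks like plain longest-match filtering over overlapping annotations. The following Perl sketch shows that reading; the offsets, terms, types and data structures are illustrative assumptions, not the authors' implementation. Under this reading "leads to" is kept and "lead" is dropped, which is exactly the outcome the stated rule does not seem to license.

#!/usr/bin/perl
# Toy longest-match filter over overlapping annotations; illustrative only.
use strict;
use warnings;

# Each annotation: character offsets, surface form, semantic type.
my @annotations = (
    { start => 21, end => 24, term => 'lead',     type => 'drug'     },
    { start => 21, end => 29, term => 'leads to', type => 'relation' },
);

my @kept = grep {
    my $a = $_;
    # Keep $a unless another, strictly larger annotation covers its span.
    !grep {
        $_ != $a
            && $_->{start} <= $a->{start}
            && $_->{end}   >= $a->{end}
            && ( $_->{end} - $_->{start} ) > ( $a->{end} - $a->{start} )
    } @annotations;
} @annotations;

print "$_->{term} ($_->{type})\n" for @kept;   # prints only: leads to (relation)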

The writing also needs improvement. Some observations:
- section 2.2 - "We propose three such experiments" - could create the false impression that the experiments are performed by the authors
- section 4.2 "Therefore, semantic entities like lead is removed theyr are part of larger entities (lead to)."
- section 7.2 "provide the possibility to descrive the involved concepts and scenarii with more detail"
- the repeated occurrence of the word "besides" might be avoided

Consequently, I recommend a careful reading of the entire paper.

Review #3
By Jin-Dong Kim submitted on 24/Sep/2015
Suggestion:
Minor Revision
Review Comment:

The revised manuscript is much improved from the original version, particularly in terms of reproducibility.
Now I can see the details of the work and differences from other works.
I am not entirely convinced that the proposed methods are novel, but I admit that the presented work is thorough, covering almost every detail needed to perform the QALD task (the biomedical track).

It is in general a substantial work with a good amount of effort.

Despite several remaining drawbacks, which I will point out below, I now think the paper can be a good contribution, considering that natural language QA is still an underdeveloped area.

- (In the end of section 1) "Therefore, it is important to design friendly interfaces that ... natural langauge questions into SPARQL languages"
I feel this is too radical a development of the argument. People also appreciate other types of interfaces, e.g. graphical authoring tools.

- (In the first paragraph of section 2) "In this work"
It sounds like referring to the presented work (the submission).

- (In the second paragraph of section 2.1) "50 questions are successfully transformed with the system"

50 questions out of how many questions?

- (In the second paragraph of section 2.1) "One advantage is that the method requires only four modular query patterns, while in a previous work of the authors twelve patterns were necessary."

Why fewer patterns is better?

- (in section 2.2) SESSA [20] does not take a natural language query, nor does it produce SPARQL queries. The system may not fall into the category of section 2.2.

- (in section 3) The first paragraph is really not convincing. The paragraph begins with "The objective of our work is to propose a novel ... method", but the remaining comments do not show anything novel. The only part that might be novel comes later: "we use information issued from the Linked Data resources to semantically annotate the questions and to define the frames.", but again I am not entirely convinced that it is novel.
I feel the authors are unnecessarily claiming that the proposed method is novel.

- (in section 4.2) How are the frames constructed? Aren't they also a resource specific to the target data set?

- (section 6) The current experimental results are not very informative. The experiments really have to show how effective the individual modules are. For example, if the rewrite rules are removed from the system, how much does the performance degrade?

Review #4
By Christina Unger submitted on 25/Sep/2015
Suggestion:
Minor Revision
Review Comment:

Some remaining issues
---------------------

* The difference between related work in 2.1 and in 2.2 is not clear to me. In 2.2 you say: "Moreover, they allow to evaluate the final results (answers extracted from the KBs) and to provide precise evaluation figures." But this is true also for work mentioned in 2.1, because once systems have constructed queries, these queries can be executed to get answers. Maybe simply merge 2.1 and 2.2 into one section?

* Footnote 3: The link you provide is not persistent, please use http://sc.cit-ec.uni-bielefeld.de/qald/ instead (although it's not possible to point to QALD-3 Task 2 directly).

* The provided URLs (e.g. in Footnote 4 and 5) are ok, but the underlying links in the PDF are messed up due to the line breaks. Please fix this.

* In 4.2 you define the question topic as the type of semantic entity which is the major context of the question. It is not clear what you mean by "major context". Then, on page 8, you take the first semantic entity as question topic. Does this mean first in a left-to-right sense?

* At the very end of the conclusion, you point to where the implementation and test questions can be found. I think that a much better place for this would be to put it in Section 7.4 (Reproducibility).

Typos and suggestions
---------------------

Abstract:

* A recent and intensive research -> Recent and intensive research
* using SPARQL language. -> using the SPARQL language. or: using SPARQL.
* RDF triples description -> RDF triple descriptions
* Natural Language question-answering -> natural language question answering

Page 1:

* A recent and intensive research -> Recent and intensive research
* life-science bases -> life science knowledge bases
* In the enumeration (ClinicalTrials.gov, Sider, DrugBank), use commas instead of semicolons.

Page 2:

* cannot manage the syntactic and semantic requirements of the SPARQL language neither can they manage the structure of various knowledge bases -> can manage neither the syntactic and semantic requirements of the SPARQL language, nor the structure of various knowledge bases
(Actually, "cannot manage" is quite a strong claim; I would rather write something like: are usually familiar neither with the query language SPARQL nor the structure of various knowledge bases)
* mediate technical and semantic complexity --> I think you don't mean "mediate" but "lower".
* some related works -> some related work
* linked data -> Linked Data
* Please use \cite{paper1,paper2,paper3}, which will give [5,15,31] instead of [5][15][31].
* languages KBs -> KBs
* querry -> query
* One Question-Answering system (AutoSPARQL) is based on -> The question answering system AutoSPARQL
* dependent from -> dependent on

And I would suggest the following (non-)capitalizations:

* Knowledge-Based Specific Interface -> knowledge-base specific interface
* Question-Answering System -> question answering system
* Question-Answering system -> question answering system
* Natural Language interfaces -> natural language interfaces
* Kind -> kind, Entity -> entity, Property -> property, Relation -> relation

Page 3:

* manually-written grammar -> manually written grammar
* Questioning over Linked Data -> Querying Linked Data
* main objective of the related set of works is more complex than in works presented -> main objective of the following related work is more complex than work presented
* first -> First
* question the KB -> query the KB
* this kind of works -> this kind of work
* a 0.62 precision -> a precision of 0.62
* activation of query graph -> activation of the query graph
* boolean -> Boolean
* retrieval of precise biomedical information in linked KBs -> retrieval of precise biomedical information from linked KBs
* [17], processing -> [17], or processing
* enriching question with -> enriching questions with
* define the frames -> define frames
* and to build the SPARQL queries -> build SPARQL queries

Page 4:

* see Fig. 1 -> see Figure 1 (You always write Figure, not Fig., so you should also be consistent here.)
* RDF triple description -> RDF triple descriptions
* generating the SPARQL queries -> generating SPARQL queries
* proposed by the task 2 -> provided by task 2
* and Sider described -> and Sider, described
* on the several questions -> using the following questions
* 4. Question translation -> 4. Question Translation

Page 5:

* see Fig.2 -> see Figure 2
* such as disease names, side effects -> such as disease names and side effects
* semantic entities recognition -> semantic entity recognition
* Here you print vocabulary elements (e.g. diseasome/disease/1154) using typewriter font, whereas in the rest of the paper you use bold face or italics (in Figure 4). Please stick to one. I would prefer typewriter font, but it's up to you.

Page 7:

* e.g. side effects drugs -> e.g. side effects of drugs
* expressing the negation -> expressing negation
* building the representation -> building a representation
* semantic annotation -> semantic annotations
* semantic entities like lead is removed theyr are part of larger entities -> semantic entities like lead are removed if they are part of larger entities
* On the whole -> In total
* Definition of the Result form: negated -> Definition of the Result Form: Negated
* coordination marks -> coordination markers
* boolean -> Boolean
* Identification of the Question topic: we -> Identification of the Question Topic: We
* corresponds the question topic -> corresponds to the question topic
* Arguments: we -> Arguments: We
* Scope of coordination: arguments -> Scope of Coordination: Arguments
* graph representation and abstraction -> graph representations and abstractions
* The objective of the query construction step is to associate previously identified elements --> Here something is missing, as you always associate something with something else. Or you don't mean "associate" but something else?
* to build representation of the -> to build a representation of the

Page 9:

Caption of Figure 6:
* You need to swap "left part" and "right part".
* displays graph representation -> displays the graph representation

Page 10:

* during the question abstraction: they concern -> during the question abstraction, as they concern (Or some other suitable conjunction...)

Page 11:

* In the enumeration, please use commas instead of semicolons.
* The predicates are associated between them through their subjects and objects --> This doesn't work. You associate something with something else, not between and also not through something. Please fix this.
* ?v0 which is -> ?v0, which is

Page 12:

* in Figure 8a: the -> In Figure 8a: The
* In 5., remove the line break.
* are processed: predicates -> are processed: Predicates
* the examplified questions -> the example questions (This occurs twice.)
* the result form SELECT -> the result form is SELECT
* each RDF triple and the filtering -> each RDF triple and the filters
* examples from Figures 9a to 9e and Figure 9g -> examples in Figures 9a-e and 9g
* SPARQL end-point -> SPARQL endpoint (also in Footnote 8)
* relies on: (1) the -> relies on (1) the
* ; (2) -> , (2)
* the three following -> the following three
* Footnote 8: This is also not a persistent URL, please simply leave it out, saying "For our experiments, we use the SPARQL endpoint provided by the QALD-4 challenge."

Page 13:

* Figure 9: You could consider using namespaces that you define in the caption, then the URLs and the queries would be easier to read.

Page 14:

* named entity recognition; -> named entity recognition.
* disease gene associations -> disease/gene associations
* At the end of all bullet points, please use "." instead of ";"
* drug-target relations -> drug/target relations (or change "disorder/gene associations" above into "disorder-gene associations")
* subject predicate object RDF triples -> RDF triples of the form subject-predicate-object
* for the Question Annotation -> for Question Annotation
* questions: in this way -> questions. This way
* IRIs -> URIs
* training test gather training and test sets -> Please fix this.

Page 15:

* 4 Gb -> 4 GB
* 2.7GHz -> 2.7 GHz (or write "4GB" above)
* running time -> run time (occurs twice)
* sub-steps -> substeps
* TermTagger which -> TermTagger, which
* in 2 seconds on the average -> in two seconds on average
* with 0.78 F-measure -> with an F-measure of 0.78
* for three questions, the -> for three questions the
* SPARQL end-point -> SPARQL endpoint

Page 16:

* were returned then. -> were returned.
* Comparison with Existing Works -> Comparison with Existing Work
* with the existing ones -> with existing ones
* for Precision, Recall and F-measure -> for precision, recall and F-measure
* exploits Grammatical Framework grammar based on formal syntax -> exploits a Grammatical Framework grammar
* POS-tagging -> POS tagging
* On the whole -> In general
* Similar works -> Similar work
* For instance, comparable approach -> For instance, a comparable approach
* provide the possibility -> provides the possibility
* descrive -> describe
* scenarii with more detail -> scenarios in more detail

Page 17:

* performance of automatic systems -> the performance of automatic systems
* in a previous work -> in previous work
* specific entities while -> specific entities, while
* to the general entities -> to general entities
* construction step: the -> construction step: The
* define regular expression -> define a regular expression
* calcium supplement while -> calcium supplement, while

Page 18:

* DailyMed, is also -> DailyMed is also
* relies on linguistic and semantic annotation -> relies on the linguistic and semantic annotation
* RDF triples description -> RDF triple descriptions
* with 0.78 F-measure -> with an F-measure of 0.78