A Natural Language Interface to Semantic Web using Modular Query Patterns: A Complete Presentation of the SWIP System

Tracking #: 963-2174

Camille Pradel
Ollivier Haemmerlé
Nathalie Hernandez

Responsible editor: 
Guest Editors, Question Answering over Linked Data

Submission type: 
Full Paper
Abstract: 
Our purpose is to provide end users with a means to query graph-based knowledge bases using natural language queries, and thus hide the complexity of formulating a query expressed in a graph query language such as SPARQL. The main originality of our approach lies in the use of generic query patterns, which are presented in this article and whose exploitation is justified by the literature. We also explain how our approach is designed to be adaptable to different user languages, and we emphasize the implementation of the approach relying on SPARQL. Evaluations on the QALD data set have shown the relevance of the approach.
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 14/Mar/2015
Review Comment:

The manuscript describes a system called Swip for querying RDF knowledge bases using natural language (NL) questions. In Swip, an NL question is first interpreted as a pivot query using named entity recognition and dependency parsing. The pivot query is then translated into a set of ranked SPARQL queries using a set of predefined query patterns. Finally, the ranked SPARQL queries are presented back to the user as candidate structured queries corresponding to her original NL question.
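To make the pipeline summarized above concrete, here is a minimal illustrative sketch of the final step, translating a pivot-style representation into a SPARQL query string. This is not the authors' implementation: the pivot format, the `ex:` vocabulary, and the example question are hypothetical stand-ins, and subjects/objects are handled in a deliberately simplified way.

```python
# Illustrative sketch only (not Swip's actual code): a toy "pivot"
# representation of the question "Who directed Gladiator?" and its
# translation into a SPARQL query string. All names are hypothetical.
pivot = {
    "focus": "person",  # the queried element of the question
    "triples": [("person", "directed", "Gladiator")],
}

def pivot_to_sparql(pivot):
    """Map each pivot triple to a SPARQL triple pattern; the focus
    element becomes the projected variable, other terms become
    literals (a simplification of real entity/URI mapping)."""
    patterns = []
    for s, p, o in pivot["triples"]:
        subj = f"?{s}"                 # toy rule: subjects are variables
        pred = f"ex:{p}"               # hypothetical vocabulary prefix
        obj = f"?{o}" if o == pivot["focus"] else f'"{o}"'
        patterns.append(f"{subj} {pred} {obj} .")
    body = "\n  ".join(patterns)
    return (f"PREFIX ex: <http://example.org/>\n"
            f"SELECT ?{pivot['focus']} WHERE {{\n  {body}\n}}")

print(pivot_to_sparql(pivot))
```

In the real system, this step is driven by the predefined query patterns and produces several ranked candidate queries rather than a single string.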

While the approach taken in Swip is sound and intuitive, many details of the system are not clear. For instance, what are the limitations of using these predefined query patterns? The examples shown by the authors are all very simple queries consisting of a few triple patterns at most. Can their approach handle complex, Jeopardy-like questions, for instance? Moreover, the ranking function for the SPARQL queries with respect to the pivot query is very ad hoc, and the authors fail to motivate many of its components.

Another major issue with the manuscript is that the authors fail to cite a very relevant work by Mohamed Yehia in CIKM 2013 entitled "Natural Language Questions for the Web of Data". This work addresses the exact same problem and proposes a very sound and general solution to it. The authors should contrast their work against it and evaluate their system against that work.

Moreover, the Swip system seems to perform very poorly on DBpedia (an F-measure of 0.16), as the authors report. Improving the performance of the system and evaluating it thoroughly against the state of the art is a must before the authors' work can be presented.

Finally, the Swip system is available online, which is an excellent thing; however, the system is very slow in providing the list of SPARQL queries matching a given natural language question, even when tried out on very simple queries.

Review #2
Anonymous submitted on 19/Mar/2015
Major Revision
Review Comment:

The paper describes an approach for QA over RDF data based on the identification and mapping of patterns in the data. The approach is well formalized and evaluated using the QALD-3 test collection.
My main concerns about the paper are: (i) it does not sufficiently discriminate its contribution from important related work, and (ii) limited novelty.


Strengths:

- Well-formalized approach.

- The proposed model is evaluated using the QALD-3 test collection.

- Discourse clarity.


Weaknesses:

- The main limitation is the novelty angle. It needs to be made clearer how the approach differs from systems such as Treo in terms of query analysis and from TBSL in terms of pattern identification.

- Related work lacks important references on the state-of-the-art in the field.

- F1-Measure below state-of-the-art (minor point).


- The related work is outdated and misses relevant references in the field. I recommend adding the references described in:

Unger, Freitas, Cimiano. An Introduction to Question Answering over Linked Data. In Proceedings of the 2014 Reasoning Web Summer School, 2014.

In particular, the authors need to include references to TBSL and Treo, systems which have commonalities with the proposed approach. This is an important limitation of the work, as it fails to recognize similar approaches in the area.

- This statement needs to be better justified: "It facilitates the implementation of multilingualism by means of a common intermediate format"

- Copy and paste error: "This document provides instructions for style and layout of a double column journal article"

- The transformation of NL to a pivot query (question analysis) looks similar to the one described in:

Querying Linked Data using Semantic Relatedness: A Vocabulary Independent Approach. NLDB 2011.

There are differences, but they need to be made more explicit.

- Section 5.5 could be made more structured and less discourse-oriented, for example by using bullet points, schematics and algorithms. This would improve the readability of the approach.

- In section 6.1 the reference to existing works [35-38] is disconnected from the identification of the patterns. How are the conclusions from these works used? What I get from this section is that the most important analysis is the manual pattern classification that the authors did. I couldn't find further details on the classification (methodology + data).

- The title uses the keyword `modular' to describe the query patterns; however, the word is used just once in the text. I wonder whether the modular attribute is really relevant to the work. If it is not properly justified, I suggest removing it.

- I found the interpretation process described by the set of ontologies the most novel aspect of the approach. I would encourage the authors to reflect that in the title and put more emphasis on this contribution.

- In Figure 13, I suggest rephrasing some steps to make them clearer, e.g.:

"Make progress the mappings of currently processed subpattern collections"

- The fact that the authors empirically identified that their approach works significantly better in a simpler and more homogeneous schema is a very positive point of the work.

Review #3
Anonymous submitted on 31/Mar/2015
Major Revision
Review Comment:

The paper "A Natural Language Interface to Semantic Web using Modular Query Patterns: A Complete Presentation of the SWIP System" describes an approach for question answering over linked data. The first part of the paper presents the approach, based on query patterns for interpreting natural language questions, and provides a detailed formalization of it. The second part describes the implementation (i.e., the Swip system) and the evaluation on the datasets of the QALD evaluation campaign.

The topic of the paper is relevant to the special issue of the journal. Being able to interpret a natural language question and map it to an appropriate query over structured resources is still an open issue (as can be seen from the performance of the systems that took part in the QALD 1-5 evaluation campaigns) that deserves further investigation.
However, while the proposed approach is described in detail and plenty of formalizations are reported, the experimental part is not up to the level of the first part (or of a journal paper). A huge machinery is described, which is evaluated with success on the MusicBrainz dataset only. The authors should better point out why there is such a gap between the conceptualization of the approach and its implementation. Quite a lot of work remains to be done to improve the system's performance on the DBpedia question set, both in terms of the kinds of questions addressed (so far, one-relation questions only?) and in terms of the obtained results. I think that this part should definitely be improved. I am also wondering about the system's portability, given that the patterns are extracted from the training set questions in an attempt to generalize them. What is the cost of extracting such patterns from a resource as large and variable as DBpedia? (How much manual work does it require?) In general, the novelty of this contribution with respect to previously published papers on the same approach should also be better highlighted.

Other comments:
- the paper should be proofread; there are several typos and formatting issues
- [Intro] the first sentence is there by mistake (I assume) and should therefore be deleted
- [Intro] the acronym NL is used before its definition
- [Intro] in general, given that this is a journal paper, I would have appreciated a richer and more detailed introduction, to better motivate and contextualize the work
- [section 3] the first part of the section (before Definition 1) is pretty vague; I do not see the contribution of this section at this position in the paper (in general, it should be made clear which parts of the proposed approach are only formalized, and which ones have been implemented in the Swip system)
- [section 4.2] do the query elements correspond to the Expected Answer Type? Or are they the query types?
- [section 5.1] how are the gazetteers built? And how is the disambiguation process carried out if the name of one entity contains the name of another entity?
- [section 5.3] are the query focus rules manually written?
- [section 6.1] 11 major categories: which categories?
- [section 6.2] it is not clear to me how they map "the Cohen brothers" in the question to the URI of Joel and Ethan Cohen (how is this disambiguation carried out?)
- [section 6.2] it would be better to use an example in which the obtained sentence is different from the original one
- in general, it would be better to add more examples throughout the paper (which is sometimes hard to follow), in particular in the implementation part
- [section 8.4] Figure ? (missing reference)
- as mentioned before, the evaluation is the weak point of the paper and should be improved. What is the system runtime? What is the cost of extracting patterns from a different KB? How many patterns are reusable across KBs?
- why has the system been evaluated on QALD-3 data only?
- [section 9.4.1] why are the new strategies to address aggregate questions etc. applied and evaluated on the MusicBrainz dataset only?