Question Answering on RDF KBs using Controlled Natural Language and Semantic Autocompletion

Tracking #: 1678-2890

Giuseppe Mazzeo
Carlo Zaniolo

Responsible editor: 
Guest Editors ENLI4SW 2016

Submission type: 
Full Paper
The fast growth in number, size and availability of RDF knowledge bases (KBs) is creating a pressing need for research advances that will let people consult them without having to learn structured query languages, such as SPARQL, and the internal organization of the KBs. In this paper, we present CANaLI, a Question Answering (QA) system that accepts questions posed in a Controlled Natural Language. The questions entered by the user are annotated on the fly, and a KB-driven autocompletion system displays suggestions computed in real time from the partially completed sentence the person is typing. By following these patterns, users can enter only semantically correct questions which are unambiguously interpreted by the system. This novel feature enhances the interaction with and the usability of the CANaLI which also delivers a high level of accuracy and precision. In experiments conducted on well-known QA benchmarks, including questions on the encyclopedic DBpedia and on KBs from specialized domains, such as music and medicine, CANaLI typically outperforms other QA systems.
Reject (Two Strikes)

Solicited Reviews:
Review #1
Anonymous submitted on 12/Aug/2017
Major Revision
Review Comment:

The paper presents Canali, a system for answering NL questions over KBs, supporting auto completion. The overall approach is based on automata, trying to determine user intentions and suggest relevant content, guiding the formulation of valid questions that match the underlying KBs.

“This novel feature enhances the interaction with and the usability of the CANaLI which also delivers a high level of accuracy and precision” => This novel feature enhances the interaction with and the usability of CANaLI, demonstrating at the same time high accuracy and precision.

“In this paper, we provide a detailed presentation of the controlled NL parsing techniques that led that success and also enabled support for real-time question completion idea greatly improves the usability of CANaLI and the interactive experience users have with the system” => In this paper, we provide a detailed presentation of the controlled NL parsing techniques that led that success and also enabled real-time question completion, greatly improving the usability of CANaLI and the interactive experience users have with the system.

“CNL approach” ?? Canali Natural Language approach ?

“After a short description of SWiPE, ClioPedia and SWiPE” => After a short description of SWiPE, ClioPedia and Xser?

The introduction needs to be restructured. On page two, the paper suddenly starts describing related work, which I think is better included in section 6.

One of the positive aspects of the paper is that it includes many examples that help the reader understand the overall concept and capabilities of the system (e.g. in section 2 and section 5).

However, the paper lacks a formal description of the approach in section 3. The presentation is quite descriptive/narrative, mixing long paragraphs of examples with definitions and explanations in a way that hampers comprehensibility. It reads more like a story than a formal definition, description, explanation and exemplification. I would suggest that the authors revise section 3 and include formal, well-defined definitions of the several concepts proposed in the paper, reducing the size of the paragraphs (for example, the paragraph starting with S1: on page 7 is more than one column in length), and also include algorithms, functions, etc. that will help readers better understand the way that all the proposed concepts, elements, properties, etc. are connected. The examples should complement the formal descriptions, and should not be used as the only means to present the overall concept.

Moreover, the contribution of the paper is not clear. It is mentioned that this paper presents extended evaluation results, compared to the ones presented in [15]. This is OK, but the paper needs to present something more than plain results. Earlier, it is mentioned that this paper presents a detailed presentation of the controlled NL parsing that led to the success of a previous (demo) paper in QALD6. Later in the same section, it is mentioned that even without the autocompletion feature, Canali achieves high precision, referencing again QALD6. So, it is not clear what aspects of the presented paper were also part of the demo in QALD6 and what parts are new.

Regarding the experimental evaluation, I have some questions that are mainly relevant to the auto completion feature.

It seems that the experimental evaluation has two parts: the first part is described at the beginning of section 5 and the second part in section 5.1. The difference between these two sections is not clear to me. It seems that the first part uses the auto completion feature, while the second part evaluates only the CNL subsystem. Is this true?

In any case, I guess that the auto completion feature needs someone to type the queries at runtime, right? Therefore, the system gets some help from the user for disambiguating sentences, since the user selects the suggestion that best matches her/his needs. Although this is perfectly fine, how can the system be compared to other systems that get as input questions in textual form, without further “assistance” from the user, and try to automatically disambiguate?

In section 5.1, several examples are presented on how some questions have been reformulated in order to be handled by Canali. In one of these questions (what is the official web site of Tom Cruise), the reformulation is quite odd: the revised question is “what is the web site of Tom Cruise”, which is quite similar to the original one. This raises the question of to what extent the system employs semantic QA capabilities, i.e., whether it goes beyond retrieving answers from KBs through simple keyword-based matches. For example, the terms “official web site” and “web site” are quite similar, and any semantics-based QA system should be able to make this connection. Canali seems to be heavily based on keywords, rather than on the semantics of the input. I guess vocabularies or dictionaries, like WordNet, BabelNet, etc., are not taken into account, right? What if the question of the user contains, for example, a property that is not part of the KB vocabulary, but an equivalent property is present instead, e.g., “web site” instead of “web page”? Overall, the semantic capabilities of the system in terms of ontologies, reasoning, etc. are not clear at the NL level, e.g. at the level of processing/parsing the user input and semantically understanding what the user means, beyond simple Lucene matching.
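(As an aside, a minimal sketch of one way equivalent properties could be handled at query-generation time, using an alternative property path; this is my own illustration with example DBpedia property names, not something taken from the paper:)

```sparql
# Hypothetical sketch: bridge equivalent KB properties with an
# alternative property path, so that the user need not know which of
# dbo:almaMater / dbo:education the KB actually uses.
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT ?school WHERE {
  dbr:Barack_Obama (dbo:almaMater|dbo:education) ?school .
}
```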

Finally, unless I have missed it, the paper does not include any result regarding the response time which involves:
1. Response time regarding the generation/presentation of suggestions, as the user is typing
2. Response time regarding SPARQL execution, which also encapsulates the structure/quality of the generated SPARQL queries.

Regarding 1, it would be nice to include some timings in order to get a picture of how long the user needs to wait until suggestions are presented. For the latter, the paper does not discuss the way SPARQL queries are generated. Is there a one-to-one, straightforward mapping of the automata to SPARQL structures? Is there any optimization taking place? In principle, there is more than one way to write a SPARQL query, and I was wondering what the approach of Canali is.
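To make that last point concrete, here is a hypothetical example (mine, not drawn from the paper) of two SPARQL formulations of the same question, e.g. “Italian cities with more than one million inhabitants”, that yield the same answer set but may differ in structure and execution cost:

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

# Formulation 1: basic graph pattern with a FILTER on the population value.
SELECT ?city WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Italy ;
        dbo:populationTotal ?pop .
  FILTER (?pop > 1000000)
}

# Formulation 2 (a separate, equivalent query): the same condition pushed
# into a HAVING clause over a (here trivial) aggregation per city.
SELECT ?city WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Italy ;
        dbo:populationTotal ?pop .
}
GROUP BY ?city
HAVING (MAX(?pop) > 1000000)
```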

Review #2
Anonymous submitted on 18/Sep/2017
Major Revision
Review Comment:

This manuscript is a revision of manuscript #1492-2704, which I reviewed as Reviewer 3. At a first reading, the authors appear to have addressed some of the comments; reading closely though, the revisions aren’t in all cases as detailed as anticipated. Specifically, although the authors have now included explicit references to the two previous publications that pertain to the presented QA approach, it is still unclear what the contribution of the current manuscript is, as the expressiveness and coverage of the supported queries remain the same (while the same limitations pertain and are all still mentioned as future work), and the only differences lie in the description of the approach and the presented evaluation results.

In comparison, the current manuscript provides extended evaluation results, which isn’t negligible, yet it’s not adequate on its own, particularly given the questions that still remain unaddressed about the benchmark datasets selected for the comparative evaluation. Moreover, as also noted in the first review round, given that the proposed QA system involves online user-system interaction and a controlled NL for querying, the evaluation should also consider user-centric aspects (usability, time-to-complete a query, intuitiveness, etc.).

Given that the presented results are indeed promising, it’s worth further revising the experimental section to strengthen and solidify the presented work. More specifically:
- Section 5 mentions that “We assessed the performances of CANaLI on both the questions over the general DBpedia KB used in the challenges held since 2013...” yet, the presented evaluation includes results on DBpedia only for QALD-4 and QALD-6. Why is that?
- Why are only 20 questions of QALD-4 considered?
- Section 5.1 discusses in detail the results observed for the QALD-4 experiments; what about QALD-6? And how intuitive is it, in the end, for a user who is not aware of the DBpedia ontology to formulate the correct query in those cases where this isn’t straightforward? Also, as aforementioned and as noted in the first review round, to answer the question “about the extent in which restrictions imposed by the CNL makes it less user-friendly...”, a user-centric evaluation is also needed to complement the observations of the purely technical evaluation/discussion.
- What is the intuitive difference between querying a general-domain KB, such as DBpedia, and querying specialised ones, such as the music- and biomedical-specific KBs considered in the experiments?
- When discussing the differences with GFMed, it’s worth outlining that GFMed’s tailoring involves the manual definition of alternative template-based verbalisations that allow alternative wordings for each posed query.
- The statement “While CANaLI does not suffer from expressive power and generality issues because of its reliance on CNL” could be revised to be more accurate in terms of observing this within the considered QA datasets (which don’t capture how often structures/situations not currently handled actually appear).
- Not being able to handle equivalent properties (e.g. dbo:almaMater & dbo:education) is an actual weakness of the proposed system’s current implementation, not a “punishment” of the QALD evaluation rules.

In the conclusions section, it is mentioned that adjectives are prone to interpretation ambiguities (e.g. whether large should be interpreted in terms of population or area), and hence are not part of the short-term plans for CANaLI extensions. Given that what the auto-completion affords is showing all possible interpretations so that the user can unambiguously choose the intended one, I assume that the main challenge isn’t the interpretation ambiguity with respect to candidate DBpedia properties, but the identification of which properties are candidates, right?

A careful proof reading is also recommended:
- Section 1: “This alos provided..” -> “This also provided..”
- Section 1: “... SWiPE, ClioPedia and SWiPE” -> “...SWiPE, ClioPedia and Xser” ?
- Section 3: “... the types of literals are...” -> “...the types of supported literals are...”
- Section 3: “...being O the stack of...” -> “...O being the stack o...”
- Section 3: “...then the have that entity...” ->
- Section 4: “... in a ]KB” -> “... in a KB”
- Section 4: “inasmuch” -> “in as much”
- Section 5: “To answer this question. we will” -> “To answer this question, we will”
- Section 5.1: “...with other NL systems...” -> “...with other NL QA systems....”
- Section 6: “...relationships into translated” -> “... relationships are translated into”
- Section 7: “aggressive usage of synonyms” -> extended/sophisticated usage of synonyms?
- Section 7: “...simplicity both at the ... ” -> “...simplicity at both the ...”

Review #3
By Gerard Casamayor submitted on 09/Oct/2017
Review Comment:

This paper addresses an important problem for the SW community, question answering over RDF data. However, using a CNL interface to guide the user in formulating only valid queries isn't a new approach.

Since the authors claim to present a CNL interface, I miss a discussion of how this paper contributes to existing research in interfaces for question answering based on CNLs such as ACE, Rabbit, SOS and many others (see De Coi et al. 2009 for a review of such languages and their applications to the SW). Despite the authors claiming otherwise on page 2, autocompletion dialogs are far from novel in NL and CNL interfaces; see for instance Damljanovic et al. 2010.

It seems that the presented interface relies on labels in the RDF data in order to map query fragments to entities in the dataset. However, I miss a discussion of what to do in the face of scarce labels or labels with poor readability. In that respect, it is worth mentioning that some works have looked at the issue of labels in LD and SW data, e.g. Ell et al. 2011. Mentioning lexical models for the SW (e.g. Lemon/LexInfo) would also have contributed to justifying the viability of the system.

All in all, it is far from clear to me what exactly is the novelty of the proposed system.

Other comments:
- There are some spelling errors across the paper, e.g. "alos" --> "also" on page 1, "that led that sucess" also on page 1, etc.
- In the first paragraph of Page 2, the following statement is confusing: "After a short description of SWiPE, ClioPedia and SWiPE,...".
- Of all the different approaches to QA, cons are given only for NL interfaces. Why did the authors think that CNL is a better choice than graph exploration, faceted search and query by example?
- The fact that CANaLI only supports questions matching a few patterns should've been made clear at the very beginning of section 2.
- At the beginning of section 3, the authors give implementation details, e.g. the mention of Apache Lucene. This is probably best left for the next section, which details the design and implementation of CANaLI.
- Restricting the state of the art only to systems participating in the QALD challenge seems far too limited.

References used in this review:

Juri Luca De Coi, Norbert E. Fuchs, Kaarel Kaljurand, Tobias Kuhn:
Controlled English for Reasoning on the Semantic Web. REWERSE 2009: 276-308

Danica Damljanovic, Milan Agatonovic, Hamish Cunningham:
Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-Based Lookup through the User Interaction. ESWC (1) 2010: 106-120

Basil Ell, Denny Vrandečić, Elena Simperl:
Labels in the Web of Data. ISWC 2011