QA3: a Natural Language Approach to Question Answering over RDF Data Cubes

Tracking #: 1766-2978

Maurizio Atzori
Giuseppe M. Mazzeo
Carlo Zaniolo

Responsible editor: 
Guest Editors ENLI4SW 2016

Submission type: 
Full Paper

Abstract:
In this paper we present QA3, a question answering (QA) system over RDF data cubes. The system first tags chunks of text with elements of the knowledge base, and then leverages the well-defined structure of data cubes to create a SPARQL query from the tags. For each class of questions with the same structure, a SPARQL template is defined, to be filled in with SPARQL fragments obtained by interpreting the question. The correct template is chosen using an original set of regex-like patterns, based on both syntactic and semantic features of the tokens extracted from the question. Preliminary results obtained using a limited set of templates are encouraging and suggest a number of improvements. QA3 can currently provide a correct answer to 27 of the 50 questions in the test set of Task 3 of the QALD-6 challenge, remarkably improving the state of the art in natural language question answering over data cubes.
Decision:
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 27/Nov/2017
Review Comment:

The authors have addressed all my comments.

Review #2
By John Bateman submitted on 25/Jan/2018
Minor Revision
Review Comment:

The paper presents a well set out and well documented approach to question answering over the particular organisation of data defined by RDF data cubes. The approach is relatively straightforward but scores quite well on some standard evaluation tests: thus the results are worth reporting. Essentially a translation procedure is defined from regular expression patterns to SPARQL templates. I would have liked to see more assessment of the extent to which this can really be scaled. Indications back to the user that something is going wrong (e.g., inappropriate datasets considered) more actively might also be worth giving more attention. But in general, the results shown are sufficiently well documented that the work can be followed up.

There are several language errors; these should be cleaned up for the final version. I list those I found here:

p1. col2. 'and provide' --> 'and provides'
p2. col1. 'machines, more' --> 'machines. More'
p2. col2. 'and lesson' --> 'and lessons'
p6. col1. 'describe in details' --> 'describe in detail'
p6. col2. '6 generalized-token' --> '6 generalized-tokens'
p8. col1. 'and and' --> 'and'
p8. col2. 'which turns' --> 'which turns out'
p9. fig8. keep 'F-1' or 'F1' consistent
p10. col1. '7 ones described' --> 'seven described'
p10. col1. 'performance are therefore dependant' --> 'performance is therefore dependent'
p10. col1. 'that answers free' --> 'that answer free'
p10. col2. 'of self-evaluate' --> 'to self-evaluate'
p10. col2. 'F-1': F1 also used in figure: pick one form!
p10. col2. 'w.r.t.' --> 'with respect to'
p10. col2. 'viceversa' --> 'vice versa'
p13. col2. 'as much as possible numbers' --> 'as many numbers'
p13. col2. 'in the question' --> 'in the question as possible'
p13. col2. 'new pattern to be input' --> 'new patterns to be input'
p13. col2. 'helps user to' --> 'helps users to'
p14. col1. 'user which interacts' --> 'user who interacts'
p14. col1. 'applying aggregate' --> 'applying aggregation'
p14. col1. 'more complex query' --> 'more complex queries'
p14. col2. 'and the vice versa' --> 'and the converse'
p15. col2. 'to fill in all the placeholders in the query template' --> 'all the placeholders in the query template to be filled in'
p15. col2. 'w.r.t.' --> 'with respect to'