A RADAR for Information Reconciliation in Question Answering Systems over Linked Data

Tracking #: 1255-2467

Elena Cabrio
Serena Villata
Alessio Palmero Aprosio

Responsible editor: 
Guest Editors, Question Answering over Linked Data

Submission type: 
Full Paper

Abstract:
In recent years, more and more structured data has been published on the Web, and the need to support typical Web users in accessing this body of information has become of crucial importance. Question Answering systems over Linked Data try to address this need by allowing users to query Linked Data using natural language. These systems may simultaneously query different heterogeneous interlinked data sets, which may provide different results for the same query. The obtained results can be related by a wide range of heterogeneous relations, e.g., one can be a specification of the other, an acronym of the other, etc. In other cases, such results can contain an inconsistent set of information about the same topic. A well-known example of such heterogeneous interlinked data sets are the language-specific DBpedia chapters, where the same information may be reported in different languages. Given the growing importance of multilingualism in the Semantic Web community, and in Question Answering over Linked Data in particular, we chose to apply information reconciliation to this scenario. In this paper, we address the issue of reconciling information obtained by querying the SPARQL endpoints of language-specific DBpedia chapters. Starting from a categorization of the possible relations among the resulting instances, we provide a framework to: (i) classify such relations, (ii) reconcile information using argumentation theory, (iii) rank the alternative results depending on the confidence of the source in case of inconsistencies, and (iv) explain the reasons underlying the proposed ranking. We release the resource obtained by applying our framework to a set of language-specific DBpedia chapters, and we integrate the framework into the Question Answering system QAKiS, which exploits such chapters as RDF data sets to be queried through a natural language interface.
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 31/Dec/2015
Minor Revision
Review Comment:

The authors have made a major revision, addressing most of my concerns very well. The only issue not addressed properly concerns the experimental results and the response to Reviewer #2's Question 16. I asked very specific questions, but the answer is quite vague and high-level. I am not sure whether a shortcoming of the results is being hidden or the question was not well understood. I repeat my question below:

"The results basically show that QAKiS + RADAR answers only two more questions, therefore increasing recall from 0.58 to 0.64. What are those two? and how RADAR helps? Could a simpler method do equally well? Is the argumentation module any useful? Any examples or evidence showing the bipolar argumentation is useful, versus non-bipolar?"

In the revised manuscript, the recall increase over the baseline is from 0.52 to 0.63. Given that the number of questions is 43, this means 27 (0.63 * 43) questions answered correctly, compared with 22 (0.52 * 43). Is that right? If not, then your definition of recall is wrong. If so, then you can easily point to these 5 questions and explain why RADAR 2.0 is helping. The same holds for precision. The only thing that needs to be made very clear is that your solution is useful in practice. This is what your experiments are supposed to show, and the current results, or at least their current presentation, do not show that. I hope this is a simple revision for you to clarify the results.
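To make the arithmetic explicit, here is a minimal check of the figures above, assuming recall is simply the fraction of the 43 QALD questions answered correctly (the helper name is mine, not from the paper):

```python
# Back-of-the-envelope check of the reported recall figures,
# assuming recall = (questions answered correctly) / (total questions).
TOTAL_QUESTIONS = 43

def correct_answers(recall: float, total: int = TOTAL_QUESTIONS) -> int:
    """Number of correctly answered questions implied by a recall value."""
    return round(recall * total)

baseline = correct_answers(0.52)   # baseline QAKiS
radar = correct_answers(0.63)      # QAKiS + RADAR 2.0
print(baseline, radar, radar - baseline)  # 22 27 5
```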

Also, in your answer to Question 15, you claim your demo works, whereas it seems that either you did not understand my question or your demo does not really work. First, I learned that questions are case-sensitive ("who developed skype?" doesn't work). Second, when I click on the "Technical Details" tab for the question "who developed Skype?" on http://qakis.org/qakis/index.xhtml, what I see under all 3 languages is the same query:

select distinct *
where {
  graph ?g {
    ?v .
    ?t owl:sameAs .
    ?v ?t .
    ?v rdf:type .
    OPTIONAL {?v ?l filter (lang(?l)="en")}
  }
} limit 20

Note the filter (lang(?l)="en") in the query. Do you seriously run this query, with filter (lang(?l)="en"), over the French and German DBpedia endpoints?
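For reference, what I would expect is the filter adapted to the chapter being queried; a sketch for the French endpoint follows (rdfs:label is my assumption for the label property elided in the query above, and the other stripped URIs are likewise omitted):

```sparql
# Sketch only: the label-retrieval pattern with the language tag matched
# to the chapter being queried ("fr" here; "de", "it", ... for the others).
# rdfs:label is an assumption; the original property URIs were lost in
# the page rendering above.
SELECT DISTINCT ?v ?l
WHERE {
  ?v rdfs:label ?l .
  FILTER (lang(?l) = "fr")
}
LIMIT 20
```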

Review #2
By Mariano Rico submitted on 21/Jan/2016
Review Comment:

Thanks for the changes. Now it is fine for me.

Review #3
Anonymous submitted on 02/Feb/2016
Review Comment:

The authors have improved the paper and have addressed most of the issues raised by the reviewers.

Review #4
By Christina Unger submitted on 15/Feb/2016
Minor Revision
Review Comment:

The following provides a list of typos, grammar mistakes, and formulation and formatting suggestions.

Page 2

* instances about a single object --> instances of a single object
* such a kind of issues --> such kind of issues
* a unique (possibly correct) answer --> a unique (ideally correct) answer
* to evaluate accordingly the information items they provide --> to evaluate the information items they provide accordingly
* transparent, and as a consequence, --> transparent and, as a consequence,
* linguistic-based fine-grained relations --> linguistic, fine-grained relations, or: linguistically fine-grained relations

Page 3

* applied over 300 --> applied to over 300
* chapters values is released --> chapters is released
* We do not think that such a kind of explanations would be possible with alignment, but we do not claim that our solution is better.
--> Should be "such kinds of explanation", but actually it's absolutely not clear what this sentence and the sentences following it want to say.
* from the same SPARQL query --> from one SPARQL query
* under the form of a graph --> in the form of a graph
* Language-specific DBpedia chapters can contain different information from one language to another, providing more specificity on certain topics, or filling information gaps.
--> Language-specific DBpedia chapters can contain different information on particular topics, e.g. providing more or more specific information.

Page 4

* actor A. Albanese --> actor Antonio Albanese
* The chapter of the longest language-specific Wikipedia page describing the queried entity is rewarded with respect to the others
--> Actually, this formulation is not accurate, as the score of this page is 1 whereas all other pages get a score < 1. This means that the page is not rewarded but the others are penalized. (The same holds for geo-localization.)
* and to the corresponding chapter is assigned a score equal to 1 --> and to the corresponding chapter a score equal to 1 is assigned
* whose appropriateness --> the appropriateness of which
* summed, and normalized --> summed and normalized
* less reliable chapter --> least reliable chapter
* Relations classification --> Relation classification
(This occurs throughout the whole paper. Also, you always say "results set", where I would rather say "result set".)
* such categories correspond to the linguistic phenomena (mainly discourse and lexical semantics) holding among heterogeneous values
--> such categories correspond to the lexical and discourse relations holding among heterogeneous values
(Also, it's not clear what "values" means. The labels of object resources and literals?)
* Footnote 2:
can be found here http://download.geonames.org/export/dump/countryInfo.txt. --> can be found here: http://download.geonames.org/export/dump/countryInfo.txt

Page 5

* SameAs --> owl:sameAs
(this occurs a lot, and I would always write "owl:sameAs")
* I would also capitalize all bold relation names.
* Footnote 3:
humans and machines alike http://www.wikidata.org/, --> humans and machines alike, http://www.wikidata.org,
* hyponymy: when the former is included within the latter --> maybe rather "when the latter is implied by the former"?
* the description of metonymy is also not optimal

Page 6

* relation between entities/objects --> What exactly are "entities/objects"? URIs, or labels, values? What exactly do you mean by "values"? In this whole section, please be more precise with these terms.
* data set --> dataset (This occurs throughout the whole paper.)
* no such a training set --> no such training set
* so that to accomplish our purpose --> in order to accomplish our purpose
* we detail RADAR 2.0 argumentation module --> we detail the RADAR 2.0 argumentation module
* with the other arguments --> with other arguments
* an example of AF --> an example of an AF
* Dung’s acceptability admissibility-based semantics --> this doesn't sound grammatical
* confidence associated to --> confidence associated with

Page 7

* Figure 2:
- Example of (a) AF --> Example of (a) an AF
- Please mention in the caption that single lines represent attacks and double lines represent support.
* associated to the sources --> associated with the sources
* accepted at the end --> accepted in the end
* if they overcome a certain threshold --> if they exceed a certain threshold
* Let α be a bipolar fuzzy labeling. We say that α is a bipolar fuzzy labeling if and only if ...
--> This doesn't make any sense. I would suggest moving the last sentence of Definition 1 to Definition 2, saying something like "A total function α : A -> [0,1] is a bipolar fuzzy labeling if and only if ..."
* Table 1: Instead of A,B,C I would use small letters a,b,c, as they refer to the nodes in Figure 2.
* Also, you actually never mention how alpha is computed step-wise in cyclic graphs, converging to one value. It would be helpful to add a sentence about this.
* the bipolar fuzzy labeling algorithm is raised on the argumentation framework --> the bipolar fuzzy labeling algorithm is applied to the argumentation framework, or: the bipolar fuzzy labeling algorithm is executed on the argumentation framework
* we expect to have the Italian DBpedia chapter as the most reliable one being Stefano Tacconi an Italian soccer player
--> This is not grammatical... Probably you want to say the following:
we expect the Italian DBpedia chapter to be the most reliable one, given that Stefano Tacconi is an Italian soccer player
* the "correct" answer is 1.88 --> either remove the quotation marks around "correct" or say "the trusted answer is 1.88"

Page 8

* as well when --> either "as well as" or "when"
* non bipolar --> non-bipolar argumentation
* up to our knowledge --> to our knowledge
* linguistic phenomena holding among values --> linguistic relations holding between values
* the types of relation --> the type of relations
* specific relation (property) --> specific property
* among the categories distribution --> among the distribution of categories
* with this respect --> in this respect

Page 9

* In Tables 2 and 3 (and also 4 on page 13), I would capitalize all column headers.
* corresponding to the 47.8% of DBpedia instantiated properties --> corresponding to 47.8% of all properties in DBpedia
* triples, from --> triples from
* On the contrary --> In contrast
* non functional --> non-functional (occurs often)
* we reconciled 3.2 million functional properties --> I guess you mean 3.2 million triples?
* with an average accuracy comparable to the one described in Table 3 --> What do you mean by accuracy? Precision?
* the strategy "DBpedia CL" --> Please briefly mention what CL stands for.
--> Also, why choose the most specific class and not simply all classes?
* Footnote 12: This link should be provided in the main text, not in a footnote, I think.

Page 10

* QAKiS addresses the task of QA over structured knowledge-bases (e.g. DBpedia) [10], where the relevant information is expressed also in unstructured forms (e.g. Wikipedia pages). It implements a relation-based match for question interpretation
--> This is a bit confusing and misleading. Please reformulate.
* sent to a set of language-specific DBpedia chapters SPARQL endpoints --> sent to the SPARQL endpoints of the language-specific DBpedia chapters
* require either some forms of reasoning (e.g., counting or ordering) on data, aggregation (from data sets different from DBpedia), involve n-relations
--> require either some form of aggregation (e.g., counting or ordering), information from datasets different than DBpedia, involve n-ary relations
* Footnote 14: Please use http://www.sc.cit-ec.uni-bielefeld.de/qald/ as URL. The http://greententacle... URL is not persistent.

Page 12

* QALD data set was created --> the QALD dataset was created
* are present in this data, i.e., surface forms, geo-specification, and inclusion, and --> are present in this data - surface forms, geo-specification, and inclusion - and
* on the top of a QA system existing architecture --> on top of an existing QA system architecture
* previous work [12,11,8], introducing RADAR 1.0 --> previous work [12,11,8] introducing RADAR 1.0
* judge arguments' acceptability --> to judge an argument's acceptability

Page 13

* Relations categorization --> Relation categorization
* Relations extraction --> Relation extraction
* You have an extra space between every bold term and the following colon, which I would remove. E.g. "Evaluation : " --> "Evaluation: "
* Also, I would begin with a capital letter after the colon, and end each paragraph with a dot instead of a semicolon.
* the contribution on this side --> the contribution here
* linguistic-based relations --> linguistic relations
* data from QALD-2 have been used --> data from QALD-2 has been used
* f1 --> F1
* State of the art QA systems --> State-of-the-art QA systems
* SW --> either write "Semantic Web" or introduce the abbreviation somewhere.

Page 14

* Sometimes you write "Linked Data" and sometimes "linked data". Please stick to one.
* another possibility is to leave the data consumer itself to assign --> another possibility is to let the data consumer itself assign