A RADAR for Information Reconciliation in Question Answering Systems over Linked Data

Tracking #: 940-2151

Authors: 
Elena Cabrio
Serena Villata
Alessio Palmero Aprosio

Responsible editor: 
Guest Editors Question Answering Linked Data

Submission type: 
Full Paper
Abstract: 
In recent years, more and more structured data has been published on the Web, and the need to support typical Web users in accessing this body of information has become of crucial importance. Question Answering systems over Linked Data try to address this need by allowing users to query Linked Data using natural language. These systems may query several heterogeneous interlinked data sets at the same time, and these data sets may return different results for the same query. The obtained results can be related by a wide range of heterogeneous relations, e.g., one can be a specification of the other, an acronym of the other, etc. In other cases, such results can contain an inconsistent set of information about the same topic. A well-known example of such heterogeneous interlinked data sets are the language-specific DBpedia chapters, where the same information may be reported in different languages. Given the growing importance of multilingualism in the Semantic Web community, and in Question Answering over Linked Data in particular, we choose to apply information reconciliation to this scenario. In this paper, we address the issue of reconciling information obtained by querying the SPARQL endpoints of language-specific DBpedia chapters. Starting from a categorization of the possible relations among the resulting instances, we provide a framework to: (i) classify such relations, (ii) reconcile information using argumentation theory, (iii) rank the alternative results depending on the confidence of the source in case of inconsistencies, and (iv) explain the reasons underlying the proposed ranking. We release the resource obtained by applying our framework to a set of language-specific DBpedia chapters, and we integrate the framework into the Question Answering system QAKiS, which exploits such chapters as RDF data sets to be queried through a natural language interface.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Mariano Rico submitted on 26/May/2015
Suggestion:
Reject
Review Comment:

As this manuscript has been submitted as 'full paper', it should be reviewed along the usual dimensions for research contributions:

Originality
------------
This is the weakest point from my perspective. Although the authors explicitly state that this work is an extended version of a workshop paper (NLP-DBpedia) and they cite several previous papers of their own, I have found several papers related to this work that are not cited in the paper. Perhaps the most relevant is "Reconciling information in DBpedia through a QA system" (ISWC 2014). I will refer to this paper as [x1].
I concede that [x1] is only a 4-page work, but its content is (IMHO) the key data of this paper.
More specifically, in Section 5 (Related Work) the authors claim that this paper has 4 enhancements compared to a previous paper of their own (cited as [10] in its reference list):

1) Relation categorization. I cannot see any difference from the categorization made in [11], Section 2 (notice it is not [10] but [11]).

2) Relation extraction. To me, this is a tabular representation of the data in [x1]. This paper includes some more data (it is confusing, but it seems that data from QALD-4 is added here, which amounts to only 12 out of 70).

3) Evaluation. They claim higher precision, but I only see a minimal increase in F1 (QAKiS+RADAR, overall positive), from F1 = 0.77 in [x1] to F1 = 0.78 in this paper.

4) Resource. The authors claim that the results are available online, but I cannot find them at http://qakis.org/resources.htm. On that site, the Reconciled DBpedia resource link (http://wikifiki.apnetwork.ch/radar/radar-0.1.nt.bz2) is not working (error 404, resource not found).

Significance of the results
-----------------------------------------
With this low level of originality, the significance of the results is very low.

Quality of writing
------------------------------
The paper is well written and comprehensive.

More aspects
--------------------
Figure 1 is almost identical to Figure 1 in [x1].
Figure 2 is similar to Figure 1 in [10].
Table 1 is identical to Table 1 in [10].
Related papers by these authors that are not cited:
- "Hunting for inconsistencies in multilingual DBpedias with QAKiS". ISWC 2013.
- "Querying multilingual DBpedia with QAKiS". ESWC 2013.
- "Boosting QAKiS with multimedia answer visualization". ESWC 2014.

Review #2
Anonymous submitted on 29/Jul/2015
Suggestion:
Major Revision
Review Comment:

The paper presents a novel framework for reconciliation of the results of a query from multiple sources, with primary focus on multi-lingual sources such as Wikipedia/DBpedia. The same query is run across sources with different languages and the results are reconciled and ranked according to several criteria measuring confidence in each source and the amount of agreement among the sources in case the results contain duplicate, semantically related, or inconsistent information. The addressed problem is an interesting and challenging problem, with an important application in query answering systems. The proposed solution is novel and promising, using interesting heuristics for confidence assessment, methods of classification of relations across query result sets, and a reconciliation mechanism based on argumentation theory.
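
For orientation, here is a minimal sketch of what the relation-classification step of such a pipeline could look like, assuming a toy classifier over pairs of answers returned by different language-specific chapters (all names, relation labels, and classification heuristics below are illustrative assumptions, not the authors' implementation):

from itertools import combinations

# Illustrative relation labels; the paper's categorization is richer.
POSITIVE = {"identity", "acronym", "specification"}

def initials(s):
    return "".join(w[0] for w in s.split()).lower()

def classify_relation(a, b):
    """Toy classifier for how two answers relate; a real implementation
    would rely on string similarity, ontology alignment, geographical
    and temporal reasoning, etc."""
    if a.lower() == b.lower():
        return "identity"
    if a.replace(".", "").lower() == initials(b) or \
       b.replace(".", "").lower() == initials(a):
        return "acronym"
    return "inconsistency"

def build_argumentation_graph(answers):
    """answers: dict chapter -> answer string. Returns the support and
    attack edges; an argumentation module could then evaluate these
    together with per-source confidence scores."""
    supports, attacks = [], []
    for (c1, a1), (c2, a2) in combinations(answers.items(), 2):
        rel = classify_relation(a1, a2)
        (supports if rel in POSITIVE else attacks).append((c1, c2, rel))
    return supports, attacks

supports, attacks = build_argumentation_graph(
    {"en": "United States", "fr": "U.S.", "it": "Canada"})
print(supports)  # [('en', 'fr', 'acronym')]
print(attacks)   # [('en', 'it', 'inconsistency'), ('fr', 'it', 'inconsistency')]

The reconciliation and ranking steps would then operate on this graph, weighting each edge by the confidence assigned to the source it comes from.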

The paper has the potential to be a strong contribution to this special issue and the literature in this area. However, presentation issues along with very limited experiments make it difficult to understand the significance of the proposed solution in practice and to compare it with simpler heuristics. In particular:
1) Lack of strong real-world examples in almost all the paper and particularly the first two sections. This is important both to motivate the problem and the proposed solution, and clarify technical details.
2) Experiments on a small number of queries with a very limited scope, and with only very minor improvements over the baseline approach: 2 more queries answered out of 31 questions, resulting in a 6% improvement in recall.
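
As a quick sanity check (assuming recall is computed over the 31 benchmark questions), the reported figures are mutually consistent: $\Delta R \approx \frac{2}{31} \approx 0.065$, and $0.58 + 0.065 \approx 0.64$.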

In terms of the three usual dimensions for research contributions: (1) originality, (2) significance of the results, and (3) quality of writing - the paper needs to do a better job on (2) and (3), clearly showing that the results are significant and making the contributions and their significance clearer.

Detailed comments are below. Note that if the final decision is a revision and you choose to revise, you do not need to address all the detailed comments listed below, but only the ones relating to the above two problems.

1) Given that this is a journal article with no page limit, the paper can be made self-sufficient and complete in terms of technical details. There are various places where you refer the readers to previous work, and in particular to your previous workshop paper on the topic. Typically, you do not even need to cite the workshop paper if this is a significant extension; you can simply explain the extensions in your cover letter. Referring to your own previous work for critical details, as done for example in footnote 10, is only acceptable if there is a strict page limit. You can also include more details from other cited work to make it easier to understand and appreciate the proposed solution.

2) On page 2, second to last paragraph, the argument is not clear. Can you argue that your solution is better than cleaning or aligning the data in a preprocessing phase? That is, what if we reconcile all the sources first and only then run queries and perform query answering?

3) Is your work relevant to other, non-language-based problems? That is, how would the problem change if you query multiple English data endpoints that are on the same topic? It would be nice to have examples and justify the relevance to this broader problem, or to point out the differences and what needs to be done in future work.

4) In Section 2.2 where you discuss surface variants, don’t we have unique URIs in proper Linked Data sources to address these problems?

5) Footnote 3: Does this mean that if WikiData replaces DBpedia, then your system completely loses its application because the information across all languages will be consistent?

6) Page 5 where you discuss inclusion category, you list both Wikipedia infoboxes and DBpedia. Isn’t DBpedia extracted from Wikipedia infoboxes?

7) It would be much nicer if Figure 2 and the related examples were replaced with real-world examples from multi-lingual DBpedia/Wikipedia information.

8) Footnote 10: “mode” -> “more”

9) I don’t understand footnote 11.

10) Footnote 12: DBpedia has static dumps so one entity disappearing is not an issue as long as you mention the version used. Are you using DBpedia 2014, or 3.9? There is a new release for 2015.

11) Footnote 12 and across the paper: You need to replace the URLs with permanent URLs (e.g., see http://purl.org). A journal article will never change, but your domains and URLs are very likely to change or disappear.

12) You use too many footnotes. I believe you can just move many of them into the text.

13) Page 9, first paragraph: This is the first place where you give a nice real example. You need more examples like this, but in earlier sections. Here I expect to see statistics, for example how many facts like this you have been able to reconcile across the languages. How much better is this reconciled source compared with, for example, the English DBpedia alone?

14) Section 4, paragraph before 4.1: These examples do not prove the need for argumentation. For the first one, simple majority voting would work. For the second one, it is just a matter of grouping entities that are the same.

15) Section 4.2: the question "Who developed Skype" does not return anything on the current demo. There are other issues with the demo as well: some examples return no results, and the "Technical Details" tab shows the same query for all the languages. This is another place where I could not find any evidence that argumentation is really needed and that the proposed solution is a significant contribution to this space.

16) Results in Table 5: The results basically show that QAKiS + RADAR answers only two more questions, thereby increasing recall from 0.58 to 0.64. Which two are they, and how does RADAR help? Could a simpler method do equally well? Is the argumentation module useful at all? Is there any example or evidence showing that bipolar argumentation is useful, compared to non-bipolar?
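
Relating to points 14 and 16, here is a toy comparison of plain majority voting against a confidence-weighted scheme in which agreeing answers support each other and disagreeing answers attack each other, which is roughly the role the argumentation module is meant to play (the answers and confidence scores below are hypothetical, and the scoring is a simplification, not the authors' algorithm):

from collections import Counter

# Hypothetical answers from five DBpedia chapters to the same query,
# with made-up per-source confidence scores.
answers = {"en": "A", "de": "A", "fr": "B", "it": "B", "es": "B"}
confidence = {"en": 0.9, "de": 0.85, "fr": 0.3, "it": 0.3, "es": 0.3}

def majority_vote(answers):
    """Baseline: pick the most frequent answer, ignoring source quality."""
    return Counter(answers.values()).most_common(1)[0][0]

def weighted_bipolar(answers, confidence):
    """Simplified stand-in for the argumentation step: identical answers
    support each other, different answers attack each other, and every
    edge is weighted by the confidence of the source it comes from."""
    score = {c: 0.0 for c in answers}
    for c1, a1 in answers.items():
        for c2, a2 in answers.items():
            if c1 == c2:
                continue
            score[c1] += confidence[c2] if a1 == a2 else -confidence[c2]
    best = max(answers, key=lambda c: score[c])
    return answers[best]

print(majority_vote(answers))                 # 'B': three low-confidence votes
print(weighted_bipolar(answers, confidence))  # 'A': two high-confidence votes win

The kind of evidence the paper would need is precisely what this toy case shows: the two methods only diverge when source confidence (or the structure of supports and attacks) overturns a raw count, so the evaluation should report how often that situation actually occurs.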

Review #3
Anonymous submitted on 02/Aug/2015
Suggestion:
Minor Revision
Review Comment:

Originality: The paper is an extended version of a workshop paper published at the NL&DBpedia-2013 workshop. The extensions with respect to the first version are detailed by the authors in the Related Work section. Compared to the workshop paper, the authors argue that they use a more fine-grained approach for relation categorization. Furthermore, they claim that relations are automatically extracted with more robust techniques, leading to improved evaluation results. Please state in more detail to what extent the approach to relation categorization is more fine-grained and to what extent the techniques for relation extraction are more robust.
Please also discuss in more detail the differences and improvements with respect to the other works addressing alignment agreement based on argumentation theory. It would be beneficial to include at least one of them in the evaluation as a baseline to compare against and to demonstrate the superiority of your framework.

Significance: The authors introduce and evaluate a framework for information reconciliation over language-specific DBpedia chapters. The framework makes it possible to classify relations among the resulting instances, reconcile information using argumentation theory, rank the alternative results depending on the confidence of the sources in case of inconsistencies, and explain the reasons underlying the proposed ranking.

Quality of writing: The paper is well presented with a clear structure.