Is Question Answering fit for the Semantic Web?: a Survey.

Paper Title: 
Is Question Answering fit for the Semantic Web?: a Survey.
Authors: 
Vanessa Lopez, Victoria Uren, Marta Sabou, Enrico Motta
Abstract: 
With the recent rapid growth of the Semantic Web (SW), the processes of searching and querying content that is both massive in scale and heterogeneous have become increasingly challenging. User-friendly interfaces, which can support end users in querying and exploring this novel and diverse, structured information space, are needed to make the vision of the SW a reality. We present a survey on ontology-based Question Answering (QA), which has emerged in recent years to exploit the opportunities offered by structured semantic information on the Web. First, we provide a comprehensive perspective by analyzing the general background and history of the QA research field, from influential works from the artificial intelligence and database communities developed in the 70s and later decades, through open domain QA stimulated by the QA track in TREC since 1999, to the latest commercial semantic QA solutions, before tackling the current state of the art in open user-friendly interfaces for the SW. Second, we examine the potential of this technology to go beyond the current state of the art to support end-users in reusing and querying the SW content. We conclude our review with an outlook for this novel research area, focusing in particular on the R&D directions that need to be pursued to realize the goal of efficient and competent retrieval and integration of answers from large scale, heterogeneous, and continuously evolving semantic sources.
Submission type: 
Survey Article
Responsible editor: 
Philipp Cimiano
Decision/Status: 
Accept
Reviews: 

This is a revised manuscript following a "reject with encouragement to resubmit a revision," and a subsequent "accept with minor revisions". It has now been accepted for publication. The reviews below are for the original submission.

Solicited review by anonymous reviewer:

This paper reviews the state-of-the-art of question answering on the Semantic Web. It offers a clear, comprehensive overview of existing (especially ontology-based) question answering systems and a good discussion of the main challenges that still have to be addressed in the future.

The goal is a robust open domain question answering system that can serve as interface to the Semantic Web. The paper focuses on the main challenge arising from this goal: coping with the vast amount of distributed, highly heterogeneous semantic data. However, I think that there are at least two additional issues that will become important for providing naive users with easy access to the Semantic Web, although the amount of existing research is not overwhelming yet:

* Answer presentation
Hiding the complexity of the Semantic Web behind a user-friendly interface involves not only enabling the user to enter natural language questions (instead of formal queries) but also presenting retrieved semantic data in a natural and intuitively understandable way. That is, instead of returning URIs or RDF triples, semantic data should be processed, e.g. verbalised and possibly enriched, in order to also convey meta-information such as trust and provenance.

* Multilinguality
Although processing of semantic data is language-independent, interaction with these data by means of question answering is language-based, and thus requires localisation in order to enable multilingual access to the Semantic Web.
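
To make the answer presentation point concrete, the following is a minimal sketch of verbalising a single RDF triple and attaching provenance. It is not taken from the paper or from any of the surveyed systems; the helper names (`local_name`, `humanise`, `verbalise`), the `example.org` URIs and the provenance format are illustrative assumptions.

```python
import re


def local_name(uri: str) -> str:
    """Return the fragment or last path segment of a URI."""
    return re.split(r"[/#]", uri.rstrip("/#"))[-1]


def humanise(name: str) -> str:
    """Turn a camelCase or snake_case identifier into space-separated words."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name.replace("_", " "))
    return spaced.strip()


def verbalise(subject: str, predicate: str, obj: str, source: str = "") -> str:
    """Render one triple as a plain sentence, optionally naming its source."""
    s = humanise(local_name(subject))
    p = humanise(local_name(predicate)).lower()
    o = humanise(local_name(obj))
    sentence = f"{s} {p} {o}."
    if source:
        # Hypothetical provenance format; a real system would decide how
        # trust and provenance are best conveyed to the user.
        sentence += f" (source: {source})"
    return sentence


if __name__ == "__main__":
    print(verbalise(
        "http://example.org/resource/Berlin",
        "http://example.org/ontology/isCapitalOf",
        "http://example.org/resource/Germany",
        source="example.org",
    ))
```

Even this toy version shows why the issue is non-trivial: the output depends entirely on how readable the URI local names happen to be, and it is tied to English word order, which connects directly to the multilinguality point.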

When talking about bridging the gap between the Semantic Web and the conventional Web, on the other hand, I think the paper reaches beyond its scope. At this point, I consider it important to draw a line between 'question answering over the Semantic Web' and 'semantic question answering'. The former requires search over semantic data and is the claimed topic of the paper, while the latter also includes semantic search over non-semantic data, involving issues that go beyond the claimed scope of the paper, like enriching non-semantic sources with semantic annotations. As a consequence, a minor comment on Section 5.4: Since it is mainly a system description of PowerAqua, I think it would fit better with the other system descriptions in Section 4.1 (which also contains its predecessor AquaLog), while the integration of semantic and non-semantic data could be mentioned as an important challenge for future work.

Finally, it would be nice if the paper concluded with an upshot with respect to the title question. This would complete a very valuable overview of question answering on the Semantic Web.

MINOR COMMENTS:

* The term 'habitability' is first mentioned at the end of Section 2. It should be briefly explained at that point, since it occurs a few times (e.g. in 4.1, where it is finally explained).
* Footnote 11 gives the link to the Stanford parser. Since the Stanford parser is already mentioned earlier in the text, the link should be given with the first mention. The same comment applies to Sindice in 5.3.
* At the beginning of FREyA's description in 4.1, there seems to be some mix-up: "FREyA is the successor of QuestIO, QuestIO approach uses very shallow NLP...".

Solicited review by anonymous reviewer:

The submitted article is a survey paper about question answering (QA), especially about QA in the domain of the Semantic Web (SW). The focus of the survey is on ontology-based QA, i.e., QA on structured data represented in ontology-based repositories. The title of the article suggests that the survey will also discuss the current state-of-the-art with respect to certain research and development demands intrinsic to the SW, namely retrieving and fusing answers from multiple, heterogeneous, and automatically discovered semantic sources.
Since the focus is on QA-based interfaces, the survey also provides an overview of earlier QA research and systems, namely traditional natural language (NL)-based QA interfaces to databases (NLIDB) and NL-based QA interfaces to unstructured full text repositories.

The article begins with an introduction of the key goals of the survey: scalability and heterogeneity of SW resources, and friendly user interfaces for efficient exploration of the mass of semantic data. Ontology-based QA is introduced as a new paradigm for mastering both scalability and user-friendliness. The goal of the survey is to analyze the pros and cons of research proposals in this area.

In order to do so, in section 2, the relevant goals and dimensions of question answering are introduced, which are 1) the user input, 2) the answer sources, 3) the scope, and 4) intrinsic search problems, e.g., adaptability and ambiguity.
The authors also restrict their survey to QA systems focusing on factual questions, i.e., Wh-questions like "Who, What, When" or command-like questions (e.g., "List all open-domain ontology-based QA systems which report an f-measure higher than 75%!"). The survey explicitly does not cover QA for more complex questions, like opinion-based questions, causality and reasoning-based questions (e.g., "Why …" or "How …") or open-domain definition questions, i.e., "What …" questions about arbitrary concepts.

In section 3, a short survey of related work on "non-semantic" QA is presented, covering traditional NLIDB systems and open-domain textual QA systems as reported in TREC, Web-based QA systems, and QA systems using precompiled fact-bases. Although the coverage of these kinds of QA is ok, I found especially the focus on TREC a bit odd, because there exist other QA evaluation forums, like CLEF on cross-lingual QA and the recent TAC initiative on QA in the domain of knowledge population, which are not even mentioned in the paper!
The usage of the term "semantic" in this section is also a bit irritating. Firstly, it is not explicitly defined what a "non-semantic" system is compared to a "semantic" system. Does it mean "non-ontology" system or "non-SW" system or "strictly syntactic" system? If the latter case is meant, then how do those QA systems relate to QA systems that are based on formal semantic approaches or systems that explore semantic grammars, and which are actually discussed in this section? I think that a clear definition or description of the term "semantics" would have helped the reader to understand the pros and cons of such "non-ontology-based QA" systems compared to the "ontology-based QA systems" much more clearly. Especially if we consider the review of the open-domain text-based QA systems, the lack of a clear description of the notion "semantics" in this context becomes problematic, because already earlier textual QA systems (such as those described in this section) make use of sophisticated (non-domain) ontology and reasoning capabilities. Admittedly, this happens mainly on the NL side, in order to compute and deduce important lexical and sentential semantics, and by taking into account information extraction as "fact providers". But, as we will see later, similar technology, at least for the user input analysis, is also used in ontology-based QA systems (parsing, WordNet, …), so that it is not completely clear what the innovative aspects are in this respect for ontology-based QA systems.

Section 4 is devoted to the current state of the art in the area of ontology-based QA. The core functionality of an ontology-based QA system is described as "QA systems that take NL queries and an ontology as input, and return answers drawn from some KBs that subscribe to the ontology." The survey of such systems begins in section 4.1 with those that are tailored to a specific domain, and the amount of customization they need is discussed. The NL part of the user query analysis in these systems seems to be similar to the Wh-components used in text-based QA (e.g., making use of syntactic parsing, WordNet, etc.; see my comments above): basically computing some sort of triple representation that can be potentially mapped and aligned with the triple representation of the ontology in use as a starting point for extracting facts from the KB as potential answers. In contrast, a text-based QA system computes (among other things) an optimal query to the full-text search engine in the form of the best keywords and their combination that can be computed from the NL user query. It is clear that in both cases the same problems exist, like query expansion, paraphrase recognition or recognition of lexico-semantic relationships. I, at least, do not see the innovations that ontology-based QA systems have brought in. For that reason, a much clearer description of the major differences concerning the main components of ontology-based QA compared to "non-semantics" QA would have been helpful in addition to listing such systems and their individual solutions. In doing so, the survey could also help to avoid "reinventing the wheel(s)".
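
The contrast drawn here can be illustrated with a deliberately naive sketch. It is not taken from any of the surveyed systems: the stopword list, the single "Who ..." pattern, the toy KB and all function names are assumptions made only for illustration. The same question is turned once into a query triple that is matched against KB triples, and once into a keyword query of the kind a full-text search engine would receive.

```python
import re

# Assumed, tiny stopword list for the keyword view.
STOPWORDS = {"who", "what", "which", "is", "the", "of", "a", "an", "did"}

# Toy knowledge base of (subject, predicate, object) triples.
KB = [
    ("Shakespeare", "wrote", "Hamlet"),
    ("Cervantes", "wrote", "Don Quixote"),
]


def to_keywords(question: str) -> list:
    """Text-based view: keep the content words as a keyword query."""
    tokens = re.findall(r"[A-Za-z0-9]+", question)
    return [t for t in tokens if t.lower() not in STOPWORDS]


def to_triple(question: str):
    """Ontology-based view: map 'Who <relation> <entity>?' to a query triple."""
    match = re.match(r"who\s+(\w+)\s+(.+?)\??$", question.strip(), re.IGNORECASE)
    if match is None:
        return None  # the toy pattern covers only one question form
    return ("?x", match.group(1).lower(), match.group(2))


def answer(query_triple, kb):
    """Return the subjects of KB triples matching the query's predicate and object."""
    _, relation, entity = query_triple
    return [s for (s, p, o) in kb if p == relation and o.lower() == entity.lower()]


if __name__ == "__main__":
    question = "Who wrote Hamlet?"
    print(to_keywords(question))           # keyword query for a search engine
    print(answer(to_triple(question), KB))  # exact answers drawn from the KB
```

Both views start from the same surface analysis of the question, and both would need query expansion or paraphrase handling for a question such as "Who is the author of Hamlet?", which matches neither the toy pattern nor the predicate `wrote`.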

In section 4.2 the performance of current ontology-based QA is described. I acknowledge the description and discussion of these experiments. However, something that I missed is a clear discussion of the relevance of performing such tests for ontology-based QA. For example, the "TREC people" have invested enormous efforts to define and establish evaluation experiments, and the same holds for NLIDB. But these experiments are, so to speak, tuned for the QA systems at hand. The question remains whether there exists something similar for ontology-based QA and, if not, what would be required to establish a qualitative and quantitative evaluation setting that makes it possible to compare different methodologies and implementations. Later, in section 5.3 (on latest work on large-scale semantic search), some efforts in this direction are discussed, e.g., the Billion Triple Challenge, but for me, this is more under the perspective of "semantic search" and not so much under the perspective of "ontology-based QA". (In some sense this would be similar to the difference between evaluation of information retrieval systems and evaluation of textual QA systems - which, of course, cannot be compared.)
In this context, I found the argumentation for restricting the survey to "standard" factual questions weak and not convincing (see sec. 2, page 3, third paragraph). It seems that this restriction has basically been made because most implemented ontology-based QA systems also seem to restrict themselves to factoid questions. However, first of all there is some interesting initial work in QA that goes beyond simple factual questions, and second, at least to me, it seems that with the help of the reasoning capabilities of SW systems, the SW at least has the potential to provide a platform for answering such complex questions. So, a more detailed discussion of how the SW can support QA research in the direction of answering more complex questions would have been worthwhile for stimulating future directions and for setting up interesting evaluation scenarios.

In subsections 4.3 and 4.4 the achievements of current ontology-based QA compared to the previously proposed QA systems are discussed. I found the discussion in subsection 4.3 a bit weak and partially too programmatic. In particular, the argumentation in 4.3.2 is not convincing. Firstly, it is argued that text-based open-domain QA systems need more sophisticated processing for question and answer processing. However, later in section 6 (about achievements and research gaps), it is argued that "Automatic disambiguation can only be performed if the user query is expressive enough to grasp the conceptualization and content meaning involved in the query …" (section 6.3, second paragraph), and that on-the-fly mappings are needed in order to obtain scalable domain-adaptivity. Further, in section 6.4 it is discussed that simple keyword input (as used in semantic search) is not expressive enough, but that more complex NL user queries should be supported. But then, what are the real differences wrt. methods and components compared to text-based QA?

When recent ontology-based QA systems are described, it becomes clearer that the major difference between "text-based" QA and "ontology-based" QA seems to be the structure of the answer source, here simply an "unstructured" text and there a "structured" ontology-subscribed database. However, since text-based QA systems are required to extract exact answers (i.e., normalized text fragments that represent the answer, and not only text fragments that contain the answer), such systems already have to take into account "semantics" in the form of entities and relations. Of course, they probably need additional and more complex processing units to "uncover" such hidden semantics (e.g., through the usage of deeper information extraction), which is also discussed in the survey. But I found the critical statements about text-based QA not very convincing, especially if it is assumed that the "semantic answer sources" in an ontology-based QA system have to be computed automatically from text and web pages (like Wikipedia in the case of DBpedia). In this case a huge amount of processing has to be provided, at least offline, in order to extract and index such KBs. So the difference might then basically be a difference between online and offline usage of "semantics".
Basically, my confusion also arose when reading section 4.4 about current limitations of QA approaches on large SW. It is said that "current ontology-aware systems suffer from the current main limitation when applied to web environment: they are restricted to a limited set of domains". And then a number of proposals are made for scaling up to large SW. It seems that a number of shallow technologies are discussed which have already been investigated in earlier "non-semantic" QA, e.g., in textual QA.

So I am not very convinced by the statement in section 5 (first sentence) that ontology-based QA has already shown advantages wrt. traditional QA, especially open-domain text-based QA, in particular because the major limitation of current ontology-based QA is its restriction to a single ontology, as described by the authors in section 5.
On the other hand, semantic search (although not counting as QA) is investigating "open-ontology" based approaches. Unfortunately, the role of these semantic search approaches is a bit unclear in the context of the survey. My assumption is that they are introduced as "basic tools" for determining initial answer candidates in a similar way as full text search engines are used as basic tools for determining candidate text fragments from which exact answers are later extracted by text-based QA systems. For that reason, it would be helpful to clearly define and describe the use of semantic search engines in the context of ontology-based QA systems. This becomes a bit clearer in section 5.4, when the approach of PowerAqua is described, but at least for readability, it would have been helpful if this had been made clearer already when introducing semantic search.

In section 6, achievements and research gaps for ontology-based QA are discussed. It is argued - and I agree - that the next step of ontology-based QA systems should be something like open ontology-based QA systems. Of course, this requires high research efforts for tackling problems like automatic disambiguation, on-the-fly mapping, and handling expressive user queries. In this context it would be helpful to discuss briefly the different roles of NL and KB research. For example, it might be useful to follow an explicit division of labour in the research direction, but to discuss and negotiate the bottlenecks on both sides. Otherwise, it might not be possible to converge both research directions as needed for obtaining the desired large-scale open ontology-based QA. Discussing this issue in the survey could also help to motivate and recruit talented young researchers from the NL community to foster research in open semantic QA. In section 7, the research direction of "open semantic QA" is further elaborated. I found this the clearest part of the survey, although some research aspects are missing; see my comments on NL and KB.

To sum up: this is an interesting survey that gives a detailed overview of the state-of-the-art in question answering and ontology-based QA in particular. The structure of the survey is largely "bottom-up" in the sense that it follows a description of QA systems from "old" to "new". By building up step by step, the authors try to describe the advantages of newer approaches wrt. older approaches. However, at least for the comparison between text-based QA and ontology-based QA, a number of arguments are not convincing, as I have outlined above. I think it is necessary to outline the commonalities and the differences much more clearly and in a more balanced way. Since a survey will probably address newcomers to the research area, one should definitely avoid that the wheel is invented again! One possibility could be to focus on these two kinds of recent QA research, viz. ontology-based QA and open-domain text-based QA. By doing so, one could introduce a "generic" QA architecture highlighting the main components, e.g., question analysis, answer extraction and answer selection, and then show what the major differences are in the research directions which have been realized in the systems. In this context it would also be very helpful to define and describe more clearly the notion of "semantic" versus "non-semantic". Furthermore, a more detailed discussion on the evaluation of large-scale open semantic QA would be helpful. How is it done, what is the standard, what is missing, etc.? One of the major successes of text-based QA and information extraction (also from the Web) is that there exist very good and well worked-out evaluation use cases. Discussing current efforts in this direction or missing research efforts would also be very helpful.

Solicited review by anonymous reviewer:

This article provides an overview of question answering (QA) and other related search techniques in the context of the Semantic Web (SW). While the authors provide a broad overview of the existing literature, the arguments in favour of QA from ontologies are not convincing enough.

When comparing QA from text and QA from ontologies, the complexity moves from the QA system to the ontology. In this light, it is difficult to accept as an advantage the "relatively easy design" of an ontology-based QA system, since it requires an ontology to be built in advance, which is not less complex than building an intelligent text-based QA system. For the same reason, it is difficult to accept the other "advantages" of the ontology-based QA systems.

Another remark is that the article looks somewhat abstract and unclear. The paper would benefit from presenting promising semantic-based QA approaches and algorithms in a little more detail. One or two ontology-based systems may be presented in more detail with appropriate figures to give a better idea about the interaction between QA and SW technologies. It is difficult to understand from the text how a typical ontology-based QA system works.

Some overview of the current state of the SW sources (in terms of size and domains represented) and existing methodologies for building them would give a better idea about the possibilities of SW-related technologies.
The final section (7) contains some information about the state of the SW; this passage could be moved to the beginning.

The text could be more focused. Many passages can be written in a more concise manner. Currently, the article talks about too many things: NLIDB, Open domain QA, QA from ontologies, Wikipedia, DBpedia, query expansion, etc.
I feel the authors should concentrate on a few approaches and present them better.

The important topic of multilinguality and cross-linguality is not touched upon in this article. Multilingual QA at CLEF provides good examples for multilingual and cross-lingual QA systems.

Another important technology, not discussed much in the paper, is Machine Learning. Learning algorithms should play an important role in each open domain natural language processing system, due to the enormous amount of world knowledge which is supposed to be modeled when going beyond the closed-world assumption.
I also expected more about ontology learning and population. After all, the big challenge facing ontology-based QA systems is the building of the ontology itself.

All in all, I expect the paper to be made clearer, with more details about the presented semantic approaches.

Some minor remarks:

The QALL-ME project (http://qallme.fbk.eu/) could be mentioned as an example of QA from structured data.

Is the classification of the QA systems and search interfaces proposed exclusively by the authors, or is it based on some preceding work? (If so, a relevant reference should be provided.)

On the second page, the authors talk about the development of ontology-based QA in "recent years". However, the literature cited in this paragraph is from 1995-2004.

In the paragraph, which begins with "NE recognition and information extraction (IE) are powerful tools...", the authors could write more about Information Extraction and its importance for building structured knowledge sources.

I'd suggest as an additional reference Marius Pasca's book "Open-Domain Question Answering from Large Text Collections". Although it was written in 2003, it provides an overview of some basic QA techniques.