A Framework for Web Based Language Independent Semantic Question Answering System

Tracking #: 2114-3327

Authors: 
Vaibhav Agarwal
Parteek Kumar

Responsible editor: 
Thomas Lukasiewicz

Submission type: 
Full Paper
Abstract: 
Question answering systems (QAS) attempt to let you ask your question the way you'd normally ask in natural language. After doing an exhaustive survey on the existing QAS it was observed that there is no system which has all the features viz. online availability, multilinguism support, ability to extend the support to other languages without changing architecture/code base, ability to integrate other ML/NLP applications, availability of source code, and whose working of all components is known. This paper presents a framework and the live application developed (as a proof of concept) on top of this framework which supports all these features. After testing the proposed system on two standard corpora, the ‘Conciseness’, ‘Relevance’, ‘Correctness’, ‘Precision’, ‘Recall’, and ‘F-Measure’ of the developed system came out to be 89.5%, 86.4%, 100%, 86.4%, 100%, and 92.7% respectively.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
Anonymous submitted on 07/Mar/2019
Suggestion:
Reject
Review Comment:

In this paper, the authors present a framework for a web-based, language-independent semantic question answering system, which utilizes the Universal Networking Language (UNL) to achieve language independence.
To provide a detailed review, I follow this methodology:
1) To address paragraphs within each section, I will use the notation 1.1, 1.2, i.e., paragraph 1 of section 1, etc. This has been done after going through the complete article.
2) I will try to connect each paragraph with the overall message of the paper.
Abstract:
The sentence “Question answering systems (QAS) attempt to let you ask your question the way you'd normally ask in natural language” is quite an informal way of writing.
The sentence “no system which has all the features viz. online availability, multilinguism support, ability to extend the support to other languages without changing architecture/code base, ability to integrate other ML/NLP applications…” is a false claim (see details below).
Introduction:
1.2: In this paragraph, the authors briefly describe UNL. The writing and structure of the paragraph are really fuzzy and very hard to follow; nearly every second sentence has grammatical errors. I would advise the authors that whenever a new term (e.g., UNL, ML/NLP) is introduced or claims are made (such as that UNL is used in machine translation, text mining, etc.), they should provide concrete evidence. This has been observed throughout the paper: claims are made without proper citations.
Also, it is highly unclear from the introduction how UNL helps in semantic-parsing-based QA. The authors explain UNL in the introduction and provide some illustrations; however, the contributions of this paper are not positioned properly (indeed, they are not mentioned at all).
For future improvement: while writing the introduction, please try to cover three things: what problem is addressed in this article; why this problem is so important and where the authors identified the research gap in the community; and a brief explanation and positioning of your contribution. The introduction should be self-contained and should set the foundation for the rest of the paper.

Literature Review:
The section starts with a vague claim: the proposed UNL-based system will be a major step. How, and why?
For Tables 1 and 2, I appreciate the authors' effort to provide a summary of various QA research. However, both tables have fundamental flaws.
The first reference, Amrita et al., should be Saha et al.; its source code is available online and can easily be extended to support other languages. The same holds for EARL. Moreover, EARL is not a QA system but a component performing entity linking and relation linking. Many QA systems are listed: a few for the DBpedia KG, a few for visual QA, and some for other knowledge sources. In both tables, it is completely unclear what message the authors want to convey. As this paper is about a QA framework, it is important to understand the difference between a QA framework and a QA system. In the Semantic Web community, QA frameworks such as openQA (Marx et al. 2014), OKBQA (okbqa.org), Frankenstein (Singh et al. 2018), and many others are used to build QA systems in a collaborative effort and provide nearly all the support which the authors claim in the abstract. For example, QALL-ME is a multilingual QA framework. openQA is the first attempt to build a QA system by reusing several other QA systems. Frankenstein is the latest attempt to advance the state of the art by providing an abstraction over implementation details, easy extensibility, reusability, and, of course, openness to other language support. If one types "QA framework for DBpedia" into Google Scholar, openQA appears at the top. All the other QA frameworks have cited this work, and they can easily be included in the related work. I appreciate the long tables and the effort behind them; however, with respect to QA frameworks, both tables are irrelevant and, as I mentioned earlier, many false claims are made. Please note that EARL is available online, as are openQA, Frankenstein, and OKBQA. The section "Previous Work done" has an informal title, and I would advise merging it with the related work.
Section 4 introduces the corpus used. Again, there is no explanation of why this section is suddenly introduced right after the related work. Up to this section, the authors have not mentioned anything about the contribution of the UNL-based system, its architecture, or a formalization of it. This breaks the flow of the paper entirely.

Section 5 describes the architecture of the proposed system. The section is written in a completely informal style, for example "Lets say the user select…".
Images are blurry, and the pseudocode is ambiguous and non-standard. It seems the authors used a Word template and pasted images of the formulas for Precision, Recall, and F-score, which are again non-standard and quite unclear.
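For reference, the standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN) are:

\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]

(As a sanity check against the abstract: with Precision = 86.4% and Recall = 100%, F1 = 2 · 0.864 · 1.0 / 1.864 ≈ 0.927, so the reported 92.7% is at least internally consistent.)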

In the evaluation section, the authors report results on 400 evaluated questions. However, it is not clear what the information source of the answers is. Is it DBpedia, Wikidata, the complete Web, or something else? How is semantic parsing used, as claimed in the introduction? What is the baseline?

Overall Comment: The paper is poorly written, the contributions are highly unclear, and many sentences are very hard to read. The paper falls short on all three aspects: originality, quality of writing, and significance of the results.

A few pieces of advice to improve the paper (writing):
1) Please do not make false claims, such as that no other QA framework provides the functionality this paper provides.
2) Please clearly state the contributions of the article.
3) Clearly describe the architecture (both in figures and text).
4) Formulate the evaluation section around the research question(s) the article aims to address.
5) Please follow the guidelines of scientific writing.
6) Spelling and grammar need to be checked before submitting the article.

Marx et al. Towards an Open Question Answering Architecture. In SEMANTiCS 2014.
Ferrandez et al. The QALL-ME Framework: A Specifiable-Domain Multilingual Question Answering Architecture. In Journal of Web Semantics, 2011.
Singh et al. Why Reinvent the Wheel: Let's Build Question Answering Systems Together. In WWW 2018.

-
Kuldeep Singh

Review #2
Anonymous submitted on 29/Mar/2019
Suggestion:
Major Revision
Review Comment:

The authors of the paper "A Framework for Web Based Language Independent Semantic Question Answering System" present an approach for a QA system based on the Universal Networking Language (UNL) that supports multilingualism.
One of the authors' statements is that there are almost no QA systems supporting multilingualism, whereas the proposed system does support it and also outperforms existing QA systems on different measures.
However, as I understood the paper, the proposed system currently works with only one language pair and does not really support multilingualism.

The first pages of the paper are covered by two huge tables. The first table contains a list of QA papers and states whether they support multilingualism or could be adapted to it. I would like to understand how this set of QA systems was selected. Looking at other literature surveys, such as:
- Lopez, Vanessa et al. (2011): Is question answering fit for the Semantic Web? A survey.
- Höffner, Konrad et al. (2016): Survey on Challenges of Question Answering in the Semantic Web.

it is obvious that some multilingual QA systems are missing. I would also like to understand how the authors differentiate between "Supports Multi Language" and "Can be extended to support other languages".

The second table, "QA System Key Metrics", presents an overview of the different QA systems, the datasets used for evaluation, and the reported results where available. I think this table is not really helpful if it stays as it is now.
The authors should spend time cleaning up the evaluation results, and especially the units: sometimes it is "80%", sometimes it is "0.65".
For some results in the value column it is mentioned whether the value represents precision or recall, but for many it is not given.
Why not reduce the table to a measure all papers have in common and use that to compare your system? Yes, there will be fewer systems in the second table, but I think that is more meaningful than a very noisy table.

The quality of the images should be consistent. Currently it is a mixture of bad screenshots, diagrams of varying quality, "3D" visualizations, etc. I also fail to see the value of screenshots from the Google Chrome debugger (Fig. 4 and Fig. 5). I am sure the data structure can be presented in better quality and without the help of Google Chrome.

Please try to reduce your pseudocode to smaller, understandable snippets.
Is there a publicly available demo of your system? You explicitly included a column called "Is Source Code Available" in the first table, so my question is: is your source code available?
If you have a look at the two literature surveys above, you will find multiple systems with publicly available source code.

Overall, I think this paper needs a major revision.

Review #3
Anonymous submitted on 06/May/2019
Suggestion:
Reject
Review Comment:

# A Framework for Web Based Language Independent Semantic Question Answering System

The authors present a framework for answering questions over Universal Networking Language (UNL) corpora by transforming the questions into UNL form, traversing the UNL trees in the corpora at hand and then greedily picking the first best answers. They also claim to provide an answer ranking method, an extensive literature survey as well as a set of reproducible items.
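To make my reading of the approach concrete, here is a minimal sketch of what the described pipeline seems to amount to (this is my own reconstruction, not the authors' code; all names are hypothetical, and UNL graphs are simplified to sets of (relation, head, tail) triples for illustration):

```python
# Reviewer's toy reconstruction of the described pipeline (hypothetical names).
# The question graph carries a ?x placeholder; documents are UNL-style triples.

QUESTION = {("agt", "invent", "?x"), ("obj", "invent", "telephone")}

CORPUS = [
    {("agt", "invent", "Bell"), ("obj", "invent", "telephone")},
    {("agt", "invent", "Edison"), ("obj", "invent", "light_bulb")},
]

def match(question, document):
    """Try to bind ?x so that every question triple occurs in the document."""
    for rel, head, tail in document:
        binding = {"?x": tail}
        grounded = {(r, h, binding.get(t, t)) for r, h, t in question}
        if grounded <= document:
            return binding["?x"]
    return None

def answer(question, corpus):
    """Greedily return the match from the first document that yields one."""
    for document in corpus:
        result = match(question, document)
        if result is not None:
            return result
    return None

print(answer(QUESTION, CORPUS))  # -> Bell
```

If this reading is wrong, that only underlines the point below: the main modules are not described in a self-contained way.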

## Decision

Reject: Overall, the paper reads more like a shallow system description or demo paper. The survey is neither deep enough nor clear enough to give new insights into open research challenges, and it contains factual errors. The system itself is only weakly described; the main modules for semantic language understanding are not described in a self-contained way. The article lacks links to the area of Semantic Web or Linked Data, as it does not use data, tools, or resources from this domain. The evaluation is not replicable and the ground-truth data is not available, so future research is blocked.

## Originality
The paper describes an approach for transforming a question into a UNL representation and matching it against UNL corpora.

Section 1: AskNow, which uses NQS as an intermediate language format, is not reflected; see http://jens-lehmann.org/files/2016/eswc_asknow.pdf

## Significance of the results

Section 2: The authors state "After doing an exhaustive survey [...]". This is not a survey but rather a list. Please take a look at other SWJ surveys to see how a proper survey methodology works (e.g., http://www.semantic-web-journal.net/content/systematic-survey-point-set-..., http://www.semantic-web-journal.net/content/quality-assessment-linked-da...). It is even unclear which kinds of QA systems (i.e., reading comprehension over text or QA over KGs) are analyzed, and how they are analyzed. Furthermore, most columns have crosses even though, e.g., code is available for EARL.
Section 2: The survey is not well done despite its volume. EARL is not a QA system. Many articles are cited via their arXiv links (which carries some doubt in itself) rather than via their proper peer-reviewed venues. The survey table is also confusing, as it sometimes cites the title and sometimes just the abbreviation of the system.
Section 2: Table 2 is also inconsistent. Listing evaluation metric numbers has no value when a) they are not compared or analyzed in the survey, and b) they were obtained on different corpora.
Section 4: It is questionable that EOLSS is the largest online publication. What metrics are used to measure that?
Section 5: This section describes an example of the UNLization of a text. However, it is scientifically unclear what happens here; the authors have to give a small introduction to the technique.
Section 6: This section is entirely unnecessary as it does not add any scientific value to a full paper.
Section 6: Pseudocode by itself cannot explain anything. For example, PSEUDOCODE 3 looks very hardcoded to the corpus at hand; for instance, why does it use "hypernodes" in case 2? Also, it was unclear up to this point that the system is only able to answer questions containing exactly one relation.
Section 6: PSEUDOCODE 4, ranking: this is not a ranking methodology, as you are not changing the order of items in your returned set; the items just get labeled.
Section 7: It is also unclear how the 400 questions were formed or chosen. Is there any bias in the dataset? What are the features of the dataset?
Section 7: The choice of the other metrics besides P, R, and F is unintuitive. Please clarify their additional value over P, R, and F, and if you want to show the usefulness of your ranking, please introduce an established ranking metric such as NDCG.
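For reference, the standard definition at cutoff k is

\[
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i + 1)}, \qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k},
\]

where rel_i is the graded relevance of the item at rank i and IDCG@k is the DCG of the ideal ordering; unlike plain P/R/F, this would actually reward putting correct answers first.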
Section 8: It is unclear why the polar questions add up to 1 in Table 4. What do the polar questions look like? How does the tree traversal work for them?
Section 8: In the abstract, the authors claim "After testing the proposed system on two standard corpora, the 'Conciseness', 'Relevance', 'Correctness', 'Precision', 'Recall', and 'F-Measure' of the developed system came out to be 89.5%, 86.4%, 100%, 86.4%, 100%, and 92.7% respectively."

## Quality of Writing
The article is easy to read and follow, as it is kept quite shallow. The only reading hurdle is the assumption that each reader knows the depths of UNL; the article does not do a good job of introducing and highlighting the scientific benefits and mechanisms of UNL. Moreover, the foundation's website, http://undl.org, seems outdated, inactive, and broken, with the last traceable sign of activity being a book published in 2010.

## Reproducibility

The Semantic Web Journal states on its website: "We encourage authors to write their papers and more specifically the evaluation sections in a style and level of detail that enables the replication of their results." The authors themselves state: "After doing an exhaustive survey on the existing QAS it was observed that there is no system which has all the features viz. online availability, multilinguism support, ability to extend the support to other languages without changing architecture/code base, ability to integrate other ML/NLP applications, availability of source code and whose working of all components is known. This paper presents a framework and the live application developed (as a proof of concept) on top of this framework which supports all these features." However, neither source code nor a demo is provided, nor is the question corpus. Thus, the experiments are not reproducible, and the paper brings the community no gain in fostering open research or repeatable science.

## Minor issues
- Title: A Framework for Web Based Language Independent Semantic Question Answering System => A Framework for Web-Based Language Independent Semantic Question Answering Systems
- Abstract: you'd => you would
- Section 1: Page 1, Column 2, a lot of empirical claims ("is better than other approaches") without any measurement or proof
- Figure 1: Why kitchen? It is the bedroom in the example above.
- Section 5.1: The first paragraph is a repetition and should be deleted.
- Section 5.1: Citation [19] is wrong for the UNL specification.
- Figure 7: This figure is rather unscientific. Please use standard bar plots.