Review Comment:
Verdict: Accepted — Subject to Major Revisions
The paper presents a system that transforms natural language questions into SPARQL queries.
The system can be used in a question answering context to allow users who are not familiar with SPARQL to ask their questions. At the same time, it could potentially reduce the effort required to write SPARQL queries by supporting natural language instead of structured queries as the user interface.
The work is novel in the sense that it is applied to a new system used in a new domain (sports news). The use of ontologies to help transfer the system to other domains is sensible, but not particularly novel. In general, my biggest concern is that the work is not compared thoroughly with other works in this space.
The authors provide many examples to describe the most complex components of the proposed architecture. This is appreciated.
The related work is comprehensive, but could be improved. It focuses on SPARQL-based question answering for obvious reasons. However, at the technical level, the paper addresses the problem of semantic parsing in order to transform natural language questions into knowledge-base queries. There has been much work on semantic parsers, and recently on learnable semantic parsers (Kwiatkowski et al. 2013), that leverage a structured knowledge base (e.g., Freebase) to answer user questions. Research in this space seems relevant to the system proposed in the paper and should be discussed in the next iteration of the paper.
Another area of improvement is the evaluation. The criteria according to which the authors built their question corpus need to be discussed in greater detail. It is not clear how the survey that led to the proposed distribution of the questions across the different types was conducted. The precision of the system would probably differ for a different distribution of questions and, as a result, the choices that led to the selection of those 41 sentences should be discussed in more detail.
The idea of evaluating the performance of the system against a weighted version of precision is interesting. However, reporting the standard precision and recall scores alongside it would help substantiate the reliability of the proposed metric. Furthermore, even if the majority of the competition focuses on the general domain, it might be worthwhile to investigate the performance of the proposed system against other competitive approaches that transform natural language questions into SPARQL queries.
The paper is reasonably well written, but it would have helped if the authors had proofread it more thoroughly before submission.
There are some grammatical and typographical mistakes. Some representative examples are enumerated below:
1. The second affiliation is misspelled as "Hanoi University or Science and Technology".
2. In the second to last paragraph in the "Related Work" section, the word "domain" is written as "do main".
3. In the third section, the second sentence of the second paragraph should be re-phrased.
4. The indentation across the different types of questions in "4. Question classification" is inconsistent.
5. The first sentence of "6.1 Question preprocessing" should be eliminated.
6. The third paragraph of "184.108.40.206 Identification of ordinary variables and ..." is repeated three times.
7. In both Figure 8 and the description in 220.127.116.11 about the identification of the quantify constraint, it is not clear how the system is able to treat comparison scenarios that manifest themselves with "at least" or "at most" as a subset of those with "than".
8. The indentation in 18.104.22.168 is not consistent across the different types of clauses.
9. The first sentence of "6.5.2 Recognition of classes" should be re-phrased.
10. In the "Experiments" section, it is mentioned that 45 sentences were used for the evaluation of the system; however, only 41 are presented in the corresponding table (Table 2).
There are also some inconsistencies with some of the cross-references in the paper. For instance, in the first example of "22.214.171.124 Representation of temporal constraint in SPARQL query", the reference to Section 3 should have been a reference to Section 126.96.36.199. Sections 6.5.3 and 6.6 have similar issues. The summary of the structure of the paper in the last paragraph of the introduction is also inconsistent with the actual content of the subsequent sections. The cross-reference to "Section 5" in the last paragraph of 7.1 should also probably be to "Section 4".
Furthermore, there are some citations that are missing. For instance, in "Related Works", there are three mentions of the Mooney dataset - to be self-contained, the paper needs to tell the readers how they could get familiar with this dataset. Furthermore, in "6.5.1 Named entity recognition" the Semantic Annotation Platform - KIM is not properly cited either.
In summary, for the paper to be accepted, I would recommend, besides fixing the issues documented in this review, two major areas of improvement:
- the related work (semantic parsing)
- the evaluation: a more detailed description of the corpus and its design, and a thorough comparison of the system's performance against direct competitors.