Review Comment:
This is a dataset description paper that proposes a question answering dataset for benchmarking.
Several editions of the dataset, known as Question Answering over Linked Data (QALD), have already been published. This paper proposes QALD-10, which is based on Wikidata.
Overall, it is a valid and interesting dataset description paper, and this reviewer has the following suggestions:
General comment: Is it possible to provide an analysis of which types of questions are challenging to answer? This might help researchers to focus on unsolved problems. For the moment, most of the discussion is focused on the challenges faced while adapting to Wikidata. Are there questions that are difficult for most question answering systems in general? This reviewer feels that such discussion is only partially present in the paper.
General comment: The readme at https://github.com/KGQA/QALD-10 seems slightly short. Some more description could be provided. For example, a few lines about https://skynet.coypu.org/wikidata/ could be added.
p3, l33: "low complexity of the gold standard SPARQL queries": the discussion here is about several challenges, but this formulation makes the task sound less challenging. Is it possible to find another formulation for "low complexity"?
Moreover, the challenges are stated quickly in short one-liners. They should be explained in more detail at this point in the paper.
p6, l18: please check that all acronyms are defined when they first appear in the paper. For example, QQT is not defined here.
p6, l38: generate --> generates?
p7, l35: "However, the results clearly suggest that the proposed benchmark is way more complex than QALD-9-plus in terms of various important modifiers such as COUNT, FILTER, ASK, GROUP BY, OFFSET, and YEAR." This is actually not so clear just from looking at the table, because it does better on some metrics but worse on others. Perhaps reformulate to say that a detailed analysis of the complexity is done in the following text.
p7, l43 and l48: it is called "joint vertex" here, but in l10 it is called "join vertex".
p10, l22: in many examples, only entity IDs are provided, for example wd:Q28222602. The paper would be more readable if the entity names were provided as well.
p11, l31: "We formulate our challenges and solutions during the SPARQL generation process to aid further research in KGQA dataset creation as well as Wikidata schema research."
This is stated at the end. Perhaps it should be stated at the beginning of Section 5.