Benchmarking Question Answering Systems

Tracking #: 1578-2790

This paper is currently under review
Ricardo Usbeck
Michael Röder
Michael Hoffmann
Felix Conrads
Jonathan Huthmann
Axel-Cyrille Ngonga Ngomo
Christian Demmler
Christina Unger

Responsible editor: 
Ruben Verborgh

Submission type: 
Full Paper
Abstract: 
The need to make the Semantic Web more accessible to lay users and the uptake of interactive systems and smart assistants for the Web have spawned a new generation of RDF-based question answering systems. However, the fair evaluation of these systems remains a challenge due to the different types of answers that they provide. Hence, repeating currently published experiments or even benchmarking on the same datasets remains a complex and time-consuming task. We present a novel online benchmarking platform for question answering (QA) that relies on the FAIR principles to support the fine-grained evaluation of QA systems. We describe how the platform enables the fair benchmarking of QA systems through the rewriting of URIs and URLs. In addition, we provide different evaluation metrics, measures, datasets and pre-implemented systems, as well as support for novel formats for the interactive and non-interactive benchmarking of QA systems. Our analysis of current frameworks shows that most are tailored towards particular datasets and challenges and do not provide generic models. Moreover, while most frameworks perform well in the annotation of entities and properties, the generation of SPARQL queries from annotated text remains a challenge.
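To illustrate the idea behind URI rewriting for fair benchmarking, the following sketch normalizes answer URIs to a canonical form before comparing a system's answer set against the gold standard, and then computes precision, recall and F-measure. The concrete rewriting rules (lowercasing the host, collapsing http/https, dropping trailing slashes) and function names are illustrative assumptions, not the platform's actual implementation.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_uri(uri: str) -> str:
    """Rewrite a URI to a canonical form so that superficially different
    answers (e.g. http vs. https, trailing slash) compare as equal.
    The rewriting rules here are assumptions made for illustration."""
    scheme, netloc, path, query, fragment = urlsplit(uri)
    # Collapse the scheme to http, lowercase the host, drop trailing slashes.
    return urlunsplit(("http", netloc.lower(), path.rstrip("/"), query, ""))

def precision_recall_f1(system_answers, gold_answers):
    """Precision, recall and F1 over normalized answer URI sets."""
    sys_set = {normalize_uri(u) for u in system_answers}
    gold_set = {normalize_uri(u) for u in gold_answers}
    true_pos = len(sys_set & gold_set)
    precision = true_pos / len(sys_set) if sys_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

Under such rules, an answer like `https://dbpedia.org/resource/Berlin/` would match a gold answer `http://dbpedia.org/resource/Berlin` instead of being counted as an error.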