Estimating query rewriting quality over the LOD

Tracking #: 1612-2824

This paper is currently under review
Ana Torre
Jesús Bermúdez
Arantza Illarramendi

Responsible editor: Guest Editors IE of Semantic Data 2017

Submission type: Full Paper
Users currently have difficulty querying datasets in the Linked Data environment that use different vocabularies and data structures. It is therefore worthwhile to develop systems that can produce query rewritings on demand. However, such systems often cannot guarantee a semantics-preserving rewriting because of the heterogeneity of the vocabularies. At that point, estimating the quality of the produced rewriting becomes crucial. Notice that, in a real scenario, no reference query is available. In this paper we present a novel framework that, given a query written in the vocabulary the user is most familiar with, rewrites it in terms of the vocabulary of a target dataset. It also reports the quality of the rewritten query with two scores: first, a similarity factor that is based on the rewriting process itself and can therefore be considered intensional in nature; and second, a quality estimation, extensional in nature, provided by a predictive model. This model is built by a machine learning algorithm that learns from a set of queries and their intended (gold standard) rewritings. The feasibility of the framework has been validated in a real scenario.