Review Comment:
The Web of Data is growing fast, and one of the main challenges is the development of techniques that broaden the use of the available data. This paper addresses exactly this issue and proposes an innovative technique for expanding the results of SPARQL queries by rewriting them and running them on additional datasets. Expanded results are scored according to the degree of confidence that they are relevant to the initial query. The score is a linear combination of similarity measures whose coefficients are calibrated through supervised machine learning using the Harmony Search (HS) metaheuristic.
The text is very well written, and the work has been greatly improved since the last review by clarifying details about the rewriting rules and the calibration of the confidence score, but a few minor issues still need further clarification.
Comment #1
==========
The authors say:
Excerpt #1: "We can highlight three main approaches among those proposals: 1) those that generate a kind of centralized repository that contains all the data of different datasets and then queries are formulated over that repository (e.g. [22]); 2) those that follow the federated query processing approach (e.g. [12]) in which a query against a federation of datasets is split into sub-queries that can be answered in the individual nodes where datasets are stored; and 3) those that follow the exploratory query processing approach (e.g. [13]), which take advantage of the dereferenceable IRIs principle promoted by Linked Data."
Excerpt #2: "Each centralized repository or a datasets federation only considers a limited collection of datasets and, therefore, cannot help a user with datasets that are out of the collection. Moreover, it seems interesting for the users to pose queries to a preferred dataset whose schema and vocabulary is sufficiently known by them and then a system could help those users enriching the answers to the query with data taken from a domain sharing dataset although with different vocabulary and schema. Our approach considers that type of systems. Notice that a proper rewriting of the query must be eventually managed by those systems."
In Excerpt #2, the authors seem to place the proposed system in a category distinct from the three listed in Excerpt #1, which are presented as the only approaches recognized in the literature. This distinction must be pointed out emphatically and explicitly, and formalized as one of the contributions of the paper.
Comment #2
==========
The authors say:
Excerpt #3: "In the scenario considered in this paper we think that the content and the result set are the appropriate dimensions because structure, or language of the target query are irrelevant to the user that formulates the query over the source dataset."
I would argue that this is not always true: the structure and language of the target query may affect the query results. Rather, dimensions 1 (structure) and 3 (language) could be set aside as an explicit assumption of the proposed technique.
Comment #3
==========
Section 3 tackles the problem of query rewriting and a possible implementation through SPARQL queries. This is a tricky issue, since it hides considerable complexity that may require more detailed discussion. For example:
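1) The query SELECT ?t:p WHERE {bind(s:p) owl:sameAs ?t:p.} does not comply with the SPARQL 1.1 specification: variable names cannot contain ":", and BIND is a separate clause of the form BIND(expression AS ?var), not a term that can appear inside a triple pattern.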
2) Would it be better to use a SELECT or a CONSTRUCT query?
3) Would queries run over KB ∪ {converted triple patterns} or over KB ∪ {entities of triple patterns}, where KB is a knowledge base such as DBpedia, Freebase, etc., and "converted triple patterns" is a set of RDF triples created from the triple patterns?
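For instance, assuming the intended semantics of item 1 is "find the target-dataset terms linked by owl:sameAs to the source term s:p", a SPARQL 1.1-compliant sketch could look as follows. The namespace bound to s: and the variable names ?sourceTerm and ?targetTerm are hypothetical placeholders, not taken from the paper.

```sparql
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX s:   <http://example.org/source/>   # hypothetical source namespace

# Bind the source term to a proper variable, then look up its owl:sameAs links
SELECT ?targetTerm
WHERE {
  BIND(s:p AS ?sourceTerm)
  ?sourceTerm owl:sameAs ?targetTerm .
}
```

A CONSTRUCT variant would keep the same WHERE clause and replace the SELECT line with CONSTRUCT { ?sourceTerm owl:sameAs ?targetTerm }, returning the links as an RDF graph instead of a table of bindings; which form better suits the rewriting module is the substance of question 2.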
The implementation of the rewriting module could be discussed in a separate work.
Comment #4
==========
"Those queries along with their corresponding gold standards and the names of source and target datasets are listed in the appendix hosted in the footer URL"
It would be better to provide a citable reference with a DOI rather than material that is still under development; see, e.g., Figshare or Zenodo.