Review Comment:
Summary of the paper’s main contributions and impact
The paper presents a formal framework for specifying federated source selection, query decomposition, and query processing. Queries correspond to Basic Graph Patterns (BGPs), and a federation of SPARQL endpoints corresponds to a cluster of RDF graphs. Well-known conditions, such as a triple pattern being answerable by an endpoint, are represented by imposing restrictions on the signature of the triple pattern and on the vocabulary terms used in the endpoint's RDF dataset. Additionally, different query decomposition approaches are formalized as distributions in which distinct types of sub-queries are assigned to sources, for example, the even distribution implemented by AliBaba or the standard distribution implemented by FedX. Further, evaluation rules for collecting data from the sources that comprise a distribution of a query are defined, as well as some main properties of this evaluation scheme.
The paper concludes by illustrating the proposed formal framework with the systems ADERIS and SemWIQ.
Overall, the paper addresses an interesting and challenging problem, and the proposed framework has the potential to provide a basis for understanding the behavior and properties of existing federated query engines. Nevertheless, because of the lack of readability and illustration, the strengths of the framework cannot be clearly appreciated. First, variables used in a definition need to be introduced in that definition and used consistently throughout the paper; e.g., T is not defined in Definition 4.5. Further, the definition of “sig” is not presented, so it is not clear why sig(t) \in sig(R) is correct or whether it should be sig(t) \subset sig(R). Although the framework is illustrated with existing federated engines, these are not open source, so it is not possible to check the soundness of the use cases. Additionally, the examples do not correspond to queries of an existing benchmark. Thus, the optimal decomposition of each query is unknown, as is the way in which the decompositions produced by existing federated engines would be specified in the proposed framework.
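To make the concern about “sig” concrete, the following is a minimal sketch under one possible reading, which is an assumption because the paper gives no definition: sig(t) is the set of constant terms of a triple pattern t, and sig(R) is the set of vocabulary terms used in the dataset of endpoint R. The names sig, answerable, and the example terms are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Tuple

# A triple pattern as (subject, predicate, object); variables start with "?".
TriplePattern = Tuple[str, str, str]


def sig(pattern: TriplePattern) -> FrozenSet[str]:
    """Assumed reading: the set of constant (non-variable) terms of the pattern."""
    return frozenset(term for term in pattern if not term.startswith("?"))


def answerable(pattern: TriplePattern, endpoint_sig: FrozenSet[str]) -> bool:
    """Assumed condition: every constant of the pattern occurs in the endpoint vocabulary."""
    return sig(pattern) <= endpoint_sig  # set containment, not membership


# Hypothetical endpoint vocabulary, standing in for sig(R).
endpoint_sig = frozenset({"foaf:name", "foaf:knows", "ex:alice"})

t1 = ("?x", "foaf:name", "?n")   # only constant is foaf:name
t2 = ("?x", "dc:title", "?t")    # dc:title is not in the endpoint vocabulary

containment_t1 = answerable(t1, endpoint_sig)
containment_t2 = answerable(t2, endpoint_sig)

# Membership asks whether the set sig(t1) is itself an element of sig(R),
# which cannot hold when sig(R) is a set of terms rather than a set of sets.
membership_t1 = sig(t1) in endpoint_sig
```

Under this reading, only the containment form distinguishes t1 from t2; the membership form would be well typed only if sig(R) were a set of signatures. The paper should state which reading is intended.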
Strong points of the paper
S1: A formal framework for defining the results of the source selection, query decomposition, and reordering tasks is presented.
S2: Three different types of query decomposition criteria are presented, as well as evaluation rules that ensure the correct execution of query plans.
S3: Two use cases illustrating the expressive power of the approach are presented.
Weak points of the paper
W1: Notation is used ambiguously throughout the paper, making the proposed formalism hard to understand.
W2: Established terminology from the context of federations of SPARQL endpoints is not reused, for example, in the use of signature, cluster, or sub-pattern.
W3: The use cases do not correspond to benchmarks defined by the community. There are many different and challenging queries in FedBench that could be used to explain the proposed framework. State-of-the-art engines such as FedX, SPLENDID, or ANAPSID, which have been evaluated on existing benchmarks, are not used to illustrate the usage of the proposed framework.
Detailed Comments
D1: The notation used in the paper has to be summarized in a table to enhance readability. Variables have to be used unambiguously across the paper. Terminology has to be aligned with the formal semantics of SPARQL; the definitions presented by Buil-Aranda et al. should be reused. Definitions need to be illustrated with examples to improve readability.
D2: The proposed framework has to be illustrated using FedBench, the benchmark commonly used by the community to evaluate existing federated engines. Decomposition techniques implemented by state-of-the-art open-source SPARQL federated engines have to be included as use cases.
Additional questions to the authors:
Q1: How can the decompositions produced by index-based heuristics, such as the ones implemented by ANAPSID (Acosta et al., ISWC 2011, and Montoya et al., COLD 2012) or by HiBISCuS (Saleem et al., ESWC 2014), be expressed in the proposed formalism?
Q2: Can any of the state-of-the-art federated engines ensure the computation of an agnostic distribution if one exists for a query? Can any of the FedBench queries be decomposed by using an agnostic distribution?
Q3: What are the properties of the distributions of queries that comprise predicates from general vocabularies, e.g., rdf:type, owl:sameAs?
Q4: What happens if a query has variables in the predicate position of a triple pattern?
C. Buil-Aranda, M. Arenas, and O. Corcho. Semantics and optimization of the SPARQL 1.1 federation extension. In G. Antoniou, M. Grobelnik, E. Simperl, B. Parsia, D. Plexousakis, P. Leenheer, and J. Pan, editors, The Semantic Web: Research and Applications, volume 6644 of Lecture Notes in Computer Science, pages 1–15. Springer Berlin Heidelberg, 2011.