A logical characterisation of SPARQL federation

Tracking #: 760-1970

Audun Stolpe

Responsible editor: 
Oscar Corcho

Submission type: 
Full Paper
The paper provides a logical characterisation of distributed processing of SPARQL queries. The principal notion analysed is that of a distribution scheme; a pair consisting of an evaluation rule and a distribution function. Different choices of evaluation rules and distribution functions give different federation schemes that are proved to be sound and complete wrt. different selections of RDF datasets. Three distribution functions are singled out for special attention, called the even-, standard- and prudent distribution respectively. All yield sound and complete federation schemes when combined with an evaluation rule named the collect-and-combine rule. The completeness results thus obtained are next compared to existing federation systems with the aim of illustrating the potential utility of a framework in which salient properties can be formulated and explored. The final section, relates distribution schemes to the heuristic notion of a join-ordering to yield sound and complete execution plans for the aforementioned federation schemes.
Full PDF Version: 


Solicited Reviews:
Click to Expand/Collapse
Review #1
By Juergen Umbrich submitted on 19/Sep/2014
Review Comment:

Thanks for addressing the comments. I'm happy with the latest revision.

Review #2
Anonymous submitted on 27/Sep/2014
Minor Revision
Review Comment:

The author addressed the majority of my comments as well as my questions. The paper is definitely much clear now, and the examples precisely illustrate the power of the formalism. Overall, the reported work is very important to understand the behavior of existing engines, and is a valuable contribution for federated data management. Nevertheless, there are some definitions and properties that still need to be either explained in more detail or illustrated with more examples. First, just after Theorem 4.1 and Definition 4.6, the meaning of join-connected of BGPs and cross-site joins needs to be explained. Regarding to Theorem 5.5, please provide a decomposition of query LS5 that is agnostic and egalitarian. Give examples of unconstrained BGPs.

There are some errors in Tables 7 and 8 in pages 14 and 15, respectively. As indicated by Montoya et al. in 28 and 23, DEFENDER/ANAPSID assigns maximal stars that can be potentially executed by at least one endpoint. Thus, query LS5 is decomposed into three stars. The first one is composed of triple patterns in rows 1, 2, and 3, and it is assigned to Drugbank. The second star comprises triple pattern in row 4, and it is assigned to ChEBI and KEGG. Finally, triple patterns in rows 5 and 6 are assigned to ChEBI. This decomposition can be found at http://defender.ldc.usb.ve (Star-Shaped Groups with Union) and also presented in the paper by Montoya et al. COLD2012 Table 3.

Minor comments:
Present references in increasing order, e.g., [18,17] => [17,18].
Make explicit the answer to my question Q3 in the conclusions of the paper.

Review #3
By Carlos Buil Aranda submitted on 02/Oct/2014
Minor Revision
Review Comment:

All my concerns have been answered. Only a minor detail, I thank the author for citing one of my works regarding the SPARQL 1.1 Federation, however I would also like a reference (or a footnote) to W3C SPARQL 1.1 Federated Query Extension: http://www.w3.org/TR/sparql11-federated-query/ when the author refers to that document.