Review Comment:
A Fine-Grained Evaluation of SPARQL Endpoint Federation Systems
In this paper the authors present an exhaustive evaluation of several SPARQL endpoint federation systems. The contribution of the paper consists of extending the FedBench benchmark with more fine-grained features to evaluate and more systems on which to run this evaluation. They provide a new dataset, SlicedBench, which contains the original datasets sliced and distributed across several Virtuoso servers. In the evaluation section the authors run the queries against FedX, SPLENDID, DARQ, LHD and ANAPSID.
1 Introduction Section:
In the Introduction Section the authors describe the increasing popularity of SPARQL federated systems and thus the need to evaluate them in order to find out which of these systems is the most appropriate for querying distributed SPARQL endpoints. They also point out that current evaluations of the existing systems focus only on execution time and leave out other important features such as the number of sources selected, the number of ASK queries sent, or the source selection time. The authors also describe the problem of data partitioning and, at the end of this section, the contributions of the paper.
Comments:
First, the authors almost always assume that the only way to federate SPARQL queries is by means of systems that implement an extension to SPARQL 1.0, leaving out/ignoring the SPARQL 1.1 Federation Extension. The only sentence in the paper that I think states the difference between the authors' approach and the SPARQL 1.1 federation approach is: "Federated queries [7,20,27,23], in a virtual integrated fashion is becoming increasingly popular". I think the paper should explicitly say that there are two ways of federating queries: using SPARQL 1.0 together with a virtual integration of the datasets, or using the approach in the W3C Recommendation, and that in the current paper the authors focus on the former.
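For clarity, a minimal sketch of the latter approach, i.e. a SPARQL 1.1 federated query using the SERVICE keyword (the endpoint URL and triple patterns here are purely illustrative, not taken from the paper):

  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
  SELECT ?drug ?label WHERE {
    ?drug a <http://example.org/Drug> .        # evaluated against the local/default endpoint
    SERVICE <http://dbpedia.org/sparql> {      # this sub-pattern is shipped to the named remote endpoint
      ?drug rdfs:label ?label .
    }
  }

Here the query itself names the remote endpoint, and any SPARQL 1.1 compliant store (e.g. Virtuoso) can execute it, whereas the engines evaluated in the paper decide at runtime, per triple pattern, which endpoints to contact.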
2 Related Work Section:
In this section (2.1) the authors list some of the works on SPARQL query federation engines (using SPARQL 1.0) and focus on their evaluation. This short list of engines and performance studies includes link traversal systems. The section seems quite disconnected, since it is just a list of works without analysing/describing each listed work in detail. Again, a reference to the SPARQL 1.1 Federation Extension should be included.
Section 2.2, which presents the existing benchmarks, is much more elaborate than Section 2.1. In this section the authors present these evaluation works together with a short analysis of them. However, I think that [1] is missing.
3. Federated Engines Public Survey Section
In this section the authors present a survey in which they ask for implementation details such as the source selection technique, the join implementation used, etc. The authors also ask in this survey about the desired features that a federation engine may have, such as partial result retrieval, privacy, adaptive operators, etc.
Comments: Regarding privacy, I am not sure why this should be a task for the federation engine. There are approaches like [2] that require an ID in the query URL to access the data. I think data access should be managed by the RDF database, not by the federation engine, since the engine is just a client.
Regarding the systems analysed in the survey, I still think that the authors should say clearly that SPARQL 1.1 federation engines are not considered in this survey.
Regarding result completeness, I would like the authors to clarify what 100% recall means in this evaluation. In Table 2 the authors show that some systems have complete result sets while others do not. However, it is not clear why some are complete; do they return more results than the others? I would like to see an explanation of the diverging query result sets, since the technique used for performing joins (or other operations) may affect the final result sets. Besides, is this 100% recall measured for each query in the evaluation?
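For reference, one possible reading (my assumption, not something stated in the paper) is that recall is computed per query against the answers obtained by evaluating the query over the union of all datasets:

  \[
  \mathrm{recall}(q) = \frac{|R_{\mathrm{engine}}(q) \cap R_{\mathrm{ref}}(q)|}{|R_{\mathrm{ref}}(q)|},
  \qquad \text{100\% recall} \iff R_{\mathrm{ref}}(q) \subseteq R_{\mathrm{engine}}(q)
  \]

where R_ref(q) denotes the reference (complete) result set for query q. If this is indeed the intended definition, it should be stated explicitly and reported per query.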
I find the other categories interesting and well described.
3.2 Discussion of the survey results Section
In this section the authors divide the "existing SPARQL query federation approaches" into three categories (SPARQL endpoint federation, query federation over linked data, and query federation on top of Distributed Hash Tables), and these categories are further divided into three subcategories: catalogue/index-assisted solutions, catalogue/index-free solutions, and hybrid solutions. This classification seems fine to me; the only problem I see is that Atlas has its own category, and I do not think it is a query federation engine: in the end it is an RDF storage system distributed across a P2P network. So it is natural that it has its own category, since it is not a SPARQL federation engine like the others, as shown in [3] and [4]. I think it should be removed from this work, although I can understand its presence here.
4 Details of Selected Systems Section
After analysing and describing 14 systems, the authors select 6 engines to be benchmarked. In this section the authors provide a more detailed description of the selected systems, describe what they consider SPARQL query federation, and give further details about the implementation of the selected engines.
I think this section is good and easy to read, and it clearly presents the systems that will be evaluated. Maybe the authors should highlight that ANAPSID is developed in Python, which has some drawbacks compared to the Java-based engines (compiled code in the end).
5. Evaluation Section
First the authors describe the experimental setup, which uses the FedBench query sets for cross domain, life sciences and linked data, as well as the SP2B datasets. All the datasets involved in these queries are sliced into 10 pieces and distributed across 10 different servers, so each server holds parts of each dataset. The authors detail how the slicing was done, and I find it a quite interesting technique.
The authors also select some queries from the LD and SP2B domains, discarding others, "to cover majority of the SPARQL query clauses and types along with variable results size". I do not think removing queries from a benchmark is a good way to cover the majority of the SPARQL query clauses; personally, if I want to cover more cases I add more queries, not the contrary. Besides, the authors of the benchmark included these queries in their evaluation for a reason. For instance, SP2B queries 6 to 8 are very hard queries to answer, since they select the same elements from the dataset twice (using ?article1 and ?author1). I would like to see all queries from the SP2B and LD query sets executed in this evaluation. The goal is to produce a fine-grained and complete evaluation, and thus no queries should be removed.
5.2 Evaluation Criteria Section
In this section the authors describe the criteria used in the evaluation of the systems. I think they are good criteria, but I am missing a result completeness criterion, like the one the authors introduced in Section 3. It would be interesting to add a column for each query showing the number of results returned by each engine.
5.3 Experimental Results Section
The next sections describe the results obtained from executing the queries on each of the datasets. The authors first describe the results for pattern-wise source selection, pointing out that ANAPSID is the best one. This section also presents the effects of the overestimation of the selected sources by the engines.
Next, the authors present the results for the number of ASK queries used by the SPARQL federation engines. Table 9 summarises the results, showing that ANAPSID is the system that issues the fewest ASK queries, which is in line with the results from the previous section. The source selection time is analysed in the next section, and the authors point out that the results for ANAPSID include not only the source selection time but also the time needed to decompose a query, due to ANAPSID implementation details. This shows ANAPSID as the slowest system, which somewhat contradicts the two previous results. I would try to present fairer results here, for example by reporting source selection plus query decomposition times for all engines, or by removing ANAPSID from the chart, since it shows this system as the slowest one when it is not.
Finally the authors present the query execution time results. These results show FedX as the fastest engine overall, especially with warm cache. The authors also analyse the effect of source overestimation, especially regarding FedX and SPLENDID.
6. Discussion Section
The authors present a discussion mainly about why FedX and SPLENDID are the fastest engines. However, I do not see any explanation of why ANAPSID is slower than the other two engines, especially when ANAPSID is the fastest engine at selecting the sources to query and the engine that submits the fewest ASK queries.
Finally, the authors also show the effects of partitioning the data, which are very interesting.
Final comments
I think this is a very interesting work, but it has to clarify some points. The first one is the fact that the authors totally ignore that there is a W3C Recommendation for SPARQL Federated Query and that most RDF storage systems, including Virtuoso, implement it. The authors should clearly state that there are two ways of federating queries and that they focus on one of them. Next, there are some sections that should be rewritten (like Section 2.1). Regarding the discussion, I think it should be explained why ANAPSID is one of the slowest engines when it is the best one at selecting sources. At the beginning of the paper the authors say:
"In our experiments, we made use of both FedBench [25] and SP2Bench [26] queries to ensure that we cover the majority of the SPARQL query types and clauses."
This is not true, since the authors only use some of the queries from SP2Bench, namely those included in FedBench. I would suggest that the authors use all queries from these query sets or remove that sentence.
Some typos:
In the survey discussion section:
repeated sentence: "However, two out of the eight with public implementation do not make use of the SPARQL endpoints and were thus not considered further in this study.”
Typo: "cashing results"? Maybe "caching results"?
References
[1] Gabriela Montoya, Maria-Esther Vidal, Oscar Corcho, Edna Ruckhaus, and Carlos Buil-Aranda. 2012. Benchmarking federated SPARQL query engines: are existing testbeds enough?. In Proceedings of the 11th international conference on The Semantic Web - Volume Part II (ISWC'12), Philippe Cudré-Mauroux, Jeff Heflin, Evren Sirin, Tania Tudorache, and Jérôme Euzenat (Eds.), Vol. Part II. Springer-Verlag, Berlin, Heidelberg, 313-324.
[2] Manuel Salvadores, Matthew Horridge, Paul R. Alexander, Ray W. Fergerson, Mark A. Musen, and Natalya F. Noy. 2012. Using SPARQL to query bioportal ontologies and metadata. In Proceedings of the 11th international conference on The Semantic Web - Volume Part II (ISWC'12), Philippe Cudré-Mauroux, Jeff Heflin, Evren Sirin, Tania Tudorache, and Jérôme Euzenat (Eds.), Vol. Part II. Springer-Verlag, Berlin, Heidelberg, 180-195.
[3] Zoi Kaoudi, Iris Miliaraki, Matoula Magiridou, Antonios Papadakis-Pesaresi, and Manolis Koubarakis. 2006. Storing and Querying RDF Data in Atlas. In 3rd European Semantic Web Conference (ESWC 2006), Budva, Montenegro, demo paper.
[4] Filali, I., Bongiovanni, F., Huet, F., and Baude, F. 2011. A Survey of Structured P2P Systems for RDF Data Storage and Retrieval. In Transactions on Large-Scale Data- and Knowledge-Centered Systems III, 20-55. Springer, Berlin, Heidelberg.