Evaluating Query and Storage Strategies for RDF Archives

Tracking #: 1608-2820

Authors: 
Javier D. Fernandez
Juergen Umbrich
Axel Polleres
Magnus Knuth

Responsible editor: 
Guest Editors Benchmarking Linked Data 2017

Submission type: 
Full Paper

Abstract: 
There is an emerging demand for efficiently archiving and (temporal) querying of different versions of evolving Semantic Web data. As novel archiving systems are starting to address this challenge, foundations/standards for benchmarking RDF archives are needed to evaluate their storage space efficiency and the performance of different retrieval operations. To this end, we provide theoretical foundations on the design of data and queries to evaluate emerging RDF archiving systems. Then, we instantiate these foundations along a concrete set of queries on the basis of a real-world evolving dataset. Finally, we perform an empirical evaluation of various current archiving techniques and querying strategies on this data, which is meant to serve as a baseline for future developments in querying archives of evolving RDF data.

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Nick Bassiliades submitted on 12/Jun/2017
Suggestion:
Minor Revision
Review Comment:

This paper studies the problem of querying and storing RDF archives. Specifically, it reviews different storage strategies and sets out the theoretical framework for evaluating these strategies. Furthermore, it proposes queries to evaluate these strategies and performs an actual evaluation on three specific benchmarking datasets, created by the authors, on two RDF triple stores. The paper attacks the problem in a methodologically correct and complete way. The paper could be accepted as is, modulo some minor remarks. In my opinion this paper will become seminal in the years to come.
I have very few comments:
- Definition 3. Are the insertion and deletion ratios correctly defined? The change ratio between two versions considers, in the denominator, the cardinality of the union of both versions, whereas the insertion/deletion ratios consider only the cardinality of the first version. In contrast, in Definition 12 the insertion and deletion dynamicity considers both versions in the denominator. So, is there a rationale behind this difference, or is it just a typo?
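For reference, a sketch of how I read these definitions (my notation, not necessarily the authors', with \Delta^{+}_{i,j} and \Delta^{-}_{i,j} denoting the triples added and deleted between versions V_i and V_j):

  change ratio (Def. 3):     \delta_{i,j}     = |\Delta^{+}_{i,j} \cup \Delta^{-}_{i,j}| / |V_i \cup V_j|
  insertion ratio (Def. 3):  \delta^{+}_{i,j} = |\Delta^{+}_{i,j}| / |V_i|
  deletion ratio (Def. 3):   \delta^{-}_{i,j} = |\Delta^{-}_{i,j}| / |V_i|

whereas, if I read Definition 12 correctly, the insertion/deletion dynamicity normalises by |V_i \cup V_j| in both cases; hence my question about the denominators.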
- There are some typos in the definitions of the query cases in AnQL. On page 7 (right column, bottom), Ver(Q) is given as:
SELECT * WHERE { P :?V }
The correct form should be:
SELECT * WHERE { Q :?V }
On page 8 (left column, top), Change(Q) is given as:
SELECT ?V1 ?V2 WHERE { {{P :?V1} MINUS {P :?V2}} FILTER( abs(?V1-?V2) = 1 ) }
The correct form should be:
SELECT ?V1 ?V2 WHERE { {{Q :?V1} MINUS {Q :?V2}} FILTER( abs(?V1-?V2) = 1 ) }
- Section 3.3.2: Why don't you give evidence for the naïve implementation of Change(Q) as well? Furthermore, why don't you give a concrete example of how the min/max functions (for query Var(Q)) could be implemented on an off-the-shelf SPARQL implementation?
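To illustrate the second point, here is a rough sketch of what I have in mind. Assuming, purely for the sake of example, that each version is materialised as a named graph and that a (hypothetical) ex:versionNumber property maps each graph IRI to its integer version number, the min/max versions in which a solution holds could be computed with standard SPARQL 1.1 aggregates:

PREFIX ex: <http://example.org/>
# Hypothetical layout: one named graph per version, plus triples of the form
# <graphIRI> ex:versionNumber <integer> in the default graph.
SELECT ?s ?p ?o (MIN(?v) AS ?minVersion) (MAX(?v) AS ?maxVersion)
WHERE {
  GRAPH ?g { ?s ?p ?o }       # substitute the triple pattern(s) of Q here
  ?g ex:versionNumber ?v .
}
GROUP BY ?s ?p ?o

An explicit example along these lines would make Section 3.3.2 easier to follow.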
- Section 5.2: Why don't you also present results for join and change queries, at least for some of the datasets?

Review #2
By Xin Wang submitted on 08/Aug/2017
Suggestion:
Minor Revision
Review Comment:

Before reviewing this paper I thoroughly read a previous version of it published at SEMANTiCS 2016 [1] (as mentioned in the cover letter). Both papers are well written and describe 1) metrics to characterise RDF archives (versioning) and 2) five categories of queries to benchmark the performance of data retrieval from RDF archives (only the first three query types appear in the evaluation). These metrics are reasonable and were novel when first published. However, it's not convincing that the current version provides enough extra insight to be included in the special issue, despite being a good fit.

The current version shares the same essential ideas with the previous one and provides more details of those ideas and, more importantly, more comprehensive evaluations with two more datasets and more complex queries. In other words, most of the new content falls in the evaluations. These extra evaluations are probably useful for researchers in related areas but don't seem to provide more insight about the benchmark described in the paper.

* Evaluation of metrics of dataset configuration (metrics in Section 3.1)

Instead of giving these metrics for one dataset (BEAR-A), two more tables are given for DBpedia Live (BEAR-B) and Open Data portals (BEAR-C), respectively. Again, I think these tables are very informative; however, they don't further justify the proposed metrics. These tables would make an important contribution to a paper surveying popular RDF datasets from an archiving point of view, but less so for the purpose of evaluating a benchmark. The metrics are reasonable (to me), and applying them to a dataset demonstrates their utility. However, adding two more datasets adds little to the contribution of the paper.

* Evaluation of atomic types of queries (metrics in Section 3.2)

Five types of queries are proposed to cover a broad spectrum of data retrieval from RDF archives. In both versions three types are evaluated. In this version two more datasets are used, as well as more complex queries (i.e. queries with more than a single triple pattern). The same argument stated above applies here too. It'd be more interesting to have queries from all five categories rather than to repeat similar evaluations with different queries from the same categories. The evaluation gives samples falling in a relatively short range of the spectrum, while the more interesting and more demanding types of queries are not discussed. In addition, the evaluation compares the performance of two engines, Jena and HDT, and HDT outperforms Jena in most of the tests. The paper shows that HDT is faster than Jena on the independent copy policy, which means HDT is faster in general, regardless of the RDF archiving scenario. Later the paper concludes that HDT implements the delta copy policy more efficiently than Jena; however, it's not clear to me whether this conclusion is made after ruling out the general advantage of HDT over Jena. If not, then the only conclusion we can draw from the evaluation is that HDT is faster than Jena, which is straightforward and irrelevant to the main contribution of this paper.
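To make the last point concrete, one way to factor out the raw speed difference would be to report, for each engine E, the overhead of a policy relative to that engine's own IC baseline, e.g.

  overhead_E(policy) = t_E(policy) / t_E(IC)

and then compare these normalised ratios between Jena and HDT; a claim that one engine implements the delta policy more efficiently than the other would need this (or a similar) normalisation.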

In summary, I'd like to see a more comprehensive evaluation that covers a broader spectrum of RDF-archive-related queries. This probably requires a major revision. Meanwhile, this paper is a good fit for the special issue and could be useful for researchers in related fields; therefore I set my recommendation to Minor Revision to increase the chance of its acceptance.

[1] Fernández, J. D., Umbrich, J., Polleres, A., & Knuth, M. (2016, September). Evaluating Query and Storage Strategies for RDF Archives. In Proceedings of the 12th International Conference on Semantic Systems (pp. 41-48). ACM.

Review #3
Anonymous submitted on 17/Nov/2017
Suggestion:
Major Revision
Review Comment:

The paper presents an evaluation of query and storage strategies for RDF archives using the BEAR benchmark, which was developed in earlier work by the authors.
The authors briefly present the different archiving strategies that have been developed in the literature (independent copies, the change-based approach, the timestamp-based approach and hybrid approaches) as well as the retrieval queries (version materialization, single-version structured queries, cross-version structured queries, delta materialization, and single- and cross-delta structured queries).
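As a reminder of what is at stake at retrieval time (my shorthand, not the authors' notation): under the change-based approach only the first version is stored in full, and a queried version V_i has to be reconstructed by applying the chain of deltas,

  V_i = (V_{i-1} \setminus \Delta^{-}_{i-1,i}) \cup \Delta^{+}_{i-1,i},   with V_0 stored in full,

so materialising V_i requires applying i deltas in sequence, which is precisely the space/time trade-off the evaluation explores.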
The authors also formalize the different notions needed in this context, such as RDF archive and version, version change ratio and version data growth. They further define the static core, version-oblivious triples, the RDF vocabulary per version and the RDF vocabulary per delta, and formalize the notion of RDF vocabulary set dynamicity. These notions are used to define the cardinalities (archive- and version-driven result cardinality) that underpin the design of the benchmark queries. These concepts are essential when deciding which queries to add to the benchmark: the idea is to have queries that are balanced, i.e. neither too hard nor too easy for a system to address. In this sense, it is essential to have specific criteria as a guideline for defining the different queries.
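For readers unfamiliar with these notions, the two central ones can be summarised roughly as follows (my paraphrase, with V_0, ..., V_n the versions of the archive):

  static core:               \bigcap_{i=0}^{n} V_i   (triples present in every version)
  version-oblivious triples: \bigcup_{i=0}^{n} V_i   (all distinct triples appearing in at least one version)

and, if I understand correctly, the archive- and version-driven result cardinalities then count the results of a query over the whole archive and per version, respectively.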
The authors also provide a formalization of the different query types analyzed in the document, in addition to an instantiation of query templates in the AnQL query language.
Finally, the authors present the BEAR Test Suite for RDF Archiving. BEAR has been defined in the authors' earlier work and is not a new contribution of this paper. The benchmark comes with three different datasets and a corresponding set of queries for each. The authors have implemented the archiving strategies in the Jena TDB store and benchmark those implementations using BEAR. They describe the benchmark datasets and queries using the notions introduced in the paper.
The authors then present an extensive study of the different archiving policies they implemented on top of Jena TDB with the different versions of the BEAR benchmark.

Regarding (1) originality, the paper includes content that has been published before. The difference from the previously published work is that the authors present two new datasets, BEAR-B and BEAR-C, in addition to the previously published BEAR-A dataset. As such, the paper is not highly original, but it adds new content that is very interesting.
Regarding (2) impact, the datasets considered by the BEAR benchmark are highly interlinked and used by a broad range of users and systems.
Regarding (3) quality of writing, the paper is well written, although there are several language problems that could easily be corrected by a native English speaker. The paper is also very well structured, and the different approaches regarding queries and archiving policies are, although briefly, well presented.

I believe that it would be interesting for the audience if the authors benchmarked existing archiving systems (there are a few from academia) with their benchmark, instead of simply benchmarking their own implementations of the different strategies. So, I would suggest that the authors test their benchmark with systems such as R4triples, Memento, R&Wbase and TailR.