Review Comment:
This paper sets out a benchmark for evaluating the performance (in terms of space requirements and query retrieval times) of RDF archives. As noted by previous reviewers, the majority of the first few pages have been previously published in a paper of the same name at SEMANTICS 2016. This includes the review of six types of retrieval queries (version materialisation, single-version structured queries, cross-version structured queries, delta materialisation, single-delta structured queries, and cross-delta structured queries), the discussion of approaches to RDF archiving (independent copies (IC), change-based (CB), timestamp-based (TB), and hybrid-based (HB) approaches), the formalisation of the features that characterise archive data, and the five types of queries proposed for evaluating RDF archives. Definitions of how these queries can be instantiated using AnQL are provided for three RDF archiving approaches (IC, CB, TB). Three datasets, each with its own set of queries used in the evaluation, are described (BEAR-A from previous work, and the new BEAR-B and BEAR-C datasets).
The authors present an extensive evaluation, which compares their own Jena- and HDT-based implementations of the IC, CB, TB, and three hybrid approaches with three systems developed by others (v-RDFCSA, R43ples, and TailR). The first part of the evaluation compares and explains differences in the space requirements of the implementations/archiving approaches for BEAR-A, the three versions of BEAR-B, and BEAR-C. The remainder of the evaluation focuses on comparing and discussing retrieval times for the five query types (with additional results presented in Appendices A and B) using BEAR-A and two versions of the BEAR-B dataset and queries. Here the discussion usefully interprets the various graphs (which, understandably, can be difficult to read given the quantity of data points), discussing some of the strengths and weaknesses of the different archiving approaches in the implementations. BEAR-C is not used in the second part of the evaluation as the systems are unable to resolve its queries; rather, the queries are provided to support future research in the area.
One minor query relates to how the authors envisage that others, for example developers of a new RDF archiving system, could reuse the BEAR framework. Having looked at the BEAR webpage and sources, it appears there are some scripts under development to run the queries. This is a minor point that should not prevent the paper from being accepted, but some more documentation/guidance on this would be beneficial to the community.
The conclusion section simply summarises the paper and briefly mentions two future work activities. Ideally it would be strengthened to highlight the key points the authors have identified from the evaluation, both to guide people considering deploying an RDF archiving system and to shape future developments in this area.
In terms of originality, the core material related to the actual benchmark has been published previously; the instantiation of the five queries in AnQL is new, as are the BEAR-B and BEAR-C datasets; the main original content is the extended evaluation and the associated discussion. Together these provide sufficient new contributions that are useful to researchers in this and related fields, and are of relevance to the special issue. The paper is generally well written; there are a few typos (see below), and links to higher-resolution versions of the graphs would improve their legibility.
Typos
Pg 7, Left col: “especial” -> “special”
Pg 9, Right col: “end end” -> “end”
Pg 10, left col: “ckecked” -> “checked”
Pg 17, Left col: “scalability problems at large scale RDF” -> “scalability problems. At large scale specific”
Pg 21, right col: “trough” -> “through”