Review Comment:
This is the first revision of the manuscript that I am reviewing. Therefore, I am judging only the manuscript itself, not the comments accompanying the revision.
This manuscript presents COBRA, an approach for storing RDF archives in a way that supports several versioned query patterns (introduced in Section 2.1). The authors' previous implementation, OSTRICH, which already serves the same purpose, is extended from unidirectional to bidirectional delta chains in an attempt to reduce both storage size and ingestion time. The effect is measured in a number of evaluations using the BEAR benchmark. The hypothesis cannot be confirmed universally, but it holds in the majority of situations.
Section 1 briefly introduces the problem. Section 2 introduces the basics of versioned queries and different storage strategies, and discusses related work. Related approaches are classified by storage strategy. Section 3 states the problem and formulates the research hypotheses. Section 4 introduces the bidirectional delta chain approach, both for the case of ingesting additional versions one after another during the lifetime of a dataset and for the case of ingesting all past versions of a dataset at once; the latter can be performed out of order. Section 5 presents the results of evaluating OSTRICH and COBRA in different BEAR settings. Section 6 draws conclusions and also provides concise guidance on which storage approach to employ in which practical setting. The implementation is open source and accompanied by everything needed to reproduce the experiments. At least it seems so; I did not try. Only the creation of the BEAR input datasets is not fully reproducible, as the evaluation scripts assume that the respective data already exists on a server named donizetti.labnet.
The following aspects need improvement (cf. the annotated PDF at https://www.dropbox.com/s/bs96w03ab2ikiu2/swj2830.pdf?dl=0 for details):
* Algorithm 2 is introduced as showing the fix-up algorithm. However, it is actually Algorithm 1 that shows it.
* Algorithm 2 is said to assume that n is even, but the implementation shown, which uses Math.floor, does not require this.
* There are multiple references to the OSTRICH article [5]. These references would be easier to follow if they pointed to specific sections of that article.
* In Section 4.6.2, the text about delta materialization says that "the results from the two queries are sorted". Why are they sorted, and by what? Do you mean that the (unsorted) results of the first query come first, followed by the (likewise unsorted) results of the second query?
* In Section 4.6.3, version queries are defined as "results being annotated with the version in which they occur". However, is this version always unique, as the phrasing seems to assume? A result may occur in several versions.
* The setting shown in Subfigure 2.2 does not involve bidirectionality.
* In Section 5.2, why do you restrict your scope to "at most two delta chains"? In other words, does this not amount to assuming that a chain becomes "too long" at (number of versions) / 2? Would your approach not perform even better with more delta chains?
* In Section 5.3.1, what do you mean by "ingesting a raw representation"?
* Regarding Figure 3: until one reads your reminder on the next page about the reverse order of ingestion, it is hard to understand why there is a zero value in the middle. An arrow indicating the order of ingestion would make the figure easier to understand.
* Clarity of phrasing (cf. PDF annotations and comments)
* Multiple minor linguistic issues (cf. PDF)