Expressive Querying and Scalable Management of Large RDF Archives

Tracking #: 3940-5154

Authors: 
Olivier Pelgrin
Ruben Taelman
Luis Galàrraga
Katja Hose

Responsible editor: 
Aidan Hogan

Submission type: 
Full Paper
Abstract: 
The proliferation of large and ever-growing RDF datasets has sparked a need for robust and performant RDF archiving systems. In order to tackle this challenge, several solutions have been proposed throughout the years, including archiving systems based on independent copies, time-based indexes, and change-based approaches. In recent years, modern solutions have combined several of the above-mentioned paradigms. In particular, aggregated changesets of time-annotated triples have showcased a noteworthy ability to handle and query relatively large RDF archives. However, such approaches still suffer from scalability issues, notably at ingestion time. This makes the use of these solutions prohibitive for large revision histories. Furthermore, applications for such systems often remain constrained by their limited querying abilities, where SPARQL is often left out in favor of single triple-pattern queries. In this paper, we propose a hybrid storage approach based on aggregated changesets, snapshots, and multiple delta chains that additionally provides full SPARQL querying on RDF archives. This is done by interfacing our system with a modified SPARQL query engine. We evaluate our system with different snapshot creation strategies on the BEAR benchmark for RDF archives and showcase improvements of up to one order of magnitude in ingestion speed compared to state-of-the-art approaches, while keeping competitive querying performance. Furthermore, we demonstrate our SPARQL query processing capabilities on the BEAR-C variant of BEAR. This is, to the best of our knowledge, the first openly-available endeavor that provides full SPARQL querying on RDF archives.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Edgard Marx submitted on 14/Sep/2025
Suggestion:
Accept
Review Comment:

The authors’ response is of a commendably high standard. The following remarks are offered solely as minor suggestions for further refinement and are not prerequisites for acceptance.

Integration of Performance Discussion: The inclusion of performance comparisons in Section 8.6, provided in response to Reviewer #1, constitutes a valuable addition. To ensure maximal coherence, it would be preferable that this material be seamlessly incorporated into the broader narrative of the discussion, rather than appearing as a supplementary appendix.

The authors have undertaken a rigorous and comprehensive revision of their manuscript, fully engaging with and addressing the concerns raised by all three reviewers. The revisions—including the introduction of a running example, the addition of summary tables for clarity, the substantial expansion of the discussion section, and the elucidation of key algorithms and methodologies—have significantly enhanced the manuscript’s clarity, methodological rigor, and scholarly contribution.

In its revised form, the paper is recommended for acceptance.

Review #2
By Guillermo de Bernardo submitted on 05/Oct/2025
Suggestion:
Accept
Review Comment:

The revised submission addresses all the comments to the previous version. The additions improve the readability of the paper and correct minor issues. In particular, the authors answer the questions I raised in my previous review. Regarding the experimental results, I consider that testing an additional baseline would help in fine-tuning the results and discussion, but the current justifications and analysis are clear enough to put the proposal in context, and therefore to accept the proposal as is.

The topic is relevant, the article is well written and the significance of the results is easier to evaluate. The revised data repository is well structured and documented, and it appears to include the relevant information to reproduce all the experiments in the article.

I suggest acceptance of the paper, but I include below a few (very minor) adjustments that could be considered in the final version:

- The running example is useful for understanding the proposal. To make it even clearer, I would suggest changing the explanation on page 6, lines 22-29, to refer to an instance in the running example (triple <:USA, :dr, :Cuba>).

- Fig. 1, G_2: Cube -> Cuba
- p. 16, l. 26: "is only" -> "is the only"