Scalable Long-term Preservation of Relational Data through SPARQL queries

Tracking #: 554-1760

Silvia Stefanova
Tore Risch

Responsible editor: 
Christoph Schlieder

Submission type: 
Full Paper
We present an approach for scalable long-term preservation of data stored in relational databases (RDBs) as RDF, implemented in the SAQ (Semantic Archive and Query) system. The proposed approach is suitable for archiving scientific data used in scientific publications where it is desirable to preserve only parts of an RDB, e.g., only data about a specific set of experimental artefacts in the database. With the approach, long-term preservation as RDF of selected parts of a database is specified as an archival query in an extended SPARQL dialect, A-SPARQL. The query processing is based on automatically generating an RDF view of the relational database to be archived, called the RD-view. A-SPARQL provides flexible selection of data to be archived in terms of a SPARQL-like query to the RD-view. The result of an archival query is a data archive file containing the RDF triples representing the relational data content to be preserved. The system also generates a schema archive file where sufficient meta-data are saved to allow the archived database to be fully reconstructed. An archival query usually selects both properties and their values for sets of subjects, which makes the property p in some triple patterns unknown. We call such queries, where properties are unknown, unbound-property queries. To achieve scalable data preservation and recreation, we propose query transformation strategies suitable for optimizing unbound-property queries. These query rewriting strategies were implemented and evaluated in a new benchmark for archival queries called ABench. ABench is defined as a set of typical A-SPARQL queries archiving selected parts of databases generated by the Berlin benchmark data generator. In experiments, the SAQ optimization strategies were evaluated by measuring the performance of A-SPARQL queries selecting triples for archival in ABench. The performance of equivalent SPARQL queries for related systems was also measured. The results showed that the proposed optimizations substantially improve the query execution time for archival queries.
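To make the notion of an unbound-property query concrete, the following is a minimal Python sketch, not taken from the paper. It assumes a toy mapping from relational rows to triples and a simple in-memory pattern matcher; the names `row_to_triples` and `match`, the table `product`, and the sample rows are all hypothetical and do not reflect how SAQ, the RD-view, or A-SPARQL are actually implemented.

```python
# Hypothetical illustration only: these names and this mapping are
# assumptions and are not part of SAQ or A-SPARQL.

def row_to_triples(table, key, row):
    """Turn one relational row into (subject, property, value) triples."""
    subject = f"{table}/{row[key]}"
    return [(subject, f"{table}#{col}", val)
            for col, val in row.items() if col != key]

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None marks an unbound position."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

rows = [
    {"id": 1, "label": "probe A", "price": 10},
    {"id": 2, "label": "probe B", "price": 25},
]
triples = [t for row in rows for t in row_to_triples("product", "id", row)]

# Unbound-property pattern: the subject is fixed, while the property and
# the value are left open, so every column of the selected row is returned.
archived = match(triples, s="product/1")

# Bound-property pattern for comparison: only one column is selected.
prices = match(triples, p="product#price")
```

In this sketch the unbound-property pattern returns every property of the selected subject, which is the behaviour an archival selection needs, whereas the bound-property pattern touches a single column.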

Solicited Reviews:
Review #1
Anonymous submitted on 27/Nov/2013
Review Comment:

The issues that I brought up have been addressed, except for the formalization of the translation rules from A-SPARQL to the generated SPARQL. These rules continue to be two English sentences. However, I understand that it is straightforward to do this transformation, and I will accept this.

Two minor comments:

Page 5, in the definition of archive specification: archived_triple_patterns -> archived_triple_pattern
Page 6: "UNION of one SPARQL query fragment" -> What is a query fragment? I assume it is a BGP.

The correct citation for [18] is:
J.F. Sequeda and D.P. Miranker. Ultrawrap: SPARQL execution on relational data. Journal of Web Semantics, Volume 22, October 2013, Pages 19-39.