HDT crypt: Compression and Encryption of RDF Datasets

Tracking #: 1895-3108

Authors: 
Javier D. Fernandez
Sabrina Kirrane
Axel Polleres
Simon Steyskal

Responsible editor: 
Ruben Verborgh

Submission type: 
Full Paper
Abstract: 
The publication and interchange of RDF datasets online has experienced significant growth in recent years, promoted by different but complementary efforts, such as Linked Open Data, the Web of Things and RDF stream processing systems. However, the current Linked Data infrastructure does not cater for the storage and exchange of sensitive or private data. On the one hand, data publishers need means to limit access to confidential data (e.g. health, financial, personal, or other sensitive data). On the other hand, the infrastructure needs to compress RDF graphs in a manner that minimises the amount of data that is both stored and transferred over the wire. In this paper, we demonstrate how HDT - a compressed serialization format for RDF - can be extended to support encryption. We propose a number of different graph partitioning strategies and discuss the benefits and tradeoffs of each approach.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Miel Vander Sande submitted on 12/Jun/2018
Suggestion:
Accept
Review Comment:

I thank the authors for addressing all my comments. Section 4, in particular 4.2, is more understandable, which improves the overall flow IMO.
The evaluation section has also significantly improved with the addition of a newer version of DBpedia (I agree both should be reported) and the addition of the more representative SAFE benchmark, albeit not really illustrating the overall merits of the approach. It makes me wonder which use cases are covered by this work, if not by SAFE. Maybe the authors can still add a statement about that in the final version.
Some additional comments:
- In the Experimental Setup, it is stated that the results for both DBpedia versions are comparable. However, the creation time triples while the number of triples only doubles. Of course, one cannot conclude much from so few samples, but one also cannot conclude that it scales linearly. Thus, I find it odd that HDT creation time is only discussed for LUBM and not for DBpedia and SAFE.
- It would be interesting to know why the larger DBpedia is encrypted faster than the smaller one in crypt-C.

Overall, I think this is an excellent paper and is ready for acceptance. The authors can consider my comments as optional.

Review #2
Anonymous submitted on 13/Aug/2018
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

I am happy with the revision of the paper. In particular, the authors have considered and commented on each and every comment that I had given in my review.
From my point of view, the paper is ready for publication now.