Evaluation of Metadata Representations in RDF stores

Tracking #: 1791-3004

Johannes Frey
Kay Müller
Sebastian Hellmann
Erhard Rahm
Maria-Esther Vidal

Responsible editor: 
Guest Editors Benchmarking Linked Data 2017

Submission type: 
Full Paper
The maintenance and use of metadata such as provenance and time-related information is of increasing importance in the Semantic Web, especially for Big Data applications that work on heterogeneous data from multiple sources and which require high data quality. In an RDF dataset, it is possible to store metadata alongside the actual RDF data and several possible metadata representation models have been proposed. However, there is still no in-depth comparative evaluation of the main representation alternatives on both the conceptual level and the implementation level using different graph backends. In order to help to close this gap, we introduce major use cases and requirements for storing and using diverse kinds of metadata. Based on these requirements, we perform a detailed comparison and benchmark study for different RDF-based metadata representations, including a new approach based on so-called companion properties. The benchmark evaluation considers two datasets and evaluates different representations for three popular RDF stores.

Solicited Reviews:
Review #1
Anonymous submitted on 22/Jan/2018
Review Comment:

All of my comments were taken into account and I am happy with the revisions.

Review #2
By Gabor Bella submitted on 31/Jan/2018
Review Comment:

CONTENT: the authors were not able to provide any additional evaluations with respect to the points raised by the reviewers (because of excessively long computation times, as I understand). They did extend the related work section significantly and added clarifications throughout the paper.

STRUCTURE: following reviewer comments, the authors have reorganised the paper and have significantly improved its structure. I would have preferred to see even more changes: I still think that it would be more understandable to start from the basic building blocks that are the MRM patterns (current section 4) and move from there to metadata usage (3.1.1), datasets (3.1.2), and then to evaluation criteria. This way it would also become possible to show which dataset is using which MRM pattern. This is just a suggestion, as the authors seem to have their own reasons to insist on the current structure.

LANGUAGE: the English is much better now. One more thing to improve: please replace "deduct" (which means "to subtract") with "deduce" throughout the paper.

Review #3
By Pavel Smirnov submitted on 18/Mar/2018
Minor Revision
Review Comment:

The paper presents comprehensive research in the field of RDF-compliant metadata representation models (MRMs). The authors compare five MRMs, define use cases and datasets (Wikidata and DBpedia), and conduct a performance evaluation of the metadata handling capabilities of state-of-the-art RDF storage engines. The authors position their work as a basis for a future MRM benchmark and raise questions for future work (e.g. write queries).

The paper is very well written, although a formatting problem remains at the end (see the sentence “Moreover the ” in 8.1).

The introduction of Companion Properties as a novel MRM in section 4.1.5 would benefit from being reformulated with a stronger focus. From the reader's point of view, it is unclear that this novel MRM is introduced by the authors and is one of the contributions of the paper (although this is stated in the introduction and conclusion). As written, it reads like any other existing MRM accompanied by a description of its novelty. The transition from the Singleton Property (at the beginning of 4.1.5) to the Companion Property is also unclear (via the statement “Since this is a novel MRM…” just after the limitations of the previous MRM).