Data-driven Assessment of Structural Evolution of RDF Graphs

Tracking #: 2254-3467

Carlos Bobed
Pierre Maillot
Peggy Cellier
Sébastien Ferré

Responsible editor: 
Claudia d'Amato

Submission type: 
Full Paper
Since the birth of the Semantic Web, numerous knowledge bases have appeared. The applications that exploit them rely on the quality of their data through time. In this regard, one of the main dimensions of data quality is conformance to the expected usage of the vocabulary. However, the vocabulary usage (i.e., how classes and properties are actually populated) can vary from one base to another. Moreover, through time, such usage can evolve within a base and diverge from previous practices. Methods have been proposed to follow the evolution of a knowledge base by observing the changes in its intensional schema (or ontology); however, they do not capture the evolution of its actual data, which can vary greatly in practice. In this paper, we propose a data-driven approach to assess the global evolution of vocabulary usage in large RDF graphs. Our proposal relies on two structural measures defined at different granularities (dataset vs. update), which are based on pattern mining techniques. We have performed thorough experiments which show that our approach is scalable and can capture the structural evolution through time of both synthetic (LUBM) and real knowledge bases (different snapshots and updates of DBpedia).

Solicited Reviews:
Review #1
Anonymous submitted on 11/Aug/2019
Review Comment:

The authors addressed all my comments. With the new version all my concerns have been resolved and I recommend the article to be accepted for publication.

The authors added a comparison to one related approach, i.e., RDF2vec, which strengthens the claims. The authors could perhaps add more detail to the discussion of how RDF2vec compares with their approach, at the end of Section 6.2.4.
Furthermore, the authors provide more details about the related approaches and how the proposed approach differs from them.
The authors added Section 5.3, which describes three real-world applications of their proposed approach and further strengthens the paper.

Review #2
By Ilaria Tiddi submitted on 04/Sep/2019
Review Comment:

I am quite happy with the authors' new version. All my comments were thoroughly addressed, and so were those of the other reviewers (at least to my eyes).

I think the new edits (particularly the use cases and additional evaluations) significantly improve the paper, which I now recommend for acceptance.

Review #3
By Jedrzej Potoniec submitted on 09/Sep/2019
Review Comment:

The paper is a revision of an earlier manuscript, and my high opinion of its originality holds. The experimental evaluation and the quality of writing have substantially improved. In particular, a new experiment comparing with RDF2vec was introduced, and the results are favorable to the proposed approach.

The quality of writing is very good now and the paper reads well. In particular, missing explanations were added and confusing shortcuts removed. I thank the authors for all the effort they put into addressing my remarks, throughout the paper and in the cover letter.

Overall, I am satisfied with the current shape of the paper and find it suitable for publication.