Empowering the SDM-RDFizer Tool for Scaling Up to Complex Knowledge Graph Creation Pipelines

Tracking #: 3580-4794

Enrique Iglesias
Maria-Esther Vidal
Diego Collarana Vargas
David Chaves-Fraga

Responsible editor: 
Guest Editors Tools Systems 2022

Submission type: 
Tool/System Report

Abstract:
The significant increase in data volume in recent years has prompted the adoption of knowledge graphs as valuable data structures for integrating diverse data and metadata. However, this surge in data availability has brought to light challenges related to standardization, interoperability, and data quality. Knowledge graph creation faces complexities from large data volumes, data heterogeneity, and high duplicate rates. This work addresses these challenges and proposes data management techniques to scale up the creation of knowledge graphs specified using the RDF Mapping Language (RML). These techniques are integrated into SDM-RDFizer, transforming it into a two-fold solution designed to address the complexities of generating knowledge graphs. Firstly, we introduce a reordering approach for RML triples maps, prioritizing the evaluation of the most selective maps first to reduce memory usage. Secondly, we employ an RDF compression strategy, along with optimized data structures and novel operators, to prevent the generation of duplicate RDF triples and optimize the execution of RML operators. We assess the performance of SDM-RDFizer through established benchmarks. The evaluation showcases the effectiveness of SDM-RDFizer compared to state-of-the-art RML engines, emphasizing the benefits of our techniques. Furthermore, the paper presents real-world projects where SDM-RDFizer has been utilized, providing insights into the advantages of declaratively defining knowledge graphs and efficiently executing these specifications using this engine.

Solicited Reviews:
Review #1
By Dominik Tomaszuk submitted on 25/Nov/2023
Review Comment:

I appreciate the authors' efforts in addressing my comments by making necessary adjustments to the paper. Having reviewed the revised version, I find no additional comments to make, and I recommend accepting the paper.

Review #2
Anonymous submitted on 14/Jan/2024
Minor Revision
Review Comment:

The revised version has addressed concerns from my previous review, and is generally acceptable.

Please check that every reference has complete bibliographic information.
Also, several groups of references are by the same authors and are very similar in content, or possibly identical (e.g., are references 15 and 67 the same?). Only references that are unique and sufficient should be kept.

Also, Section 6 does not appear to relate directly to the main research topic of the paper (is it about the tool in general?). While it is useful for getting a general idea of the tool's usage, it could be moved to an appendix.