Editorial Board

Editor-in-Chief
Krzysztof Janowicz

Managing Editors
Cogan Shimizu
Eva Blomqvist

Editorial Board
Mehwish Alam
Claudia d’Amato
Stefano Borgo
Boyan Brodaric
Philipp Cimiano
Oscar Corcho
Bernardo Cuenca-Grau
Elena Demidova
Jerome Euzenat
Mark Gahegan
Aldo Gangemi
Anna Lisa Gentile
Rafael Goncalves
Dagmar Gromann
Armin Haller
Pascal Hitzler
Aidan Hogan
Katja Hose
Eero Hyvönen
Sabrina Kirrane
Agnieszka Lawrynowicz
Freddy Lecue
Maria Maleshkova
Raghava Mutharaju
Axel Polleres
Guilin Qi
Marta Sabou
Harald Sack
Christoph Schlieder
Stefan Schlobach
Oshani Seneviratne
Cogan Shimizu
Ruben Verborgh
GQ Zhang

Former Editors-in-Chief
Pascal Hitzler

Editorial Assistants
Michael McCain

Syndicate

Morph-KGC: Scalable Knowledge Graph Materialization with Mapping Partitions

Submitted by Julián Arenas-G... on 03/15/2022 - 09:40

Tracking #: 3090-4304

A new version of this paper is available

Authors:

Julián Arenas-Guerrero

David Chaves-Fraga

Jhon Toledo

María S. Pérez

Oscar Corcho

Responsible editor:

Elena Demidova

Submission type:

Full Paper

Abstract:

Knowledge graphs are often constructed from heterogeneous data sources, using declarative rules that map them to a target ontology and materializing them into RDF. When these data sources are large, the materialization of the entire knowledge graph may be computationally expensive and not suitable for those cases where a rapid materialization is required. In this work, we propose an approach to overcome this limitation, based on the novel concept of mapping partitions. Mapping partitions are defined as groups of mapping rules that generate disjoint subsets of the knowledge graph. Each of these groups can be processed separately, reducing the total amount of memory and execution time required by the materialization process. We have included this optimization in our materialization engine Morph-KGC, and we have evaluated it over three established benchmarks. Our experimental results show that, compared with state-of-the-art techniques, Morph-KGC presents the following advantages: i) it decreases significantly the time required for materialization, ii) it reduces the maximum peak of memory used, and iii) it scales to data sizes that other engines are not capable of processing currently.

Full PDF Version:

swj3090.pdf

Revised Version:

Morph-KGC: Scalable Knowledge Graph Materialization with Mapping Partitions

Previous Version:

Morph-KGC: Scalable Knowledge Graph Materialization with Mapping Partitions

Tags:

Reviewed

Long-term Stable Link to Resources:

https://doi.org/10.5281/zenodo.6282341

Decision/Status:

Minor Revision

Solicited Reviews:

Click to Expand/Collapse

Review #1

By Sherzod Hakimov submitted on 18/Mar/2022

Suggestion:
Accept

Review Comment:

The paper presents a method for materialization of triples that are used for knowledge graph construction. Two different algorithms are presented for mapping partitions, which group mapping rules that create triples. The presented algorithms are evaluated on three benchmark datasets and compared against four existing systems. The experimental results suggest that the presented sequential and parallel processing solutions perform better than the compared systems in terms of memory usage and execution time, respectively.

Originality: the paper addresses an important problem of reducing memory usage and execution time for knowledge graph construction. The novelty aspect of the paper lies in the presented two ways of using mapping partitions: parallel or sequential execution.

Improvement: The authors have taken into account the comments regarding the inclusion of additional configurations of the presented model for some experiments, inclusion of a discussion section to finalize the findings from the presented solution. The authors have shared the resources in Zenodo for the reproducibility.

Overall, the paper is well written with clear structures, after the requested changes from the initial reviews.

Review #2

Anonymous submitted on 07/May/2022

Suggestion:
Minor Revision

Review Comment:

This document reports the second version of the work on planning and execution techniques for [R2]RML mapping rules. The proposed method relies on the partition of mapping rules. Evaluating the groups in the partition, reduces the duplicated generation of RDF triples and maximizes the parallel execution of the mapping rules. The proposed techniques are implemented in MorphKGC, an RML-compliant engine; the behavior of MorphKGC is assessed in three existing benchmarks. The reported results suggest that the proposed methods can accelerate the execution of mapping rules as the ones composing the studied benchmarks.

Overall, this second version addresses some comments on the previous version. The new experimental results provide evidence of the benefits that planning the execution of the mapping rules brings to the process of KG construction. Moreover, the competitive behavior with other engines puts the critical role played by logical and physical planning into perspective. Despite the improvements, this new version of this work still presents imprecise statements which need to be addressed in a new version of the paper.

Definitions:

Definitions 1 and 2 are still not well-formulated; please, reuse the notation and conventions in 2.2. The formal proof of whether given a data source when these two mapping documents will produce the same set of RDF triples is necessary; the correctness of the proposed approach dependents on that.

I do not agree with this statement
“Answer: We have included an additional function const(.) in Section 2.2 to solve this. Note that `if` here does not refer to logics, we have replaced it with `when` to avoid reader confusion. As the values of a term map must be {constant, template or reference}, the invariant is well-defined with the bullets that consider the three possibilities.”
Since “when” and “if” represent logical conditionals, clarify the sufficient and necessary conditions in Definition 5.

Proposed Algorithms:

Please, clearly state the assumptions under which Algorithm 3 is able to generate the Maximal Mapping Partition of an [R2]RML document.

Empirical Evaluation
Please, include absolute values and explain the impact of the selectivity of the joins in the performance of the compared engines.

Minor comments
SDM-RDFizer v4.1.1 and Chimera v2.1. are interpreters of RML and not only parsers. Please, clarify this point.

Log in or register to post comments
2655 reads

Main menu

Editorial Board

Syndicate

Morph-KGC: Scalable Knowledge Graph Materialization with Mapping Partitions

Tracking #: 3090-4304

Reviewed Articles

Authors & Reviewers

Links

Recent blog posts

Accepted Articles

Search form

Main menu

Login

Editorial Board

Syndicate

Morph-KGC: Scalable Knowledge Graph Materialization with Mapping Partitions

Tracking #: 3090-4304

Reviewed Articles

Authors & Reviewers

Links

Recent blog posts

Accepted Articles