Materialisation approaches for Façade-based data access with SPARQL

Submitted by Luigi Asprino on 07/25/2024 - 02:04

Tracking #: 3729-4943

Authors:

Luigi Asprino

Enrico Daga

Justin Dowdy

Aldo Gangemi

Paul Mulholland

Responsible editor:

Raghava Mutharaju

Submission type:

Full Paper

Abstract:

The Knowledge Graph concept is gaining momentum as an ideal approach to data integration. Therefore, it is of paramount importance to equip knowledge engineers with tools for accessing data from multiple, heterogeneous and distributed resources. The successful W3C standard SPARQL is the reference language for interacting with RDF knowledge graphs. For that reason, several approaches extend SPARQL to access data in non-RDF formats. Recent research proposes relying on an intermediate RDF model, named Façade-X, whose components can be transparently mapped to various file formats. However, although Façade-X specifies how its components map to many different formats (CSV, JSON, HTML, Markdown, and others), it is still unclear how to implement a SPARQL execution engine that relies on it. In other words, what are the possible strategies for executing Façade-X queries? This article explores materialisation approaches for executing Façade-X queries. Specifically, we study two in-memory strategies for performing façade-based data access with SPARQL. A complete materialised view strategy fully transforms the data source into RDF. By contrast, a sliced materialised view strategy segments the data source and generates an RDF view on each part. Both strategies can be optimised by only materialising the part of the RDF graph that has potential matches with triple patterns in the query (triple-filtering). In addition, we compare these approaches with an on-disk alternative, which relies on a temporary database instance. We analyse the characteristics of these methods and perform extensive experiments, reporting on benefits and limitations of both approaches. Finally, we contribute guidelines and best practices derived from the findings.
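
The strategies named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the triple encoding is a simplified, Façade-X-like assumption, the function names (`complete_view`, `sliced_views`, `keep`) are hypothetical, the sample CSV is invented, and slicing by row is only one possible way to segment a source.

```python
import csv
import io

ANY = None  # wildcard position in a triple pattern


def row_triples(index, row):
    """Yield simplified, Façade-X-like triples for one CSV row (assumed encoding)."""
    row_node = f"_:row{index}"
    yield ("_:root", f"rdf:_{index}", row_node)
    for column, value in row.items():
        yield (row_node, f"xyz:{column}", value)


def matches(triple, pattern):
    """True if every non-wildcard position of the pattern equals the triple's term."""
    return all(p is ANY or p == t for t, p in zip(triple, pattern))


def keep(triple, patterns):
    """Triple-filtering: keep a triple only if it may match some query pattern."""
    return patterns is None or any(matches(triple, p) for p in patterns)


def complete_view(text, patterns=None):
    """Complete strategy: materialise the whole source as one set of triples."""
    reader = csv.DictReader(io.StringIO(text))
    return {t for i, row in enumerate(reader, start=1)
            for t in row_triples(i, row) if keep(t, patterns)}


def sliced_views(text, patterns=None):
    """Sliced strategy: yield one small view per row (the slice unit assumed here)."""
    reader = csv.DictReader(io.StringIO(text))
    for i, row in enumerate(reader, start=1):
        yield {t for t in row_triples(i, row) if keep(t, patterns)}


# Hypothetical sample data and a single triple pattern (?s xyz:stop_name ?o).
SAMPLE = "stop_id,stop_name\nS1,Central\nS2,Harbour\n"
NAME_PATTERN = [(ANY, "xyz:stop_name", ANY)]

full = complete_view(SAMPLE)
filtered = complete_view(SAMPLE, NAME_PATTERN)
slices = list(sliced_views(SAMPLE, NAME_PATTERN))
```

In this sketch the sliced generator holds only one row's triples at a time, whereas the complete view holds them all; filtering shrinks either view to the triples that could match the query's patterns.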

Full PDF Version:

swj3729.pdf

Previous Version:

Materialisation approaches for Façade-based data access with SPARQL

Tags:

Reviewed

Long-term Stable Link to Resources:

https://github.com/SPARQL-Anything/experiments/tree/main/gtfs

Decision/Status:

Solicited Reviews:

Review #1

Anonymous submitted on 14/Feb/2025

Suggestion:
Minor Revision

Review Comment:

First of all, I would like to thank the authors for their responses and clarifications. However, they have only partially addressed my concerns, and I am still not fully convinced that the paper presents a research contribution rather than a resource.

- The authors justify excluding RML engines (or other KG construction engines such as SPARQL-Generate or Ontop) from the comparison by stating that the paper focuses on different configurations of Façade. However, their justification is primarily technological and lacks a formal definition. If the studied problem differs from other approaches, it would be beneficial to explicitly state those differences. From my perspective, the core problem remains the same as in other methods: converting data into RDF graphs while minimizing execution time and memory consumption. In fact, on page 5, section 1c, line 30, the authors state: "In other words, a façade function takes as input a data source and a query and returns a graph", which closely resembles how (materialized) OBDA is defined.

- The paper lacks explicitly stated research questions. Additionally, I wonder whether a stronger motivation could be provided from a performance perspective rather than a user-oriented one. The authors highlight user concerns in Section 2, but IMHO, the focus of this paper should not be on justifying the use of Façade, but rather on why optimizing execution time and memory consumption is relevant from its perspective.

- I previously requested a clear comparison with prior approaches tackling similar problems, such as Morph-CSV and MapSDI, from a theoretical perspective. Both of these works provide a formal description of their proposals, and I would like to understand the exact differences between them and the approach presented in this paper. To be clear, I am not concerned with the specific mapping language used (RML, R2RML, SPARQL-Anything/Generate, TARQL, etc.), but rather with the main distinctions in methodology and execution.

- The related work section focuses on the technologies used by different engines but does not analyze the underlying approaches and solutions they propose. Furthermore, the paper assumes that OBDA is inherently virtual, but this is not necessarily the case: OBDA can also be materialized. Indeed, Ontop supports materialized OBDA using query rewriting techniques. The W3C Direct Mapping does not impose any ontology, defines basic rules to convert an RDB (or any tabular source with few constraints) into RDF, and is comparable to this proposal.

If this paper aims to be considered a research paper, these points need to be clarified.

Review #2

Anonymous submitted on 24/Mar/2025

Suggestion:
Accept

Review Comment:

The authors have addressed all my comments from Revision 1 by adding two novel subsections (2.1 and 2.3), which provide a detailed motivation for their approach and its relation to other data integration methods. Additionally, Section 5 (Related Work) has been significantly expanded, offering a clearer positioning of the work within the broader field of Knowledge Graph Construction. Overall, the submission has improved considerably compared to the initial version and I believe it to be ready for publication.

Review #3

Anonymous submitted on 28/Sep/2025

Suggestion:
Accept

Review Comment:

(Review submitted by Raghava Mutharaju on behalf of Reviewer #3)

All the comments/questions of Reviewer #3 from the previous round have been adequately addressed/answered in the revised version of the manuscript. So, I recommend accepting the revised manuscript.

However, a few minor issues need to be looked into (listed below). I suggest running a grammar check to catch all the possible issues with the revised text.

1. The terms "triple-filtering" and "triple filtering" have been used. For consistency, use only one form in the manuscript.
2. Page 23, line 11, "future work include" => "future work includes".
3. The caption of Figure 4 refers to Figure 2 when instead it should be Figure 3.
4. Page 23, line 4, "explore possible" => "exploring possible".
5. Page 15, column 2, line 46, "... of at least one-third ...". Is it one-third of the queries?
6. Page 21, line 47, text goes beyond the column width.
