Materialisation approaches for Façade-based data access with SPARQL

Tracking #: 3729-4943

Authors: 
Luigi Asprino
Enrico Daga
Justin Dowdy
Aldo Gangemi
Paul Mulholland

Responsible editor: 
Raghava Mutharaju

Submission type: 
Full Paper
Abstract: 
The Knowledge Graph concept is gaining momentum as an ideal approach to data integration. Therefore, it is of paramount importance to equip knowledge engineers with tools for accessing data from multiple, heterogeneous and distributed resources. The successful W3C standard SPARQL is the reference language for interacting with RDF knowledge graphs. For that reason, several approaches extend SPARQL for accessing data in non-RDF formats. Recent research proposes relying on an intermediate RDF model, named Façade-X, whose components can be transparently mapped to various file formats. However, although Façade-X specifies how its components map to many different formats (CSV, JSON, HTML, Markdown, and others), it is still unclear how to implement a SPARQL execution engine that relies on it. In other words, what are the possible strategies for executing Façade-X queries? This article explores materialisation approaches for executing Façade-X queries. Specifically, we study two in-memory strategies for performing façade-based data access with SPARQL. A complete materialised view strategy fully transforms the data source into RDF. Instead, a sliced materialised view strategy segments the data source and generates an RDF view on each part. Both strategies can be optimised by only materialising the part of the RDF graph that has potential matches with triple patterns in the query (triple-filtering). In addition, we compare these approaches with an on-disk alternative, which relies on a temporary database instance. We analyse the characteristics of these methods and perform extensive experiments, reporting on benefits and limitations of both approaches. Finally, we contribute guidelines and best practices derived from the findings.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 14/Feb/2025
Suggestion:
Minor Revision
Review Comment:

First of all, I would like to thank the authors for their responses and clarifications. However, they have only partially addressed my concerns, and I am still not fully convinced that the paper presents a research contribution rather than a resource.

- The authors justify excluding RML engines (or other KG construction engines such as SPARQL-Generate or Ontop) from the comparison by stating that the paper focuses on different configurations of Façade. However, their justification is primarily technological and lacks a formal definition. If the studied problem differs from the one addressed by other approaches, it would be beneficial to explicitly state those differences. From my perspective, the core problem remains the same as in other methods: converting data into RDF graphs while minimizing execution time and memory consumption. In fact, on page 5, section 1c, line 30, the authors state: "In other words, a façade function takes as input a data source and a query and returns a graph", which closely resembles how (materialized) OBDA is defined.

- The paper lacks explicitly stated research questions. Additionally, I wonder whether a stronger motivation could be provided from a performance perspective rather than a user-oriented one. The authors highlight user concerns in Section 2, but IMHO, the focus of this paper should not be on justifying the use of Façade, but rather on why optimizing execution time and memory consumption is relevant from its perspective.

- I previously requested a clear comparison with prior approaches tackling similar problems, such as Morph-CSV and MapSDI, from a theoretical perspective. Both of these works provide a formal description of their proposals, and I would like to understand the exact differences between them and the approach presented in this paper. To be clear, I am not concerned with the specific mapping language used (RML, R2RML, SPARQL-Anything/Generate, TARQL, etc.), but rather with the main distinctions in methodology and execution.

- The related work section focuses on the technologies used by different engines but does not analyze the underlying approaches and solutions they propose. Furthermore, the paper assumes that OBDA is inherently virtual, but this is not necessarily the case: OBDA can also be materialized. Indeed, Ontop supports materialized OBDA using query rewriting techniques. The W3C Direct Mapping does not impose any ontology, defines basic rules to convert an RDB (or any tabular source with few constraints) into RDF, and is comparable to this proposal.

If this paper aims to be considered a research paper, these points need to be clarified.

Review #2
Anonymous submitted on 24/Mar/2025
Suggestion:
Accept
Review Comment:

The authors have addressed all my comments from Revision 1 by adding two novel subsections (2.1 and 2.3), which provide a detailed motivation for their approach and its relation to other data integration methods. Additionally, Section 5 (Related Work) has been significantly expanded, offering a clearer positioning of the work within the broader field of Knowledge Graph Construction. Overall, the submission has improved considerably compared to the initial version and I believe it to be ready for publication.

Review #3
Anonymous submitted on 28/Sep/2025
Suggestion:
Accept
Review Comment:

(Review submitted by Raghava Mutharaju on behalf of Reviewer #3)

All the comments/questions of Reviewer #3 from the previous round have been adequately addressed/answered in the revised version of the manuscript. So, I recommend accepting the revised manuscript.

However, a few minor issues need to be looked into (listed below). I suggest running a grammar check to catch all the possible issues with the revised text.

1. The terms "triple-filtering" and "triple filtering" have been used. For consistency, use only one form in the manuscript.
2. Page 23, line 11, "future work include" => "future work includes".
3. The caption of Figure 4 refers to Figure 2 when instead it should be Figure 3.
4. Page 23, line 4, "explore possible" => "exploring possible".
5. Page 15, column 2, line 46, "... of at least one-third ...". Does this mean one-third of the queries?
6. Page 21, line 47, text goes beyond the column width.