smart-KG: Partition-Based Linked Data Fragments for Querying Knowledge Graphs

Tracking #: 3571-4785

Amr Azzam
Axel Polleres
Javier D. Fernandez
Maribel Acosta

Responsible editor: 
Cogan Shimizu

Submission type: 
Full Paper
RDF and SPARQL provide a uniform way to publish and query billions of triples in open knowledge graphs (KGs) on the Web. Yet, provisioning of a fast, reliable, and responsive live querying solution for open KGs is still hardly possible through SPARQL endpoints alone: while such endpoints provide remarkable performance for single queries, they typically cannot cope with highly concurrent query workloads by multiple clients. To mitigate this, the Linked Data Fragments (LDF) framework sparked the design of different alternative low-cost interfaces such as Triple Pattern Fragments (TPF), which partially offload the query processing workload to the client side. On the downside, such interfaces still come at the expense of unnecessarily high network load due to the necessary transfer of intermediate results to the client, leading to query performance degradation compared with endpoints. To address this problem, in the present work, we investigate alternative interfaces, refining and extending the original TPF idea, which also aims at reducing server-resource consumption, by shipping query-relevant partitions of KGs from the server to the client. To this end, first, we align formal definitions and notations of the original LDF framework to uniformly present existing LDF implementations and such “partition-based” LDF approaches. These novel LDF interfaces retrieve, instead of the exact triples matching a particular query pattern, a subset of pre-materialized, compressed partitions of the original graph, containing all answers to a query pattern, to be further evaluated on the client side. As a concrete representative of partition-based LDF, we present smart-KG+, extending and refining our prior work [1] in several respects.
Our proposed approach is a step forward towards a better-balanced share of the query processing load between clients and servers by shipping graph partitions driven by the structure of RDF graphs to group entities described with the same sets of properties and classes, resulting in significant data transfer reduction. Our experiments demonstrate that smart-KG+ significantly outperforms existing Web SPARQL interfaces on both pre-existing benchmarks for highly concurrent query execution as well as an accustomed query workload inspired by query logs of existing SPARQL endpoints.

Solicited Reviews:
Review #1
By Vojtěch Svátek submitted on 17/Nov/2023
Review Comment:

I had already considered the previous version acceptable for the journal, and after briefly checking the updates, I have no reason to think otherwise. The paper seems to satisfy everything that is expected of a technical paper in SWJ, as far as I can judge (not being a deep expert in this topic).

Review #2
By Stasinos Konstantopoulos submitted on 04/Dec/2023
Review Comment:

The submission presents a method for SPARQL query processing that joins the trend for light-weight servers and explores a new point on the trade-off between overloading the servers and transporting too much data to the client. This is an interesting method and a reasonable path to explore in the TPF/SaGe line of research.

Regarding the potential impact of the work presented, the specific objection I expressed in the first review (i.e., sensitivity to parameterization) has been given appropriate care and the relevant section (Sect 4.2.2) has been improved with concrete details about the rationale for selecting parameters. In any case, my being reluctant about the practical applicability of the system as it currently stands should not stand in the way of publishing the ideas and experimental results, as these can be a useful stepping stone towards a more complete solution.

All my comments from previous reviewing rounds have been addressed.

Review #3
By Antrea Christou submitted on 19/Jan/2024
Review Comment:

Positive Aspects :

The issue of handling multiple queries at once in open knowledge graphs is well explained.
The paper recognizes that the issues with the current interfaces require in-depth investigation.
Clear direction is indicated in a well-organized framework throughout the paper.
Thorough background section that includes all necessary information.
The experimental evaluation stands out for its clear and precise wording, well-organized presentation, and an extensive readme file in the GitHub repository.

Challenges :

Sentence constructions are at times complex, which can occasionally make the text difficult for readers.

Comments with regard to the first review :

Comments of reviewer #1 were addressed and fixed accordingly.