SPARQL federated query debugging tool

Tracking #: 3849-5063

Authors: 
Marek Moos
Jakub Galgonek

Responsible editor: 
Katja Hose

Submission type: 
Tool/System Report
Abstract: 
Gaining insight into a complex problem often requires combining data from multiple datasets. For this reason, SPARQL query support within a federated environment is an important feature. However, several pitfalls have been encountered in practice, significantly complicating the use of SPARQL queries in such setups. These challenges include uninformative error responses, performance bottlenecks and unintended semantic changes introduced by SPARQL endpoints. To address these pitfalls, this paper introduces a newly implemented SPARQL query debugger, which is available as a web application at https://sparql-debugger.elixir-czech.cz. It has been developed for the purpose of monitoring, in real time, the execution of SPARQL queries that incorporate the service pattern. This monitoring is crucial for error detection and performance optimization. Detailed service execution data (such as SPARQL requests and responses, durations, etc.) can help identify the specific instance of a service responsible for a problem, even if it is deeply nested within the service execution tree. The tool is based on the principle of redirecting all requests to a debugging proxy server, so it can be used with all SPARQL-compliant endpoints without the need for their modification. The debugging tool presented in the paper enables the identification and resolution of issues that are otherwise difficult to address and has proven its effectiveness in practice.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Olaf Hartig submitted on 01/May/2025
Suggestion:
Accept
Review Comment:

I was already okay with the previous version of this manuscript, and the same holds for the revised version now.

The authors have sufficiently addressed each of the few minor issues pointed out in my previous review, and their responses to the reviews of the other reviewers are also reasonable.

Review #2
Anonymous submitted on 05/May/2025
Suggestion:
Accept
Review Comment:

This is a revised version of the paper, which introduces a proxy-based tool for debugging federated SPARQL queries. It intercepts and monitors query execution across multiple SPARQL endpoints. The tool provides real-time tracing, service execution trees, and a web interface. Users can use the tool to diagnose errors, analyze performance, and detect unintended query transformations without modifying endpoints.
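The redirection principle summarized above can be illustrated with a minimal sketch. It assumes that redirection amounts to rewriting each SERVICE IRI in the query so that it points at a proxy, which then forwards the request to the original endpoint. The proxy address, the `endpoint` query parameter, and the function name are hypothetical and are not taken from the tool. The regular expression is deliberately simple: it only handles SERVICE clauses with a full IRI, not variables or prefixed names, and it does not skip string literals or comments.

```python
import re
from urllib.parse import quote

# Hypothetical proxy address, used only for illustration.
PROXY_BASE = "https://debugger.example.org/service"

# Matches "SERVICE <iri>" and "SERVICE SILENT <iri>" (case-insensitive).
SERVICE_RE = re.compile(r"(\bSERVICE\s+(?:SILENT\s+)?)<([^>]+)>", re.IGNORECASE)


def redirect_services(query: str, proxy_base: str = PROXY_BASE) -> str:
    """Rewrite every SERVICE IRI so the request goes through the proxy.

    The original endpoint is kept as a percent-encoded query parameter,
    so that the proxy knows where to forward the request.
    """
    def repl(match: re.Match) -> str:
        keyword, endpoint = match.group(1), match.group(2)
        return f"{keyword}<{proxy_base}?endpoint={quote(endpoint, safe='')}>"

    return SERVICE_RE.sub(repl, query)


if __name__ == "__main__":
    original = "SELECT * WHERE { SERVICE <https://a.example/sparql> { ?s ?p ?o } }"
    print(redirect_services(original))
```

Because only the query text changes, a scheme of this kind needs no modification of the endpoints themselves, which is the property the paper emphasizes.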

As I noted in the original review, the paper correctly identifies major pain points in federated SPARQL query execution: lack of transparency in error reporting, performance issues caused by inefficient query execution strategies, etc. The proxy-based approach is a practical and non-intrusive solution, which allows for interoperability with various SPARQL endpoints. It also provides detailed service execution tracing. The web application offers an intuitive debugging interface that presents the service execution trees to the user. Additionally, the authors demonstrate the tool’s effectiveness with a case study, highlighting the ability of the tool to detect and resolve errors caused by endpoint-specific query transformations. All of these are correctly identified, well executed and well described in the manuscript.

In this revised version, the authors have addressed almost all of my concerns and have thoroughly explained the rest in their rebuttal. I think the paper should be accepted for publication in the journal.

Review #3
Anonymous submitted on 06/Jun/2025
Suggestion:
Minor Revision
Review Comment:

Thanks to the authors for addressing most of my concerns. Below, I return to the answers to my questions.

* Q1: Is it possible to use the tool as a real web proxy without the UI? If yes, how? Can you provide step-by-step documentation for doing that?

The debugger proxy server cannot be used as a standard SPARQL Protocol service. But we have developed a new REST API endpoint (/syncquery) that enables synchronous query execution. The response returns the complete execution tree in JSON format, containing the same information as the tree rendered in the frontend. An example of how to use this API is available in our benchmark repository https://github.com/iocbbioinf/sparql_debugger_benchmark. Step-by-step documentation to deploy the debugger server and REST API description can be found in the README file of the debugger server repository https://github.com/iocbbioinf/sparql_debugger_server. Please note that, for the setup to work, the debugger server must be accessible via a public URL.

** Well, but in this way, the proxy does not behave as a real transparent HTTP proxy (like NGINX) that could intercept and forward requests without requiring the frontend interface or a specific API client. I think this considerably limits the reuse of the proxy. Is it impossible to design it as a regular transparent proxy? In Figure 1, if you simply accept the normal service query, it seems to me that it is possible to transform your “wrapper proxy” into a real transparent HTTP proxy and greatly improve the reuse of your tool.

* Q2: How can post-processing be done with your tool? Can you write documentation explaining how everything is stored in the proxy and how it can be retrieved by programs?

Programs can interact with the debugger proxy server via the REST API described in the README of the server repository https://github.com/iocbbioinf/sparql_debugger_server. All query requests and responses are stored as temporary files, and the query execution tree is maintained in memory.

** Great, it is clearer now.
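To make the post-processing scenario concrete, the following minimal sketch shows how a program might walk an execution tree once it has been retrieved as JSON. The actual response schema is documented in the server README and is not reproduced here: the field names `endpoint`, `durationMs`, `httpStatus`, and `children`, as well as the sample tree, are assumptions made purely for illustration.

```python
from typing import Any, Dict, Iterator, List, Tuple

Node = Dict[str, Any]


def walk(node: Node, depth: int = 0) -> Iterator[Tuple[int, Node]]:
    """Yield (depth, node) pairs in depth-first, pre-order traversal."""
    yield depth, node
    for child in node.get("children", []):
        yield from walk(child, depth + 1)


def failed_calls(root: Node) -> List[Node]:
    """Return all service calls whose (assumed) HTTP status signals an error."""
    return [n for _, n in walk(root) if n.get("httpStatus", 200) >= 400]


def slowest_call(root: Node) -> Node:
    """Return the node with the largest (assumed) duration in milliseconds."""
    return max((n for _, n in walk(root)), key=lambda n: n.get("durationMs", 0))


# Hypothetical tree: a root query with one nested, failing service call.
SAMPLE_TREE: Node = {
    "endpoint": "https://a.example/sparql", "durationMs": 900, "httpStatus": 200,
    "children": [
        {"endpoint": "https://b.example/sparql", "durationMs": 700, "httpStatus": 200,
         "children": [
             {"endpoint": "https://c.example/sparql", "durationMs": 650,
              "httpStatus": 500, "children": []},
         ]},
        {"endpoint": "https://d.example/sparql", "durationMs": 40,
         "httpStatus": 200, "children": []},
    ],
}

if __name__ == "__main__":
    for depth, node in walk(SAMPLE_TREE):
        print("  " * depth + f"{node['endpoint']} ({node['durationMs']} ms)")
    print("failed:", [n["endpoint"] for n in failed_calls(SAMPLE_TREE)])
```

A traversal of this kind is enough to locate a failing or slow service call even when it is deeply nested, which is the use case the abstract describes for the graphical tree.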

* Q3: Can you explain how the proxy is built, so that another developer can maintain it?

As mentioned earlier, the proxy server is implemented using the Spring Boot framework in a standard way. Using the Gradle task bootRun, an executable JAR file is created and launched on the local machine (see the README for details). Users are free to fork the repository and develop their own customized versions.

** OK.

* Q4: Can you tell us who the current users of the tool are and whether you have a sustainability plan?

Until now, we have not collected usage statistics, but we plan to implement this in the future. We also intend to maintain the application over the long term, as it is a service supported within the scope of the Czech National Infrastructure for Biological Data (ELIXIR CZ).

** Don’t you think that supporting a standard SPARQL endpoint interface—at least as a mode of operation—could greatly enhance adoption beyond your UI and enable integration with a broader ecosystem of Semantic Web tools?