Introduction
The Semantic Web is becoming increasingly pivotal across various fields that require managing complex and diverse datasets interoperably. One notable example is bioinformatics, where researchers focus heavily on the Findable, Accessible, Interoperable, Reusable (FAIR) data principles (Wilkinson et al., 2016) and on seamless data integration, both of which are inherently supported by Semantic Web technologies. In practice, this entails publishing data in RDF, annotating it with ontologies, and querying it using SPARQL (Bansal et al., 2022; Galgonek & Vondrasek, 2021; Pinero et al., 2020; Rutz et al., 2022; SIB RDF Group Members, 2023; UniProt Consortium, 2021; Zahn-Zabal et al., 2020). Interoperability matters in bioinformatics because gaining insights into a biological problem often requires combining data from multiple research domains. Because biological datasets are typically produced by independent teams highly specialized in different fields, it is essential to be able to write queries that span multiple datasets.
There are several approaches to spanning multiple RDF datasets. One option is to load the selected datasets, or relevant portions of them, into a single triplestore, making it possible, for example, to transform and augment the original data. Another approach is to transparently split a query into subqueries and dynamically identify the appropriate endpoints for each subquery, as implemented by tools such as FedX (Schwarte et al., 2011) or Comunica (Query Federation with Comunica, 2024). Alternatively, subqueries can be explicitly directed to specific endpoints using the standard SPARQL Federated Query extension (SPARQL 1.1 Federated Query, 2013). The advantages of this third approach are that it does not require any additional software and that it gives query developers a complete overview of how the query is federated. The work presented here focuses on this last approach, and the term federated query refers exclusively to the use of the service pattern within a SPARQL query.
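As a minimal illustration of explicit federation, the following sketch holds a query containing one service pattern and lists the endpoints it names. The query, the endpoint URL, and the helper `service_endpoints` are hypothetical and serve only as an example; the regular expression is deliberately naive and ignores comments, string literals, and services addressed through variables.

```python
import re

# Hypothetical federated query: the outer pattern is evaluated at the endpoint
# that receives the query, the SERVICE block is delegated to a second endpoint.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?protein ?label WHERE {
  ?protein a ex:Protein .
  SERVICE <https://sparql.example.org/labels> {
    ?protein ex:label ?label .
  }
}
"""

# Matches "SERVICE <iri>" and "SERVICE SILENT <iri>" (keywords are case-insensitive).
SERVICE_RE = re.compile(r"\bSERVICE\s+(?:SILENT\s+)?<([^>]+)>", re.IGNORECASE)


def service_endpoints(query):
    """Return the endpoint IRIs named in service patterns, in textual order."""
    return SERVICE_RE.findall(query)


print(service_endpoints(QUERY))
```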
In the SPARQL Federated Query extension, target endpoints are explicitly denoted by service patterns. Note that these patterns can be nested, resulting in a federated query that can be represented as a
In practice, several pitfalls have been encountered that significantly complicate the use of federated SPARQL queries. First of all, in the event of an error, error messages from nested services are often not propagated to the top level and are effectively swallowed, resulting in uninformative query error responses. Furthermore, when a query takes an unusually long time to execute, it is not clear what the cause is, leaving users without an understanding of the problem. Last but not least, some SPARQL endpoints may silently modify subqueries they delegate to other service endpoints, so these endpoints might then interpret the modified subqueries differently, potentially leading to unexpected results.
To overcome these pitfalls, the debugging tool presented here has been developed to monitor the entire service execution tree of a federated query. The ability to monitor federated queries is crucial for both error detection and performance optimization. Detailed execution data can help identify the specific service pattern responsible for an error, even if it is nested deep within the service execution tree. Moreover, tracing can reveal service patterns that suffer from high latency or are executed too many times. This is often related to execution strategies employed by SPARQL endpoints. For instance, using the
Although general monitoring platforms, such as Datadog (Datadog, 2024), offer alternatives to our debugging tool, they require configuration and deployment on service endpoints, which may not always be feasible. By contrast, tools such as Virtuoso (Virtuoso Universal Server, 2024) and Jena (Apache Jena, 2024) offer detailed information on executions of directly nested services, but they cannot debug the execution within those services. Therefore, despite the importance of this functionality, the tool presented here is, to the best of our knowledge, the first to provide comprehensive tracing across all levels of a service execution tree, regardless of which SPARQL engine is used at each endpoint.
Implementation
The presented tool is provided as a web application intended for SPARQL query developers, designed to encapsulate federated query debugging within an intuitive interface. The application features a custom YASGUI (Rietveld & Hoekstra, 2013) component for query editing, allowing users to work on multiple queries simultaneously using integrated tabs. The debugging process is initiated by pressing the
Concept
It is generally assumed that SPARQL developers cannot feasibly modify service endpoints (e.g. configure them or install additional extensions) and that their interaction with endpoints is limited to querying via the SPARQL protocol. The debugging tool presented here is therefore implemented with a proxy server at its core. This server intercepts the execution of each service and wraps it with detailed trace information such as the request data, response data, and status. When the debugging process is initiated, a query is sent to the proxy server, and this execution acts as the root of the entire service execution tree.
An example of query debugging is illustrated in Figure 1. In this example, a query, denoted

Debugging proxy server.
Because all service endpoints are treated as black boxes, interaction with them is only possible through SPARQL requests and responses. To properly trace services, the original service endpoint URLs in the SPARQL query request are substituted with the proxy server’s URL. Additionally, essential information must be encoded into these URLs to ensure that, when services are intercepted by the proxy server, they can be properly traced and executed. The URL of the original service endpoint has to be encoded, allowing the proxy server to know which actual service endpoint to call. Likewise, the ID of its parent in the service execution tree needs to be encoded to identify where the new service execution node should be added in the tree. Lastly, the encoded query ID retains information about the query scope.
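The encoding described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual URL scheme: the proxy address `PROXY_BASE`, the parameter names `endpoint`, `queryId`, and `parentId`, and both helper functions are hypothetical. The sketch only shows that the three pieces of information survive a round trip through a URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical address of the debugging proxy server.
PROXY_BASE = "https://debugger.example.org/service"


def make_proxy_url(endpoint, query_id, parent_id):
    """Build a proxy URL carrying the original endpoint, query ID and parent node ID."""
    params = {"endpoint": endpoint, "queryId": query_id, "parentId": parent_id}
    # urlencode percent-encodes the endpoint URL so it can be embedded safely.
    return PROXY_BASE + "?" + urlencode(params)


def parse_proxy_url(url):
    """Recover (endpoint, query_id, parent_id) from a proxy URL."""
    params = parse_qs(urlsplit(url).query)
    return (
        params["endpoint"][0],
        int(params["queryId"][0]),
        int(params["parentId"][0]),
    )


url = make_proxy_url("https://sparql.example.org/sparql", query_id=7, parent_id=3)
print(url)
print(parse_proxy_url(url))
```

When the proxy server receives a request at such a URL, it would decode the parameters, attach a new node under the given parent, and forward the subquery to the decoded endpoint.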
To demonstrate the debugging process on a practical example, a federated SPARQL query (Figure 2) from the BioSODA website (Exploring Biological Data Using SPARQL, 2024), which retrieves

Example federated SPARQL query.
The corresponding query service pattern tree and service execution tree are shown in Figures 3 and 4.

Example service pattern tree.

Example service execution tree.
The query is executed at the Oma-browser endpoint. It is processed from top to bottom using the
Consider a scenario where the second service pattern subquery contains another nested service pattern. In such a case, it is impossible to substitute all service endpoints in the query with the proxy server endpoint at once when the query is submitted to the proxy server. Each proxy URL must encode the identifier of the parent in the service execution tree. However, for a nested service pattern, this identifier is only determined after a specific instance of bulk service execution has started at the proxy server. As a result, only the service endpoint URLs at the first level of nesting in the query are initially replaced with proxy URLs. The remaining service endpoint URLs are gradually replaced as the nested service endpoints are invoked through the debugging proxy server, elevating them to the first level of nesting.
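The stepwise substitution can be sketched as a rewrite that touches only service patterns not enclosed in another service pattern. The function `rewrite_first_level` and the `to_proxy` callback are hypothetical names; the sketch tracks brace depth with a simple tokenizer and, as a stated simplification, does not account for braces inside string literals, IRIs, or comments.

```python
import re
from urllib.parse import quote

# Tokens of interest: a SERVICE clause with an IRI, or a single brace.
TOKEN_RE = re.compile(r"\b(SERVICE\s+(?:SILENT\s+)?)<([^>]+)>|[{}]", re.IGNORECASE)


def rewrite_first_level(query, to_proxy):
    """Replace endpoint IRIs of service patterns at the first level of nesting only."""
    depth = 0            # current brace depth
    open_services = []   # brace depths at which enclosing service blocks opened
    pending = False      # True between a SERVICE clause and its opening brace

    def visit(match):
        nonlocal depth, pending
        token = match.group(0)
        if token == "{":
            depth += 1
            if pending:
                open_services.append(depth)
                pending = False
            return token
        if token == "}":
            if open_services and open_services[-1] == depth:
                open_services.pop()
            depth -= 1
            return token
        # A SERVICE clause: rewrite it only if no service block encloses it.
        pending = True
        if open_services:
            return token
        return match.group(1) + "<" + to_proxy(match.group(2)) + ">"

    return TOKEN_RE.sub(visit, query)


QUERY = """SELECT * WHERE {
  SERVICE <http://a.example/sparql> {
    ?s ?p ?o .
    SERVICE <http://b.example/sparql> { ?o ?q ?r }
  }
  SERVICE <http://c.example/sparql> { ?s ?t ?u }
}"""

# Hypothetical proxy URL scheme, used here only to make the rewrite visible.
rewritten = rewrite_first_level(
    QUERY, lambda iri: "https://debugger.example.org/service?endpoint=" + quote(iri, safe="")
)
print(rewritten)
```

In this example the endpoints `a.example` and `c.example` are redirected to the proxy, while the nested `b.example` pattern is left untouched until the proxy later receives the enclosing subquery and applies the same rewrite to it.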
Each query can contain more than one directly nested service. In the service execution tree, these executions share the same parent, but it is necessary to distinguish which nested service they belong to in order to create the bulk execution node. This is achieved by encoding a sequential number for each nested service, designated as the
Following the sample BioSODA query, Figure 5 presents an example of a request sent to the root query endpoint by the proxy server, while Figure 6 presents a request sent to the second service. Both requests are generated by the proxy server, with endpoints being enumerated by it.

Request to a root query SPARQL endpoint generated by the debugging proxy server.

Request to a SPARQL endpoint generated by the debugging proxy server.
Consider that during bulk execution, the service execution tree may differ from the service pattern tree. This discrepancy can also occur when service endpoints apply optimizations, such as grouping triple patterns evaluated by the same endpoint, as, for instance, with the
Queries can be traced by the debugging proxy server in parallel. Each execution of a service corresponds to an HTTP request handled by the proxy server. Service execution trees can become large and deeply nested. Moreover, some SPARQL engines are capable of executing multiple service calls concurrently within the same query execution. Taken together, this creates significant performance pressure on the debugging proxy server, especially when handling parallel executions at scale.
Note that when a service is executed at an endpoint, it is initiated by a corresponding proxy service execution, which waits for the result. Additionally, all preceding nodes in the service execution tree must also wait for their corresponding service executions to complete. As a result, numerous parallel threads are required, many of which may be blocked while awaiting responses. To address this, Java Virtual Threads are utilized, allowing each proxy server request to be handled by its own virtual thread. The benefit of this approach is that virtual threads are lightweight, enabling the system to efficiently manage thousands of them. If virtual threads get blocked, they do not block system threads, ensuring that the system remains both responsive and scalable.
A feature allowing users to cancel queries, which terminates all virtual threads associated with a specific query execution, is also available. Additionally, this feature instructs the proxy server to reject any new proxy calls related to the query being cancelled, even if they are initiated by the original endpoints during service execution after the cancellation has begun.
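The rejection of late proxy calls can be sketched with a small registry of cancelled query identifiers. The class `CancellationRegistry` and its methods are hypothetical, and the sketch covers only the admission check; terminating the threads that are already running is not shown.

```python
import threading


class CancellationRegistry:
    """Tracks cancelled query IDs so that late proxy calls can be refused."""

    def __init__(self):
        self._lock = threading.Lock()
        self._cancelled = set()

    def cancel(self, query_id):
        """Mark a query as cancelled; subsequent calls for it are rejected."""
        with self._lock:
            self._cancelled.add(query_id)

    def admit(self, query_id):
        """Return True if a new proxy call for this query may proceed."""
        with self._lock:
            return query_id not in self._cancelled


registry = CancellationRegistry()
registry.cancel("q-42")
print(registry.admit("q-42"))  # False: the query was cancelled
print(registry.admit("q-43"))  # True: an unrelated query is unaffected
```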
Frontend
To visualize query execution tracing, an npm package that renders the service execution tree is provided. This package is implemented as an independent React component. The component’s API consists of callbacks receiving a SPARQL query and the endpoint where it should be executed. These parameters are then sent to the proxy server, which initiates the query execution and starts notifying the component about updates to the service execution tree.
After query debugging is started, the service execution tree is rendered in real time. Instead of having the browser poll the server for updates, the proxy server actively pushes tree node changes to the browser, which re-renders only the affected parts of the tree dynamically. This real-time update is achieved using the Server-Sent Events (SSE) protocol (Server-Sent Events, 2024). Compared to the WebSocket protocol (The WebSocket Protocol, 2024), SSE offers a simpler and more efficient solution for this use case, as it requires only one-way communication from the proxy server to the browser, following an initial handshake.
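For illustration, the sketch below formats one tree update as a Server-Sent Events message. The wire format (`id:`, `event:`, and `data:` fields terminated by a blank line) follows the SSE specification, whereas the event name `node-updated` and the payload fields are assumptions made for this example and are not taken from the tool.

```python
import json


def sse_event(event, payload, event_id=None):
    """Serialize one update as a Server-Sent Events message."""
    lines = []
    if event_id is not None:
        lines.append("id: %d" % event_id)
    lines.append("event: " + event)
    # Each line of the payload needs its own "data:" field.
    for line in json.dumps(payload).splitlines():
        lines.append("data: " + line)
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


message = sse_event("node-updated", {"nodeId": 4, "status": "SUCCESS"}, event_id=12)
print(message, end="")
```

On the browser side, such a stream would typically be consumed with the standard `EventSource` interface, which dispatches each message to a listener registered for its event name.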
Note that it is impossible to determine when a bulk execution node has fully completed until the parent execution is finished. Therefore, the bulk execution time is only partially calculated during execution. Additionally, it is assumed that executions within the bulk can occur in parallel. As a result, the execution time is calculated as the time interval between the start of the first endpoint call and the completion of the last call within the bulk. Consequently, the displayed execution time continues to increase until it is definitively set when the parent execution node is completed.
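The calculation can be sketched as follows. The `Call` record and the function `bulk_duration` are hypothetical names, and the use of the current time as the provisional upper bound while the parent is still running is an assumption about how the increasing value is obtained.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Call:
    """One endpoint call within a bulk; `end` is None while the call is running."""
    start: float
    end: Optional[float] = None


def bulk_duration(calls, now, parent_finished):
    """Interval from the first call's start to the last call's completion.

    While the parent is still running, further calls may join the bulk, so the
    current time is used as a provisional upper bound (an assumption).
    """
    if not calls:
        return 0.0
    first_start = min(call.start for call in calls)
    ends = [call.end for call in calls if call.end is not None]
    if parent_finished and ends:
        return max(ends) - first_start
    return now - first_start


calls = [Call(start=10.0, end=12.0), Call(start=10.5, end=15.0)]
print(bulk_duration(calls, now=20.0, parent_finished=False))  # 10.0, still growing
print(bulk_duration(calls, now=20.0, parent_finished=True))   # 5.0, final value
```

Because the calls may overlap, the interval is measured from the earliest start to the latest completion and is not the sum of the individual call durations.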
Part of the visualized service execution tree, including bulk execution nodes, is shown in Figure 7.

Visualized service execution tree.
The presented approach has been designed so that, during debugging, each query is evaluated in the same way as during direct execution. Nevertheless, the service execution trees may differ. For example, if a service endpoint is declared with the same URL as its parent, the service may bypass a nested service call, executing locally and potentially using internal optimizations. This can lead to discrepancies in the execution structure. In some cases, even the results can differ. Certain SPARQL endpoints, such as Wikidata, enforce a whitelist of allowed service URLs. If the proxy server is not included in this whitelist, service calls made during debugging may result in errors, while the same service call could succeed in a direct query execution without the proxy. This discrepancy occurs because the proxy is treated as an unauthorized endpoint during the debugging process.
Case Study from Practice
The usefulness of the tool in practice is demonstrated on the example of a query that returns no results, even though it is known that some solutions matching the query exist. The query (Figure 8) should return

Retrieval of a list of UniProtKB/Swiss-Prot human proteins that catalyse Rhea reactions involving cholesterol-like compounds.
By tracing the query using the SPARQL debugger, it can be observed that the UniProt endpoint alters the original object term
In our experience, the UniProt endpoint first tries to use the nested loop join strategy and executes the respective service for different substituted values. However, this approach takes a long time and does not lead to the desired outcome. In subsequent attempts, the endpoint typically chooses to call the service directly without any substitution, which allows it to obtain the result in a short time.
These observations, made possible by the SPARQL debugger, demonstrate its practical utility.
The presented software provides SPARQL developers with a debugging tool designed to offer detailed insights into the execution of complex federated queries. It has already proven effective in practice and has helped identify and resolve several errors and performance issues that were previously difficult to address. The proxy server exposes a REST API that enables integration with other applications independently of the debugger frontend.
The source code for both the proxy server and the frontend is available on GitHub. Additionally, a deployment of the SPARQL federated query debugger is ready for use online.
