Creating RESTful APIs over SPARQL endpoints using RAMOSE

Tracking #: 2624-3838

Marilena Daquino
Ivan Heibi
Silvio Peroni
David Shotton

Responsible editor: 
Armin Haller

Submission type: 
Tool/System Report
Semantic Web technologies are widely used for storing RDF data and making them available on the Web through SPARQL endpoints, queryable using the SPARQL query language. While the use of SPARQL endpoints is strongly supported by Semantic Web experts, it hinders broader use of RDF data by common Web users, engineers and developers unfamiliar with Semantic Web technologies, who normally rely on Web RESTful APIs for querying Web-available data and creating applications over them. To solve this problem, we have developed RAMOSE, a generic tool written in Python that creates REST APIs over SPARQL endpoints from textual configuration files. These files enable the querying of SPARQL endpoints via simple Web RESTful API calls that return either JSON- or CSV-formatted data, thus hiding all the intrinsic complexities of SPARQL and RDF from common Web users. We provide evidence that using RAMOSE to provide REST API access to RDF data within OpenCitations triplestores is beneficial, as measured by the number of queries made by external users to such RDF data through the RAMOSE API compared with direct access via the SPARQL endpoint. Our findings demonstrate the importance, for suppliers of RDF data, of having an alternative API access service, which enables its use by those with little or no experience of Semantic Web technologies and the SPARQL query language. RAMOSE can be used both to query any SPARQL endpoint and to query any other Web API, and thus it represents an easy, generic technical solution for service providers who wish to create an API service to access Linked Data stored as RDF in a conventional triplestore.
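
For readers unfamiliar with the output side of such a tool, the conversion the abstract describes (SPARQL results served back as plain JSON or CSV) can be sketched in Python as below. This is not RAMOSE's code: it assumes the endpoint answers in the W3C SPARQL 1.1 Query Results JSON format, and the sample bindings and the helper names `flatten_bindings` and `to_csv` are hypothetical.

```python
from __future__ import annotations

import csv
import io
import json

# Made-up SPARQL 1.1 Query Results JSON document, as an endpoint might return it.
SPARQL_JSON = """
{
  "head": {"vars": ["citing", "cited", "creation"]},
  "results": {"bindings": [
    {"citing": {"type": "literal", "value": "10.1/a"},
     "cited": {"type": "literal", "value": "10.1/b"},
     "creation": {"type": "literal", "value": "2019-05-01"}},
    {"citing": {"type": "literal", "value": "10.1/c"},
     "cited": {"type": "literal", "value": "10.1/b"}}
  ]}
}
"""

def flatten_bindings(sparql_json: str) -> tuple[list[str], list[dict[str, str]]]:
    """Turn SPARQL JSON results into (header, rows) holding plain string values.

    Unbound variables become empty strings so that every row has the same keys.
    """
    data = json.loads(sparql_json)
    header = data["head"]["vars"]
    rows = [
        {var: binding.get(var, {}).get("value", "") for var in header}
        for binding in data["results"]["bindings"]
    ]
    return header, rows

def to_csv(header: list[str], rows: list[dict[str, str]]) -> str:
    """Serialise the flattened rows as CSV, with the variable names as header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

header, rows = flatten_bindings(SPARQL_JSON)
print(json.dumps(rows, indent=2))
print(to_csv(header, rows))
```

Filling unbound variables with empty strings is one possible design choice; it keeps every CSV row aligned with the header.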

Minor Revision

Solicited Reviews:
Review #1
By Victor Charpenay submitted on 30/Nov/2020
Minor Revision
Review Comment:


Review #2
By Jonathan Yu submitted on 10/Dec/2020
Minor Revision
Review Comment:

The paper presents a new software package called RAMOSE that enables the development of REST APIs as a façade over underlying SPARQL endpoints using configuration files. The research questions in scope are, first, what a generic mechanism could be for enabling Web developers and scholars to query RDF data available in triplestores exposed via SPARQL without having to write SPARQL (primarily via REST APIs), and second, how Semantic Web data providers can deploy REST APIs that expose their RDF data efficiently and easily. Both questions are addressed by the RAMOSE tool: the authors have presented a novel software application that enables the creation of REST APIs over SPARQL endpoints as a façade, following a configuration-over-code approach. A deployed instance of RAMOSE allows Web and application developers to query data, filter it and view relevant documentation in a useful way without having to understand or issue SPARQL queries. Tools like RAMOSE are important at a time when a growing number of RDF datasets are being published, yet barriers to entry remain for Web developers wishing to use these datasets.

The addition of a table comparing RAMOSE with other similar tools is valuable: it shows readers the similarities and differences between them and the design choices made in RAMOSE.

There are some minor revisions required for the revised manuscript:
- Section 5, para 6: there seems to be a new line in the middle of a sentence that needs fixing
- Section 5, para 8: "clearto" -> "clear to"
- The authors mention that they have added a reference to pyldapi in the revised manuscript; however, it appears to be missing from the uploaded version.

According to SWJ's impact criteria, the question of RAMOSE's impact remains unresolved in this revision. The authors have argued the case for the "potential impact" of one deployment of RAMOSE, i.e. OpenCitations, and for the impact of client applications being written to query OpenCitations via RAMOSE endpoints. However, in the context of this paper, it is the impact of the tool itself (i.e. RAMOSE) that is being examined, and nothing has been added to the manuscript on this point, for example impact demonstrated by additional deployments or by uptake in other projects. In terms of outcomes resulting from deployment, there have been a few deployments, though not many; the examples provided are nonetheless potentially impactful ones (e.g. a Zotero client for OpenCitations). There is no question that tools like RAMOSE make an important contribution to the Semantic Web community; however, on the impact criterion, the content of this paper is not yet sufficient to fully satisfy it.

Overall, the paper is well written, and the minor revisions requested previously have mostly been addressed. I would suggest that the authors consider addressing the issue of impact along with the minor revisions suggested above.

Review #3
By Pierre-Antoine Champin submitted on 16/Dec/2020
Review Comment:

Compared to the previous version, the authors have adequately addressed my remarks.

A few typos are left:
* p2 "easily and quickly to provide" → "to easily and quickly provide"
* p3 "Pre-processing and post-processing steps [...] must be specified in any operation" → this is misleading. I think you mean "It must be possible to specify pre-processing and post-processing steps [...] in any operation"
* table 1: what is the point of specifying a method for the whole API??
* sec 3.3.2, last sentence is grammatically strange ("supposing that"??). Plus, the example would be more compelling with a number, because 2<10 while "2">"10". Datetimes in this format have the same order when considered as strings.
* table 2: what do you mean by "to *catch* the value of the parameter"? Don't you mean "match" instead?
* p12: "Despite it is not" → "Despite the fact that it is not"
* p14: "meaningfulfeatures" → "meaningful features"
* p14: "clearto" → "clear to"
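
The remark on sec 3.3.2 turns on the difference between string and numeric ordering. The short Python illustration below (independent of RAMOSE's code; `greater_than` is a hypothetical helper) shows why a numeric example is more compelling, and why fixed-width ISO 8601 datetimes do not expose the problem.

```python
from datetime import datetime

# Numbers compare numerically; their string forms compare character by character.
print(2 < 10)      # True
print("2" < "10")  # False: "2" comes after "1", so "2" > "10"

def greater_than(value: str, threshold: str, cast=str) -> bool:
    """Hypothetical filter helper: cast both operands before comparing them."""
    return cast(value) > cast(threshold)

print(greater_than("10", "2"))       # False (string comparison)
print(greater_than("10", "2", int))  # True (numeric comparison)

# Fixed-width ISO 8601 datetimes sort the same way as strings and as datetimes,
# which is why a datetime example does not reveal the issue.
stamps = ["2020-12-16T09:30:00", "2019-05-01T23:59:59", "2020-01-07T00:00:00"]
print(sorted(stamps) == sorted(stamps, key=datetime.fromisoformat))  # True
```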

A few comments:
* wouldn't it be more robust to *generate* the 'output_json' example (using the example input in 'call'), rather than require the API author to provide it?
* on generating your own documentation rather than using a de facto standard tool such as Swagger, I'm still not convinced. And arguing that you target "non software programmers" is dubious: API users are software programmers.
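
The first comment above suggests deriving the 'output_json' example from the example in 'call'. A minimal sketch of that idea follows; `execute` stands for whatever component actually serves a request, and both `generate_output_json` and the stub `fake_execute` are hypothetical, not part of RAMOSE.

```python
from __future__ import annotations

import json
from typing import Callable

def generate_output_json(
    call: str, execute: Callable[[str], list[dict[str, str]]], max_rows: int = 2
) -> str:
    """Build the documentation example by running the example call itself.

    Truncating to `max_rows` keeps the generated documentation short.
    """
    rows = execute(call)[:max_rows]
    return json.dumps(rows, indent=4)

def fake_execute(call: str) -> list[dict[str, str]]:
    # Stub for illustration only: a real deployment would query the SPARQL endpoint.
    doi = call.split("/citations/", 1)[1]
    return [{"cited": doi, "citing": f"10.1/{n}"} for n in ("x", "y", "z")]

print(generate_output_json("/citations/10.1/b", fake_execute))
```

Because the example is produced from a real call, it cannot drift out of sync with the query, although it would change whenever the underlying data change.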

Review #4
By Sergio Rodriguez Mendez submitted on 17/Dec/2020
Minor Revision
Review Comment:

This manuscript was submitted as 'Tools and Systems Report' and should be reviewed along the following dimensions: (1) Quality, importance, and impact of the described tool or system (convincing evidence must be provided). (2) Clarity, illustration, and readability of the describing paper, which shall convey to the reader both the capabilities and the limitations of the tool.

* Summary: the article describes RAMOSE (the "RESTful API Manager Over SPARQL Endpoints"), an open-source, generic Python tool that allows users to create Web RESTful APIs over any SPARQL endpoint by editing a configuration file. It also automatically generates HTML-based documentation and a Web server for testing and monitoring purposes.

* Overall Evaluation (ranging from 0-100):
[Criterion 1]
+ Quality: 95
+ Importance/Relevance: 85
+ Impact: 85
[Criterion 2]
+ Clarity, illustration, and readability: 90
[Criterion 3]
+ Stability: 100
+ Usefulness: 90
+ Impression score: 80 | some design improvements can be made

* Comments:
- The tool could have a better code design (there's room to improve): modularity, parametrization. # future work?
- The structure of the HTML and CSS response chunks for the API documentation and monitoring dashboard is defined inside the Python script. The design could be improved to let developers customize the structure of the HTML and CSS responses.
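
One way to address this comment would be to keep the HTML (and CSS) in a template that deployers can override. The sketch below uses Python's standard `string.Template`; the placeholder names and `render_docs` are hypothetical and do not describe how RAMOSE currently builds its pages.

```python
from string import Template

# Default layout; a deployer could instead load their own template from a file,
# e.g. Template(Path("docs.html").read_text()), without editing the Python code.
DEFAULT_TEMPLATE = Template(
    "<html><head><title>$title</title><style>$css</style></head>"
    "<body>$body</body></html>"
)

def render_docs(title: str, body: str, css: str = "", template: Template = DEFAULT_TEMPLATE) -> str:
    # safe_substitute leaves unknown $placeholders in place instead of raising KeyError.
    return template.safe_substitute(title=title, body=body, css=css)

custom = Template("<h1>$title</h1>$body")
print(render_docs("Citations API", "<p>Operations</p>"))
print(render_docs("Citations API", "<p>Operations</p>", template=custom))
```

Note that a literal `$` inside such a template must be written as `$$`.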

* Major:
pag. 05: The last paragraph of 3.1 essentially states that sorting results is a more "expensive and time-consuming" operation when performed via SPARQL (by the triplestore engine) than when performed by the RAMOSE middleware (a Python script). Is this correct? Do the authors have evidence for this claim? If so, a citation is needed!
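
The evidence requested here could start from a simple measurement of the middleware side. The sketch below times an in-memory Python sort over synthetic rows, casting the sort key so that numbers are ordered numerically; `sort_rows` and the data are hypothetical. A fair comparison would also time the equivalent SPARQL query with `ORDER BY` on the same triplestore, which depends on the deployment and is not shown.

```python
from __future__ import annotations

import random
import time

def sort_rows(
    rows: list[dict[str, str]], field: str, cast=str, descending: bool = False
) -> list[dict[str, str]]:
    """Sort result rows after retrieval, casting the field (e.g. to int) before comparing."""
    return sorted(rows, key=lambda row: cast(row[field]), reverse=descending)

# Synthetic rows standing in for a JSON result set (illustration only).
random.seed(0)
rows = [{"doi": f"10.1/{i}", "citations": str(random.randint(0, 500))} for i in range(100_000)]

start = time.perf_counter()
ranked = sort_rows(rows, "citations", cast=int, descending=True)
elapsed = time.perf_counter() - start
print(f"sorted {len(ranked)} rows in {elapsed:.3f} s")
```

Timings from such a script vary with hardware and data size, so they would need to be reported alongside the corresponding endpoint measurements.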

Also, from the cover-letter of the second submission, the authors mention the following:
"Later in the same section, we explained why we prefer having refinement operations on results rather than inject those directly in the SPARQL query, namely: (1) to allow users to perform such operations on top of a predefined SPARQL query that cannot be modified, and (2) to speed up the retrieval process by means of efficient SPARQL queries and secondly perform basic operations faster on the retrieved JSON data."

(a) Even though the SPARQL query is predefined, the tool allows basic variable/value replacements via "[[...]]". Couldn't the same mechanism be used to manage the SPARQL "ORDER BY" clause?
(b) The retrieval process might be faster, but how fast are the basic operations performed on the retrieved data? Have the authors measured their speed to support these claims?
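
Point (a) can be made concrete: if "[[...]]" placeholders can carry a value taken from the URL, they could also carry a sort direction, provided that value is restricted to a whitelist so that arbitrary SPARQL cannot be injected. In the sketch below, the query template, the `ex:` vocabulary, the DOI pattern and the helper names are all hypothetical; RAMOSE's own substitution code may work differently.

```python
from __future__ import annotations

import re

# Hypothetical SPARQL template with [[name]] placeholders.
QUERY_TEMPLATE = """PREFIX ex: <http://example.org/>
SELECT ?citing WHERE {
  ?citing ex:cites ?cited .
  ?cited ex:doi "[[doi]]" .
}
ORDER BY [[direction]](?citing)"""

DOI_PATTERN = re.compile(r"10\..+")  # hypothetical regex matching the URL parameter
SORT_DIRECTIONS = {"asc": "ASC", "desc": "DESC"}  # whitelist: only these may reach the query

def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every [[name]] with values[name]; a missing name raises KeyError."""
    return re.sub(r"\[\[(\w+)\]\]", lambda m: values[m.group(1)], template)

def build_query(doi: str, direction: str = "asc") -> str:
    if not DOI_PATTERN.fullmatch(doi):
        raise ValueError(f"not a DOI: {doi!r}")
    if direction not in SORT_DIRECTIONS:
        raise ValueError(f"unsupported sort direction: {direction!r}")
    # Escape backslashes and quotes so the DOI stays inside its SPARQL string literal.
    safe_doi = doi.replace("\\", "\\\\").replace('"', '\\"')
    return fill_template(QUERY_TEMPLATE, {"doi": safe_doi, "direction": SORT_DIRECTIONS[direction]})

print(build_query("10.1108/jd-12-2013-0166", direction="desc"))
```

Keeping the direction in a fixed mapping means the query author, not the API caller, decides which SPARQL keywords can appear in the final query.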

* Minor corrections:
pag. 11: "Table 4" should be "Table 3".
pag. 11: incorrect link. It should be
pag. 14: "to be as far as possible clearto those who are" --> *clear to*
pag. 14: "web developers" --> *Web*

* Others:
@ / Configuration / Requirements
The link is broken: "Not found!"