Dataspecer: Development of Consistent Semantic Data Specification Ecosystems

Tracking #: 4045-5259

Authors: 
Stepan Stenchlak
Jakub Klimek
Petr Skoda
Martin Necasky

Responsible editor: 
Oscar Corcho

Submission type: 
Tool/System Report
Abstract: 
To achieve interoperability for effective data exchange on the web, we need a contract. Depending on the field, we may refer to technical data schemas, web vocabularies, or, more generally, to data specifications (DS). The development and management of these DSes can become difficult in complex domains with multiple stakeholders and related DSes involved. In this paper, we present Dataspecer, an open-source, modular web application for the development of semantic data specifications (SDSes), DSes that target the semantic and technical layers of data exchange. Dataspecer allows users to design web vocabularies and their application profiles, maintaining relations between reused concepts and their original SDSes. Furthermore, Dataspecer assists users in the creation of technical artifacts such as schemas for JSON or XML, while maintaining consistency of the artifacts with the application profiles. We motivate the need for SDSes and derive requirements for such a tool. In case studies based on the ecosystem of DCAT-based specifications, we demonstrate that SDSes created in Dataspecer meet these requirements and are of higher quality. We show SDSes that were created directly in Dataspecer, and in the evaluation section, we argue that using our tool is more efficient than creating them manually, even for smaller domains.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 10/May/2026
Suggestion:
Accept
Review Comment:

This version of the manuscript adequately addresses all my previous comments and concerns. The updated manuscript also fulfills the journal's requirements for the Tools and Systems Report category: it presents a relevant and well-supported tool and clearly describes its capabilities and limitations, with good overall readability. The supplementary resources are well organized, properly documented, and sufficiently complete to support reproducibility and long-term accessibility.

Review #2
By Mario Scrocca submitted on 18/May/2026
Suggestion:
Accept
Review Comment:

Compared to the previously reviewed version, the manuscript is substantially improved in the areas I flagged: (i) terminology/definitions are clearer thanks to an explicit distinction of SDS forms (vocabulary / AP / technical specification) and reference to SEMIC documents, (ii) lifting/lowering mappings are now better described, and most importantly (iii) the productivity/usability evaluation has been rewritten, resolving the pitfalls I mentioned. I also appreciate that the authors followed the suggestion to archive the material on Zenodo and implemented changes regarding other "minor issues" (e.g., rewordings, clarification of OWL editing support, figures redrawn). Overall, the paper makes a practically relevant contribution by introducing a publicly available, well-documented, and tested tool that addresses existing challenges in SDS development. I lean toward accepting it with a few minor suggestions.

Terminology
- While the added SDS definition helps, I still see potential confusion for readers coming from "traditional" ontology engineering who are not familiar with SEMIC terminology. The term "ontology" appears intermittently (e.g., when citing Corcho et al. on p. 3) without being explicitly introduced in the paper's own conceptual framing. Given the paper's careful distinction among vocabulary/AP/technical specification, it would help to either (i) define "ontology" early (and explain how it relates to "vocabulary" here), or (ii) systematically avoid "ontology" unless needed, sticking to the paper's SDS terminology.
- Some statements can be read as implying that any ontology reusing another ontology is effectively an application profile. In practice, reuse in ontology ecosystems is often not purely "as-is" (especially considering generic vocabularies like dcterms), and the boundary between "reuse as-is" and "profiling" is not always perceived the same way outside the SEMIC terminology.
- The paper's use of the term "Default Application Profile (DAP)" is clearer than before (and the origin of the term is referenced), but in my opinion the term can still be confusing, and I believe this is also why it does not appear in the SEMIC Style Guide (which has a clearer distinction of the terms ontology/CV/AP). Anyway, I understand this is not the authors' fault, and I have opened an issue to request clarification in the SEMIC Style Guide repository (https://github.com/SEMICeu/style-guide/issues/109).

Evaluation
- The evaluation methodology is much clearer overall, but I suggest rewriting the sentence about the cost model to facilitate the interpretation of tables. For example: "The time measurements were normalised by setting the average task value to 100, thereby defining a relative cost metric. Tasks with values greater than 100 required more cognitive and temporal effort than average, whereas tasks with values below 100 required less."
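The normalisation suggested in the sentence above can be illustrated with a minimal sketch. It assumes that each task has a single raw time measurement; the task names and timings are hypothetical and are not taken from the paper's tables.

```python
from statistics import mean


def normalise_costs(times):
    """Scale raw task times so that their average becomes 100."""
    avg = mean(times.values())
    return {task: 100 * t / avg for task, t in times.items()}


# Hypothetical task names and timings in seconds, for illustration only.
raw_times = {"create_class": 30.0, "add_attribute": 20.0, "export_schema": 70.0}
costs = normalise_costs(raw_times)

for task, cost in costs.items():
    # Values above 100 mean more effort than the average task, below 100 less.
    label = "above average" if cost > 100 else "at or below average"
    print(f"{task}: {cost:.0f} ({label})")
```

Under this reading, a task with a relative cost of 175 took 1.75 times as long as the average task.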

Minor comments
- The sentence "we distinguish three forms..." in the introduction now improves clarity, but it feels repetitive with the bullet list immediately below. I suggest merging: e.g., keep the three SDS forms as a single bullet list and then state explicitly that Dataspecer covers all three.
- Lifting mappings. I understand that since technical specifications in JSON are directly derived from the vocabulary/AP inside Dataspecer, extreme mapping cases are less likely to arise and JSON-LD may be a good fit here. For reference, I was referring to mapping edge cases that are not trivial to address even with RML (see Challenge C2 at https://kg-construct.github.io/workshop/2021/challenges.html).
- Dataspecer section (around p.5): the sentence "can then be generated in the form shown in Figure 2" is confusing. I assume "form" means "set of exported artifacts / representations".
- The paragraph beginning "Focusing solely on individual..." is difficult to read. I suggest splitting it into two shorter sentences and making the claim more direct.
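As a rough illustration of the lifting-mapping comment above, the following sketch shows why a context-style mapping is sufficient in the simple case where JSON keys correspond one-to-one to vocabulary properties. It is a deliberately simplified stand-in, not a JSON-LD processor and not Dataspecer's implementation; the function `lift`, the `id` key convention, and the sample record are assumptions made for illustration.

```python
def lift(record, context, subject_key="id"):
    """Turn a flat JSON object into (subject, predicate, object) triples.

    Keys missing from the context are dropped, which mirrors how a JSON-LD
    processor ignores terms that have no mapping.
    """
    subject = record[subject_key]
    triples = []
    for key, value in record.items():
        if key == subject_key or key not in context:
            continue
        triples.append((subject, context[key], value))
    return triples


# Hypothetical context mapping JSON keys to DCAT and Dublin Core property IRIs.
context = {
    "title": "http://purl.org/dc/terms/title",
    "keyword": "http://www.w3.org/ns/dcat#keyword",
}

# Hypothetical record; "internalNote" has no mapping and is therefore not lifted.
record = {
    "id": "https://example.org/dataset/1",
    "title": "Air quality",
    "keyword": "environment",
    "internalNote": "draft",
}

triples = lift(record, context)
for triple in triples:
    print(triple)
```

Edge cases such as joins across documents or values that must be split or combined do not fit this key-to-property pattern, which is where dedicated mapping languages come in.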