Dataspecer: Development of Consistent Semantic Data Specification Ecosystems

Submitted by Stepan Stenchlak on 03/11/2026 - 13:25

Tracking #: 4045-5259

Authors:

Stepan Stenchlak

Jakub Klimek

Petr Skoda

Martin Necasky

Responsible editor:

Oscar Corcho

Submission type:

Tool/System Report

Abstract:

To achieve interoperability for effective data exchange on the web, we need a contract. Depending on the field, we may refer to technical data schemas, web vocabularies, or, more generally, to data specifications (DS). The development and management of these DSes can become difficult in complex domains with multiple stakeholders and related DSes involved. In this paper, we present Dataspecer, an open-source, modular web application for the development of semantic data specifications (SDSes), DSes that target the semantic and technical layers of data exchange. Dataspecer allows users to design web vocabularies and their application profiles, maintaining relations between reused concepts and their original SDSes. Furthermore, Dataspecer assists users in the creation of technical artifacts such as schemas for JSON or XML, while maintaining consistency of the artifacts with the application profiles. We motivate the need for SDSes and derive requirements for such a tool. In case studies based on the ecosystem of DCAT-based specifications, we demonstrate that SDSes created in Dataspecer meet these requirements and are of higher quality. We show SDSes that were created directly in Dataspecer, and in the evaluation section, we argue that using our tool is more efficient than creating them manually, even for smaller domains.

Full PDF Version:

swj4045.pdf

Previous Version:

Dataspecer: Development of Consistent Semantic Data Specification Ecosystems

Tags:

Reviewed

Long-term Stable Link to Resources:

https://github.com/dataspecer/dataspecer

Decision/Status:

Solicited Reviews:

Review #1

Anonymous submitted on 10/May/2026

Suggestion:
Accept

Review Comment:

This version of the manuscript adequately addresses all my previous comments and concerns. The updated manuscript also fulfills the journal's requirements for the Tools and Systems Report category: it presents a relevant and well-supported tool and clearly describes its capabilities and limitations, with good overall readability. The supplementary resources are well organized, properly documented, and sufficiently complete to support reproducibility and long-term accessibility.

Review #2

By Mario Scrocca submitted on 18/May/2026

Suggestion:
Accept

Review Comment:

Compared to the previously reviewed version, the manuscript is substantially improved in the areas I flagged: (i) terminology/definitions are clearer thanks to an explicit distinction of SDS forms (vocabulary / AP / technical specification) and reference to SEMIC documents, (ii) lifting/lowering mappings are now better described, and most importantly (iii) the productivity/usability evaluation has been rewritten, resolving the pitfalls I mentioned. I also appreciate that the authors followed the suggestion to archive the material on Zenodo and implemented changes regarding other "minor issues" (e.g., rewordings, clarification of OWL editing support, figures redrawn). Overall, the paper makes a practically relevant contribution by introducing a publicly available, well-documented, and tested tool that addresses existing challenges in SDS development. I lean toward accepting it with a few minor suggestions.

Terminology
- While the added SDS definition helps, I still see potential confusion for readers coming from "traditional" ontology engineering who are not familiar with SEMIC terminology. The term "ontology" appears intermittently (e.g., when citing Corcho et al. on p. 3) without being explicitly introduced in the paper's own conceptual framing. Given the paper's careful distinction among vocabulary/AP/technical specification, it would help to either (i) define "ontology" early (and explain how it relates to "vocabulary" here), or (ii) systematically avoid "ontology" unless needed, sticking to the paper's SDS terminology.
- Some statements can be read as implying that any ontology reusing another ontology is effectively an application profile. In practice, reuse in ontology ecosystems is often not purely "as-is" (especially considering generic vocabularies like dcterms), and the boundary between "reuse as-is" and "profiling" is not always perceived the same way outside the SEMIC terminology.
- The paper's use of the term "Default Application Profile (DAP)" is clearer than before (and the origin of the term is referenced), but in my opinion the term can still be confusing, and I believe this is also why it does not appear in the SEMIC Style Guide (which has a clearer distinction of the terms ontology/CV/AP). In any case, I understand this is not the authors' fault, and I have opened an issue to request clarification in the SEMIC Style Guide repository (https://github.com/SEMICeu/style-guide/issues/109).

Evaluation
- The evaluation methodology is much clearer overall, but I suggest rewriting the sentence about the cost model to facilitate the interpretation of tables. For example: "The time measurements were normalised by setting the average task value to 100, thereby defining a relative cost metric. Tasks with values greater than 100 required more cognitive and temporal effort than average, whereas tasks with values below 100 required less."
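The normalisation suggested above can be illustrated with a minimal sketch. The task names and timings below are hypothetical and are not taken from the manuscript; the function `relative_cost` is likewise an illustrative name, not part of Dataspecer.

```python
from statistics import mean


def relative_cost(times_by_task):
    """Scale raw task times so that the average task maps to 100."""
    avg = mean(times_by_task.values())
    return {task: 100 * t / avg for task, t in times_by_task.items()}


# Hypothetical timings in seconds (assumed values, for illustration only).
timings = {"create_class": 30.0, "add_attribute": 20.0, "reuse_concept": 70.0}
costs = relative_cost(timings)

for task, cost in costs.items():
    label = "above average" if cost > 100 else "at or below average"
    print(f"{task}: {cost:.0f} ({label})")
```

Under this reading, a task with a value of 175 took 1.75 times as long as the average task, which is the interpretation the proposed sentence is meant to make explicit.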

Minor comments
- The sentence "we distinguish three forms..." in the introduction now improves clarity, but it feels repetitive with the bullet list immediately below. I suggest merging: e.g., keep the three SDS forms as a single bullet list and then state explicitly that Dataspecer covers all three.
- Lifting mappings. I understand that since technical specifications in JSON are directly derived from the vocabulary/AP inside Dataspecer, extreme mapping cases are less likely to arise and JSON-LD may be a good fit here. For reference, I was referring to mapping edge cases that are not trivial to address even with RML (see Challenge C2 at https://kg-construct.github.io/workshop/2021/challenges.html).
- Dataspecer section (around p.5): the sentence "can then be generated in the form shown in Figure 2" is confusing. I assume "form" means "set of exported artifacts / representations".
- The paragraph beginning "Focusing solely on individual..." is difficult to read. I suggest splitting it into two shorter sentences and making the claim more direct.
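
For readers unfamiliar with the lifting discussion in the minor comments above, the following is a rough sketch of the simple case where a JSON-LD-style context is sufficient. It assumes a flat JSON object with an `id` key and a hand-written key-to-IRI mapping; the `CONTEXT` dictionary, the `lift` function, and the sample document are hypothetical and do not represent what Dataspecer generates. A real setup would rely on a JSON-LD processor rather than this hand-rolled loop.

```python
import json

# Hypothetical JSON-LD-style context: maps JSON keys to IRIs (DCTERMS/DCAT terms).
CONTEXT = {
    "title": "http://purl.org/dc/terms/title",
    "keyword": "http://www.w3.org/ns/dcat#keyword",
}


def lift(doc, context):
    """Turn a flat JSON object with an 'id' key into (subject, predicate, object) triples."""
    subject = doc["id"]
    triples = []
    for key, value in doc.items():
        if key == "id" or key not in context:
            continue  # unmapped keys are dropped in this sketch
        values = value if isinstance(value, list) else [value]
        triples.extend((subject, context[key], v) for v in values)
    return triples


# Assumed sample document, for illustration only.
doc = json.loads(
    '{"id": "https://example.org/dataset/1", "title": "Air quality",'
    ' "keyword": ["air", "sensors"], "internal": 7}'
)
triples = lift(doc, CONTEXT)
for triple in triples:
    print(triple)
```

The edge cases referenced in the review (for example, values that must be split, joined, or conditionally mapped) are exactly the ones such a key-to-IRI mapping cannot express.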
