Review Comment:
This paper presents Dataspecer, a modular, open‑source tool for authoring, managing, and publishing Semantic Data Specifications (SDSs). The system covers vocabulary creation, application profile (AP) design, and the definition of technical schemas in formats such as JSON and XML.
1. Quality, Importance, and Impact
The paper addresses a highly relevant and timely problem. The number of application profiles built atop standard RDF vocabularies (e.g., DCAT, DCAT‑AP and its national/domain-specific variants) continues to grow rapidly, and maintaining consistency among these interconnected specifications remains extremely challenging. As someone with practical experience working on such specifications, I found the motivations compelling and the tool useful. In particular, the explicit management of reuse, profiling hierarchies, and cross‑context refinements responds directly to recurring difficulties in such projects.
The tool offers clear and practical value, especially through: (i) the structured support for semantic reuse and its machine‑readable serialisation using the DSV vocabulary, (ii) the hierarchical views of specifications, which are also reflected in the automatically generated artefacts (cf. Figure 13), and (iii) the support for technical schemas semantically bound to the SDS. The authors also provide comparative tables and a rich related-work section that clearly position Dataspecer in the ecosystem, highlighting similarities and differences w.r.t. existing solutions.
The impact is convincingly demonstrated through multiple real-world usages of the tool across different use cases (Czech FOS specifications, DCAT‑AP‑CZ, and EOSC‑CZ research data repositories) and the tool's online presence. Having over 30 GitHub stars is a positive sign for a specialized tool, and the availability of a website, documentation, and a demo instance supports broader adoption.
Regarding limitations, I am not fully convinced by the proposed approach for transforming technical schemas to RDF (lifting mappings), and it is unclear why the system does not consider declarative mapping languages such as RML. In my experience, mapping JSON files via a JSON‑LD context alone may fall short (e.g., when transforming deeply nested or recursive structures). Lowering mappings (from RDF to other formats) are also mentioned, but it is not clear whether they are actually a feature of the tool (e.g., Figure 3 only shows mappings FROM technical schemas TO the semantic level).
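To make the concern about context-only lifting concrete, consider a minimal, hypothetical sketch (the keys and IRIs below are illustrative, not taken from the paper). A JSON‑LD context can attach IRIs to JSON keys at any nesting depth, but it cannot restructure the document tree itself:

```json
{
  "@context": {
    "name": "http://purl.org/dc/terms/title",
    "parts": {
      "@id": "http://example.org/hasPart",
      "@container": "@set"
    }
  },
  "name": "Root dataset",
  "parts": [
    { "name": "Nested part",
      "parts": [ { "name": "Deeply nested part" } ] }
  ]
}
```

A context of this kind maps `name` and `parts` to properties wherever they occur, but it cannot, for example, mint a node IRI from a combination of fields, hoist a nested literal to the root resource, or split one JSON field into several triples; such reshaping is where declarative mapping languages like RML are typically needed.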
2. Clarity, Illustration, and Readability
The paper is generally clearly written, well‑structured, and supported by informative figures and examples. However, terminology consistency needs improvement, and I would suggest stating some definitions more explicitly to avoid confusion while reading the paper:
- (Web) vocabulary vs. semantic data specification vs. application profile vs. profile: these terms are used without clarifying the differences among them. I found some clarification by looking at the DSV vocabulary, but I believe it is important to state this in the paper itself to make it self-contained.
- Referring to DCAT as a default application profile (DAP) is confusing in my opinion (and, as the authors acknowledge, also for users in the evaluation). I would stick to the SEMIC definitions classifying DCAT as a Core Vocabulary (https://semiceu.github.io/style-guide/1.0.0/terminological-clarification...). If the authors chose a different terminology in the context of the tool, this should be explicitly justified, and the relationship to existing definitions clearly explained.
The evaluation section should be improved, especially the part on "productivity and usability".
In the DCAT‑AP evaluation, the paper effectively highlights shortcomings of current AP representations (e.g., the limitations of relying solely on SHACL to represent certain intended usage), but does not always clearly explain how Dataspecer addresses these issues.
Regarding productivity and usability, I appreciate the effort to provide such an evaluation, but I find the presentation confusing. The authors present five use cases but then state that these cannot be addressed in the evaluation; only after several paragraphs is it explained that the idea is to use a "cost" model based on smaller tasks to make an estimate. Furthermore, the cost computation is not sufficiently explained, making it difficult to interpret the values in Tables 2, 3, and 4. The statement that participants were asked to "assume expertise in relevant technologies" is also ambiguous: it is unclear whether participants were actual experts, or whether the evaluation disregarded the time needed to gain expertise in Semantic Web technologies, the domain, or the tool itself. The paper mentions that many participants were already Dataspecer users, but does not specify how many, nor does it clarify what users had to learn, especially in light of the later comment on Q10. These points should be clarified to strengthen the evaluation.
Minor issues:
- In the introduction, the long sentence starting with “The reuse increases the likelihood…” is difficult to understand.
- The bulleted list of Dataspecer features in the Introduction is not so easy to understand at that point of the paper (it becomes clearer later, after introducing the three use cases addressed by the tool).
- Figures 2–3 could better distinguish artefacts directly used by the tool (e.g., based on DSV or a custom format interpreted by the tool) from exportable artefacts.
- Page 7: it is stated that the result can include RDFS+OWL. It should be clarified whether Dataspecer allows editing OWL axioms.
- Page 10: when stating that “dct:title has a different title when reused in DCAT”, consider using “label” instead of “title”.
- Page 13: clarify the sentence "(ii) evaluating the creation of APs would be uninformative because our premise is...".
- For long-term reference and citation, I recommend providing DOI-based versioned releases via Zenodo in addition to the GitHub codebase URL.
To summarise:
+ Reusable tool addressing very relevant use cases and requirements
+ Very complete related-work comparison and cross-tool assessment of functionalities
+ Good cross-referencing of requirements and tool features with concrete implementations
- Terminology could be better explained and aligned with the SEMIC definitions
- The evaluation section, especially the part about productivity and usability, is not clear and should be improved
- Lifting/lowering mappings to be better explained and positioned w.r.t. the literature