Semantic Data Modeling for Dataspaces: The Extensible Culture Information Model and the AP-first Methodology for Application Profile Development

Tracking #: 3909-5123

Authors: 
Rohit A. Deshmukh
Daham M. Mustaf
Georgios Toubekis
Benedikt T. Arnold
Christina Gillmann
Stefan Decker
Christoph Lange

Responsible editor: 
Guest Editors 2025 OD+CH

Submission type: 
Full Paper
Abstract: 
The culture domain faces challenges in data interoperability due to heterogeneous data models, lack of standardization, and fragmented datasets. Particularly in the performing arts sector, the provisioning of theater showtimes remains a heterogeneous, labor-intensive, and time-consuming process, which limits the FAIRness (Findability, Accessibility, Interoperability, Reusability) of play schedules (showtimes). This limits the full exploitation of showtimes data, reducing its potential to drive innovative solutions. Consequently, this fragmented and ad-hoc approach negatively impacts occupancy rates, disappoints audiences, and prevents the seamless operation of the sector. Moreover, there is currently no standardized mechanism or infrastructure to protect the rights and sovereignty of cultural institutions and artists. To address these issues, we introduce the Culture Information Model (Culture IM), an extensible, ontology-based framework designed to enable structured data representation and interoperability across performing arts theaters and beyond. Culture IM follows the Semantic Web and Linked Data principles, integrating standards such as Schema.org and DCAT, while also supporting scenario-specific adaptations through application profiles. Its application through the sovereignty-preserving data management infrastructure of dataspaces ensures data interoperability, controlled data access, and compliance with FAIR principles. This paper presents the Culture IM—consisting of building blocks such as ontologies, vocabularies, and application profiles (APs)—and a novel iterative, user-friendly, and application-focused methodology, AP-first, that was followed for its development. We outline the data modeling requirements for Culture IM derived from the German Datenraum Kultur (Culture Dataspace) project, and demonstrate how application profiles support application-specific knowledge representation, particularly for theater showtimes. 
We selected this use case as a pilot for the extensible Culture IM because our domain-expert partners—a theater association representing multiple German theaters—covered diverse scenarios and embodied the main data modeling and sharing challenges in the culture domain. Furthermore, we explore Culture IM's conceptual integration and potential application within a dataspace architecture, demonstrating its potential for sovereign data exchange. Culture IM provides a modular, scalable, and reusable foundation for digital cultural infrastructures. Future work will extend its application to domains such as museums and music marketplaces, integrate access policy templates, and enhance tool support for non-experts.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 25/Oct/2025
Suggestion:
Minor Revision
Review Comment:

The manuscript is complete, detailed, and well-structured, illustrating the methodology and results of a research project that brings together experts from complementary fields of expertise. The work shows careful documentation of the state-of-the-art on various fronts, both in terms of the cultural domain of reference and semantic web technologies, as well as FAIRness and Open Science practices according to which the research was carried out.
The methodology choices were adequately justified, also thanks to the presentation of possible alternative approaches, whose limitations and advantages were highlighted. Section 6 (on limitations and future work) is extensive and maintains the level of accuracy previously demonstrated, denoting a project organisation that fosters long-term maintainability and the possibility of reusing the research materials and tools.
Overall, the work is detailed and brings a significant scientific contribution to the academic field.

Given the focus on collaboration between domain specialists and semantic web experts, the authors may be interested in considering the work carried out by the Changes project's Spoke 4 (https://www.fondazionechanges.org/en/spoke-4-en/) on the digitisation of the temporary exhibition dedicated to Ulisse Aldrovandi (https://doi.org/10.1016/j.daach.2023.e00309). Within the scope of this project, the Chad-AP application profile (https://link.springer.com/chapter/10.1007/978-3-031-77847-6_11) was developed using SAMOD (“A Simplified Agile Methodology for Ontology Development”, https://dl.acm.org/doi/10.1007/978-3-319-54627-8_5) to model museum metadata and digitisation paradata related to the digital twin of the exhibition.

The overall writing quality is adequate for an academic publication. Below, I highlight some minor pitfalls:
(1) Section 3, “Data Modelling and Data Sharing Infrastructure Requirements”, reiterates the discussion of the shortcomings and needs of cultural institutions already mentioned in previous sections, and becomes redundant in places. Given the amount of material and detailed information presented in the following sections, I would recommend slightly reducing the space devoted to these descriptions to make the article as a whole easier to read.
(2) In 4.4.2, in the paragraph from “The fine-grained classes” to “(CGI) in a creative work or during an event”, there are some typos and repetitions to be corrected.
(3) The second-to-last paragraph of the Conclusion announces a summary of the “three key contributions” and then splits the bullet point list across two separate sentences. Rephrasing this part would make it easier to follow.

Regarding the Long-term stable URL for resources:
- The data file is well-organised: the README file is present, well-structured in sections, includes tables and images, and contains links to relevant external resources.
- The resources are complete and suitable for replicating the research pipeline. The material is released under the Creative Commons Attribution 4.0 International License.
- The chosen repository (GitHub) is compliant with the requirements (appropriate for long-term discoverability) and consistent with the project's scope. As stated in the GitHub repository, the drk-ontology.ttl file regarding the drk-information-model is released on Zenodo (https://zenodo.org/records/15294907).

Review #2
By Jakub Klimek submitted on 20/Nov/2025
Suggestion:
Reject
Review Comment:

The authors of the paper present a methodology for developing application profiles in the context of Culture IM to be used in the Culture dataspace. They demonstrate the usage of the methodology on an application profile for theater showtimes.

First of all, the paper reads more like a resource paper (e.g., description of ontology) than a full research paper, as it primarily introduces a methodology and an information model to be further reused rather than addresses particular research questions. In addition, the paper lacks any evidence of actual adoption or evaluation by users, data providers, or the community.

The contributions are original in the context of the culture dataspace. However, approaches to modeling application profiles outside of the culture dataspace exist that are not mentioned in the related work, for example, the approach taken by the European Commission’s SEMIC initiative with their style guide for core vocabularies and application profiles [1,2], applied in various DCAT-AP profiles.

The significance of the results is currently also limited due to the missing relation to existing related work. Furthermore, the authors work with draw.io for collaborative development of a graphical representation of the application profile. When the diagram is finished, however, semantic web experts need to transform it manually into a machine-readable representation. There are already approaches for the automated transformation of draw.io diagrams into OWL, which are again not mentioned in the related work [3].

It is unclear to me how to read Figures 7-11, which show class names, subclass-of relations, and other relations, but only by name, not by (prefixed) IRI. It is therefore unclear from where the classes are reused or where they are defined. They are split into color-coded modules, some “based on” schema.org (Fig. 7 caption), which are described in the text of the paper. This makes it hard to match a name in a figure to the IRI described in the paper.
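
To illustrate the point about prefixed names, the following minimal Python sketch shows how a diagram label written as a prefixed name can be expanded to a full IRI, which makes the source vocabulary of each class explicit. The `schema` and `dcat` namespace IRIs are the published ones; the `cim` prefix, its namespace IRI, and the example labels are hypothetical placeholders and are not taken from the manuscript.

```python
# Prefix map: "schema" and "dcat" use the published namespace IRIs.
# "cim" and its IRI are hypothetical placeholders for a Culture IM namespace.
PREFIXES = {
    "schema": "https://schema.org/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "cim": "https://example.org/culture-im#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'schema:Event' to a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return PREFIXES[prefix] + local


def source_vocabulary(curie: str) -> str:
    """Return the prefix, i.e. the vocabulary a term is reused from or defined in."""
    expand(curie)  # raises ValueError if the prefix is unknown
    return curie.partition(":")[0]


if __name__ == "__main__":
    # Hypothetical diagram labels, for illustration only.
    for label in ["schema:TheaterEvent", "dcat:Dataset", "cim:TheatricalProduction"]:
        print(f"{label:28} -> {expand(label)}")
```

A label without a known prefix raises a `ValueError`, which is the situation a reader of the figures currently faces: the name alone does not say where the term comes from.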

Moreover, it is unclear how the application profiles are represented in a machine-readable way. In Step 3 on page 20, the methodology states that the semantic web expert transforms the visual diagram, using RDF (probably actually RDFS) for the definition of new classes and properties and a SHACL representation for validation. What I am missing is a representation of which existing classes and properties, from which existing vocabularies or ontologies, are used in the AP. The example SHACL of time schedules in GitHub seems incomplete.

How was the diagram in Figure 12 created? This one is more readable, but it looks different than the previous visual models, and it is unclear how it was created and by whom (in terms of the methodology).

The quality of Figure 13 is low; I recommend redrawing it in a vector-based tool, even though it is taken from another source.

Finally, in the conclusions, the authors plan to create a visual tool for the creation of application profiles. Again, related work addressing a similar challenge is missing [4].

Additional questions:
1. TheatricalProduction - why is it not a class in the ontology in Widoco? This applies to the whole section 3 of the Culture ontology documentation.
2. Why are there inconsistencies in class naming, e.g., theatreEvent vs theatricalEvent?
3. How is the comprehensive general application profile documented? The only documentation linked is the Widoco version of the Culture Ontology.
4. The SHACL representation of the theatre showtimes profile seems incomplete; it contains only 1 NodeShape and 3 property shapes (https://github.com/Fraunhofer-FIT-DSAI/drk-information-model/blob/main/a...)

In 4.1, cardinalities are not mentioned as part of the general AP, only of the specific AP. Then, in 4.3.2 Step 3, cardinalities are mentioned as part of the general AP. This should be clarified.
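
Which profile carries the cardinalities determines what a validator accepts. As a minimal sketch, assuming that a specific AP tightens the cardinalities left open by a general AP, the Python below mimics the effect of SHACL `sh:minCount` and `sh:maxCount` with plain dictionaries. The property choices, bounds, and example record are hypothetical and are not taken from the manuscript or its repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    """Bounds comparable to SHACL sh:minCount / sh:maxCount (None = unbounded)."""
    min_count: int = 0
    max_count: Optional[int] = None


# Hypothetical constraints, for illustration only.
GENERAL_AP = {
    "schema:name": Cardinality(0, None),
    "schema:startDate": Cardinality(0, None),
}
SPECIFIC_AP = {
    "schema:name": Cardinality(1, 1),
    "schema:startDate": Cardinality(1, None),
}


def violations(record: dict, profile: dict) -> list:
    """Return one message per property whose value count is out of bounds."""
    problems = []
    for prop, card in profile.items():
        count = len(record.get(prop, []))
        if count < card.min_count:
            problems.append(f"{prop}: expected at least {card.min_count}, found {count}")
        if card.max_count is not None and count > card.max_count:
            problems.append(f"{prop}: expected at most {card.max_count}, found {count}")
    return problems


if __name__ == "__main__":
    showtime = {"schema:name": ["Faust"]}  # hypothetical record without a start date
    print(violations(showtime, GENERAL_AP))
    print(violations(showtime, SPECIFIC_AP))
```

Under these assumptions the same record passes the general profile and fails the specific one, so the paper should state unambiguously at which level cardinalities are defined.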

As to the online resources, they are published using GitHub, and the repository is well-organized, with a README file with important information. It contains the Culture IM and a brief version of the methodology, controlled vocabularies, etc. The ontology uses PURLs (w3id). However, as mentioned above, I am unsure of the SHACL shapes; they seem incomplete.

For the reasons above, I recommend rejection of the paper, as the necessary improvements are beyond major revision.

Typo: p19 - TRiG => TriG

[1] https://github.com/SEMICeu/style-guide/
[2] https://interoperable-europe.ec.europa.eu/collection/semic-support-centr...
[3] https://2022.eswc-conferences.org/wp-content/uploads/2022/05/paper_90_Ch...
[4] https://ceur-ws.org/Vol-3828/paper33.pdf

Review #3
Anonymous submitted on 30/Jan/2026
Suggestion:
Major Revision
Review Comment:

This paper presents the Culture Information Model, an ontology-based framework to improve FAIRness in the performing arts sector, together with the methodology used to develop it. It relies on application profiles and standards such as schema.org and DCAT. The paper is well-written and fits well within the scope of the journal.

The use of application profiles is not new; however, a methodology around them is an original contribution of this paper. The building blocks are also useful as they enable users to go step by step, and, if needed, combine them in a different way according to their needs. The use within dataspaces is also an important aspect.

I find that the performing arts and cultural cases are well-developed and well-presented. Artifacts created for the use case at hand are comprehensive and are presented following good practices (e.g., use of tools such as FOOPS! to find issues with the ontologies, and W3ID for identifiers). Artifacts are properly archived and publicly available. Limitations are also discussed.

While the paper takes good care of the cultural domain, the methodology is not clear enough for someone who wants to apply it to a different use case. In my opinion, some improvements are still needed to make the methodology clearer on its own.

Comments
- The Introduction presents the case scenario and motivates the need for a semantic framework for the performing arts domain, where stakeholders are not necessarily aware of semantic technologies. Multiple aggregator platforms are mentioned. To my knowledge, at least one of them does not fully comply with GDPR. As there could be some private/sensitive/protected data regarding the performing arts (mentioned in the Introduction), some discussion of the impact of exposing data in a semantic format, including the GDPR perspective, is missing.
- I find Fig. 1 difficult to follow as there is no starting point. A relation between building blocks and the eight requirements listed in the previous section would be a nice way to connect the dots and guide the reader.
- Fig. 1 and building blocks are also related to the methodology created and used in the paper. However, there are some bits also related to methodology later on, e.g., competency questions. The connection is not that clear though.
- If I were to apply the methodology to a different domain, it would not be straightforward. To me, it is difficult to see clearly what the methodology steps are, isolated from the case scenario in the performing arts and cultural domain. I think section 4.3.2 would be the key one for someone willing to use the methodology. However, I also see methodological bits in the rest of section 4 and also in section 3.
- Several different technologies are needed to apply the methodology. For instance, SHACL is mentioned only briefly. A section detailing prerequisites or required background knowledge for someone following the methodology could clarify some things.
- Fig. 6 uses a variety of icons. What are the licenses for those?

Suggestions
- Not sure if fully relevant, but there is another approach to FAIRness and reproducibility in the DataPLANT community that also relies on application profiles plus RO-Crates: the Annotated Research Context. If there is still room for it, the authors could have a look to see whether a combination with RO-Crates could be considered as future work.

Minor comments
- Once an acronym is defined, e.g., AP for application profile, please use it accordingly.