The FAIRness of CHeCLOUD, the Cultural Heritage Linked Open Data Cloud

Tracking #: 3879-5093

Authors: 
Antonio Liero
Maria Angela Pellegrino
Gabriele Tuozzo

Responsible editor: 
Guest Editors 2025 OD+CH

Submission type: 
Full Paper
Abstract: 
Cultural heritage data, encompassing tangible, intangible, and natural assets as defined by UNESCO, has seen growing digitization efforts globally. While several initiatives and platforms facilitate access to Cultural Heritage content, the Linked Open Data (LOD) ecosystem still lacks a dedicated and curated index for Cultural Heritage datasets and resources. This absence limits the discoverability, accessibility, and reuse of valuable Cultural Heritage data. This study introduces CHeCLOUD, the Cultural Heritage Linked Open Data Cloud, a topical sub-cloud within the broader LOD Cloud, designed to improve the FAIRness (Findability, Accessibility, Interoperability, Reusability) of Cultural Heritage datasets. The goal is to provide a sustainable, centralized reference for Cultural Heritage LOD datasets and assess their quality through a FAIR-aligned lens. The selection of the datasets to be included in CHeCLOUD followed a three-phase methodology inspired by systematic literature review guidelines, adapted for dataset discovery. It includes: (1) structured identification of Cultural Heritage datasets from the LOD Cloud and external sources; (2) FAIRness evaluation using a mapping framework between data quality and linked data principles as well as the KGHeartBeat quality assessment tool; and (3) maintenance, via continuous updates and a feedback mechanism. The methodology ensures transparency, reproducibility, and domain-specific relevance. Besides detailing the proposed resource, the study assesses the datasets in terms of FAIRness by aligning FAIR principles and linked data quality dimensions. As a result, Reusability emerges as the strongest dimension, primarily due to consistent licensing and provenance metadata. Accessibility also scores relatively high, while Findability and Interoperability reveal notable gaps, especially regarding metadata richness, URI dereferenceability, and vocabulary reuse.
CHeCLOUD fills a critical gap in the LOD ecosystem by offering, for the first time, a structured, FAIR-aligned index of Cultural Heritage datasets. The findings highlight both the current strengths and areas needing improvement in Cultural Heritage data publication practices. The proposed methodology and assessment framework can be generalized to other domains, supporting broader efforts to enhance data FAIRness across Linked Data ecosystems.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 08/Aug/2025
Suggestion:
Minor Revision
Review Comment:

"The FAIRness of CHeCLOUD, the Cultural Heritage Linked Open Data Cloud" raises a common problem for Cultural Heritage (CH) data practitioners, namely the absence of centralized repositories for accessing CH datasets. Additionally, the authors evaluate the FAIRness of the CHeCLOUD datasets using a methodology inspired by systematic literature review (SLR) principles for dataset identification and established quality assessment frameworks.
CHeCLOUD indexes 192 datasets and 16 ontologies. Domain-specific sub-clouds exist (Linguistic LOD Cloud, Life Sciences LOD Cloud), but none specifically target CH data. The methodology for dataset discovery, based on SLR principles, is novel: it follows a three-phase approach (identification, assessment, maintenance) that is both systematic and reproducible. The comprehensive FAIRness evaluation is novel as well, with the detailed FAIR-to-quality mapping framework (Table 1) representing a methodological contribution that extends beyond the CH domain. The authors also provide actionable insights: reusability is well addressed (0.76 mean score) due to good licensing practices, while interoperability (0.50) and findability (0.60) require community improvement. I can personally see the results being useful for CH and SW practitioners and for the community as a whole, even if just for finding new LOD datasets.

As for clarity, the paper is generally well structured and comprehensive. The authors put effort into making the FAIR principles clear to readers, but some sections are overloaded with abbreviations where the tables would be sufficient (for instance, subsection 3.3, LOD Cloud Data Quality Assessment, is a bit difficult to read for this reason). The GitHub repository supports the reproducibility of the authors' results, and I tested the web application with good results, even finding datasets I was not personally aware of.

This paper makes a valuable contribution by addressing a genuine infrastructure gap in cultural heritage data. The methodology is sound and the results provide actionable insights for the community. The web application is a tool I can foresee being used by the community. The Limitations section self-reports the shortcomings of CHeCLOUD and its over-reliance on manual data control.

- Page 6 presents some formatting issues in the table
- Line 123: in the following [section?]
- Line 132: Accessiblity → Accessibility
- Line 137: Interoberability → Interoperability
- Line 305: Fix spacing of footnote 11

Review #2
Anonymous submitted on 22/Dec/2025
Suggestion:
Accept
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

Review #3
Anonymous submitted on 10/Feb/2026
Suggestion:
Minor Revision
Review Comment:

Originality. This paper illustrates a rigorous methodology for assessing the quality of Linked Open Data under the lens of the FAIR paradigm, applied to the Cultural Heritage domain through the CHeCLOUD case study, an index of selected datasets from the Cultural Heritage community. While the approach is domain-specific, the proposed framework is explicitly designed to be reusable across other LOD sub-clouds. The analysis highlights key strengths of the Cultural Heritage community, particularly in terms of data reusability and accessibility, but at the same time it reveals weaknesses in findability and interoperability. By providing a comprehensive and FAIR-aligned assessment of Cultural Heritage (CH) Linked Open Data, this study positions the CH community in direct comparison with other, more established LOD sub-clouds, offering an original cross-domain perspective. Overall, the study offers both a critical assessment and practical insights to improve FAIR data sharing practices in Cultural Heritage.

Significance. This paper effectively highlights the strengths and weaknesses of the Cultural Heritage community in FAIR data sharing, also through a meaningful comparison with other LOD sub-clouds. However, it does not systematically clarify which differences among datasets (such as size, fragmentation, or institutional source) most significantly impact quality outcomes. While the qualitative metrics across FAIR dimensions are clearly presented, a deeper discussion of the underlying factors shaping these scores would strengthen the analysis, for example exploring whether qualitative differences (dataset size) correlate with specific subdomains or types of institutions. Such an analysis would also enhance the policy relevance of the work as a tool for assessing and guiding FAIR data production strategies in the Cultural Heritage domain.

Quality of writing. The paper is well organized and generally easy to read, despite the high level of technical detail. However, some elements currently presented in Section 6, such as the annotation aspects described in Section 6.2, could be introduced earlier to better support the methodological description. In addition, a more explicit discussion of the significance of the proposed methods and results in relation to the ECCH infrastructure currently under development would be highly desirable.
Bibliographic reference links don’t seem to work, which makes the consultation of tables and supporting material sometimes difficult.