Review Comment:
Overview:
This paper tackles the challenge of generating Linked Data (LD) descriptions for cultural heritage artifacts. Since manually producing LD annotations from archival records is time-intensive, the authors propose automating the process through ontology-based information extraction. Given the scarcity of digitized and annotated archival documents, the study aims to evaluate how NER and RE models trained on general-domain datasets perform when applied to 20th-century archival documents. Focusing on Portuguese Cultural Heritage (CH) documents, they compare the performance of these models on a baseline of contemporary Portuguese documents versus historical documents, covering both OCR-extracted and human-transcribed texts. The results show that the models perform poorly on historical data, highlighting their limited transferability to historical CH archives.
(1) Quality, importance, and impact of the described tool or system
Strengths
* The paper addresses an important challenge in the Cultural Heritage (CH) domain. Extracting structured information from non-born-digital archival records, which are often poorly digitized and lack consistent annotation, is a very common problem for archivists and digital humanities researchers.
* I also appreciated that the study contributes to exploring these tasks in non-English language settings.
* The methodology is clear: the authors present a transparent and reproducible pipeline, detailing dataset creation/adaptation, model training, and evaluation procedures.
* The implementation leverages open-source frameworks and publicly available datasets.
* The accompanying datasets and trained models are openly shared through a stable repository (Figshare), which also includes a README-like description of the content of the repo.
Weaknesses
* Despite its title, the work focuses more on machine learning experiments than on actually presenting a new tool or system. The ontology serves only as a schema for alignment, not as an active component guiding the extraction. For this reason, I think the Semantic Web aspect remains marginal. The paper also does not show how extracted entities and relations are transformed into Linked Data (e.g., RDF or SPARQL-queryable resources).
* Performance results are weak, particularly for RE and for RE/NER on OCR inputs. This suggests that the approach is currently not viable for the intended setting.
* The work does not position itself adequately within the current state of the art. More recent approaches, such as GLiNER for multilingual entity extraction, ReLiK for relation extraction and entity linking, and LLM-based methods, are not discussed or compared against. These models can handle both ontology-aware extraction and some of the challenges posed by domain-specific or noisy documents. I believe this omission weakens the paper's contribution.
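To make the missing Linked Data step concrete: even a minimal sketch of how one extracted entity–relation pair becomes RDF would substantially strengthen the paper. The example below is illustrative only; the namespaces, class names, and property names are hypothetical placeholders, not the authors' actual ontology.

```python
# Hypothetical sketch: serializing one extracted (entity, relation, entity)
# pair as RDF N-Triples. All namespaces and labels are placeholders.
from urllib.parse import quote

ONTO = "http://example.org/ontology#"   # placeholder ontology namespace
BASE = "http://example.org/resource/"   # placeholder resource namespace
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

def to_ntriples(subj, subj_class, prop, obj, obj_class):
    """Emit N-Triples lines typing both entities and linking them."""
    s = f"<{BASE}{quote(subj)}>"
    o = f"<{BASE}{quote(obj)}>"
    return [
        f"{s} {RDF_TYPE} <{ONTO}{subj_class}> .",
        f"{o} {RDF_TYPE} <{ONTO}{obj_class}> .",
        f"{s} <{ONTO}{prop}> {o} .",
    ]

for line in to_ntriples("Maria Silva", "Person", "bornIn", "Lisboa", "Place"):
    print(line)
```

Even a figure-level equivalent of this sketch would let readers verify that the pipeline's output is genuinely SPARQL-queryable.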
(2) Clarity, illustration, and readability of the describing paper
Strengths
* The paper is clearly written and well organized, following a logical structure.
* Tables and figures effectively support the narrative.
* The authors are honest about limitations (e.g., acknowledging the system's “limited viability”) and provide a detailed account of both methods and results, which is commendable in a Tools & Systems Report.
* The discussion section clearly identifies sources of poor performance, including OCR noise and domain mismatch, which helps readers contextualize the results.
Weaknesses
* The abstract and introduction do not fully reflect the actual focus of the work. They frame the paper as a solution for automating LD generation, whereas the real contribution is an evaluation of model transferability. The abstract also overstates the applicability of the system, stating that it successfully “identifies core archival entities from textual digital representations”, while the results show otherwise.
* The ontology-based aspect of the extraction system is only partially described. Specifically, the paper does not sufficiently explain how dataset labels are mapped to ontology classes and properties. Moreover, in both cases the mapping remains partially manual, which undercuts the paper's stated goal of automation.
* While the limitations are acknowledged, a more explicit reflection on how these findings inform future system design would strengthen the conclusion.
* The related work section could be significantly strengthened, as it omits the recent NER/RE advances noted under point (1).
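Regarding the under-described label mapping noted above: an explicit, inspectable mapping table would be enough to document the step. The sketch below is hypothetical; the dataset tags and ontology terms are illustrative, not taken from the paper.

```python
# Hypothetical sketch of an explicit label-to-ontology mapping table:
# generic NER/RE dataset tags on the left, ontology terms on the right.
LABEL_TO_CLASS = {
    "PER": "Person",
    "LOC": "Place",
    "ORG": "Organization",
    "MISC": None,  # explicitly unmapped: discarded rather than silently kept
}
RELATION_TO_PROPERTY = {
    "born_in": "bornIn",
    "member_of": "memberOf",
}

def map_label(tag, table):
    """Resolve a dataset label to an ontology term; None means 'discard'."""
    return table.get(tag)

print(map_label("PER", LABEL_TO_CLASS))
print(map_label("born_in", RELATION_TO_PROPERTY))
```

Publishing such a table alongside the datasets would also make the manual portion of the mapping auditable.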
(3) Quality of available resources:
The models and datasets are publicly shared via Figshare. The resource package appears well organized, including distinct datasets for training, testing, and evaluation (OCR vs. human transcription), accompanied by metadata and a README that explains the structure. However, the package lacks the code needed for exact reproducibility. I encourage the authors to share both the data and the code on GitHub.
(4) Suggestions for future work:
* Explore the integration of state-of-the-art models and LLMs.
* Compare performance against models trained or fine-tuned on domain-specific (historical/archival) data.
* Improve the ontological grounding, e.g., by using the ontology to constrain candidate entity types and relations during extraction.
* Reinforce the positioning within the current state of the art.
Summary:
Overall, the paper addresses challenges common to researchers working in the CH domain, such as noisy and unannotated data. However, the work does not fully align with its claim of presenting a new system; rather, it constitutes an exploratory study on the transferability of IE models to archival data. While the aim remains meaningful, the results show that the approach is not yet viable for real archival applications. Furthermore, the work needs to be better positioned within the current state of the art, accounting for more recent advances in NER and RE, particularly those leveraging transformer architectures and LLMs.