Using LLMs for Semantic Alignment: A Study on Archival Metadata Description

Tracking #: 4093-5307

Authors: 
Maria Ioanna Maratsi
Charalampos Alexopoulos
Yannis Charalabidis

Responsible editor: 
Guest Editors 2025 OD+CH

Submission type: 
Full Paper
Abstract: 
The advantages of aligning custom data schemas with standardised ontologies within their respective knowledge domains have long been proven in practice. Sharing a common structural representation by mapping concepts and relationships between schemas is essential to ensure data interoperability (especially at the semantic level), integration, reuse, and the ability to leverage machine-processable and advanced-search capabilities. Archival institutions preserve, manage, and provide access to large amounts of diverse cultural and historical data, and thus have a high potential to be active contributors to a global knowledge network, should archival data be transformed and offered as linked (open) data. Building on an expert-validated dataset of the alignment (mapping) of the Swedish National Archives schema to the Records-in-Contexts (RiC-O) ontology, this study has a two-fold purpose. First, it examines whether one case (Sweden) can be automatically and effectively extended to other archival institutions, aligning new custom schemas to RiC-O given this expert-curated dataset. Second, using the aforementioned dataset together with a second one containing a few human-evaluated example mappings to other cultural heritage (CH) ontologies as input, it examines whether an LLM (e.g., GPT-4o) can recommend meaningful alignments for enhanced metadata description to further ontologies within the same domain (CH and archives), but also across other domains. The experiments reveal several challenges and shortcomings of the LLM prompting approach for these tasks, but also opportunities that could be leveraged in this direction.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 13/Jun/2026
Suggestion:
Accept
Review Comment:

In the new version, the authors have addressed all points in the previous round of reviews, better positioning their work within the relevant literature, adding further details about the evaluation process, presenting a qualitative error analysis, and identifying some further limitations of their work.

I therefore recommend the acceptance of the paper, but suggest the authors make the following minor amendments to improve clarity:

- On page 3, add appropriate references or links for AML and LogMap
- On page 8, start a new sentence after "do exist"
- On page 14, change "human-evaluated" to "evaluated"
- On page 23, the sentence "Structural errors refer to ..." is not clear. The example you present does not seem to be a wrong-domain-range relation error, but a concept mapped incorrectly to a property rather than a class.
- The following sentence ("Hallucinated elements ...") could also be clarified: the authors could simply explain that the LLM created a mapping to an entity or relationship that does not exist.
- In the following sentence, consider modifying "could be related but not of the equivalent level of granularity" to "is relevant but not at the same level of specificity"
- On page 27, consider starting a new sentence after "with prompt-based LLM reasoning"
- On page 28, consider modifying "presented confusion matrix ones of this study" to "confusion matrix metrics used in this study"
- On page 28, consider replacing "quality" (last word of Section 5) with "process"