Expanding the Virtual Record Treasury of Ireland Knowledge Graph

Tracking #: 3882-5096

Authors: 
Beyza Yaman
Alex Randles
Lucy McKenna
Lynn Kilgallon
Diego Rincon-Yanez
Neil Johnston
Peter Crooks
Declan O'Sullivan

Responsible editor: 
Guest Editors 2025 OD+CH

Submission type: 
Full Paper
Abstract: 
The Virtual Record Treasury of Ireland (VRTI) is an online resource which digitally reconstructs the archival collections lost with the destruction of the Public Record Office of Ireland in 1922. This resource includes a distributed Knowledge Graph (KG), which employs Semantic Web principles and Linked Data to support discoverability of historical entities across multiple archival collections. CIDOC-CRM serves as the core ontology, extended with bespoke types to address domain-specific needs. The VRTI-KG was deployed successfully in June 2022, at the end of the first phase of the VRTI’s development. The second phase of VRTI-KG development witnessed challenges arising from expanding data sources, advances in technology, an expanding user base, and advanced user requirements. This article focuses on the technical solutions developed to overcome these challenges, which include: developing a robust URI structure; updating and expanding the ontology; uplifting authoritative geospatial data; and creating a map interface to facilitate public engagement. Finally, two bespoke user interfaces were created for the KG: the KG Editor is designed to enable domain experts to interact intuitively with the KG, while the KG Explorer allows public users to navigate seamlessly between historical data in the VRTI-KG and the reconstructed archival records in the VRTI document database.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 06/Jul/2025
Suggestion:
Minor Revision
Review Comment:

The paper is well written and presents a comprehensive account of the VRTI — a long-running project aimed at digitally reconstructing the archival records of the Public Record Office of Ireland, which were lost a century ago. This submission serves as a consolidated and updated version of the project’s developments, including recent improvements in ontology design, URI structuring, geospatial data integration, and user interface development.

I recommend acceptance with minor revisions, subject to the following suggestions for improvement:

Suggestions for Revision

Ontology Diagram for "Person" Class:
Please include a focused ontology diagram that depicts the "Person" class, highlighting only those properties that were added or extended during this phase of the VRTI-KG project. This would help readers quickly understand the semantic enhancements.

Clarify Figure 2’s Purpose (Page 7, Lines 17–19):

On Page 7, lines 17–19, you state: “Fig. 2, enabling historical comparisons by connecting diverse datasets (e.g., historical maps, census records, ...).”
However, Figure 2 does not currently illustrate these connections. Please either revise the figure to reflect these interlinkages or adjust the text accordingly.

Clarify Spatiotemporal Analysis with an Example:
In the Spatiotemporal Analysis section, please elaborate with a concrete example that demonstrates spatiotemporal reasoning across at least three historical eras (i.e., three levels of temporal hierarchy). This would enhance the reader's understanding of how spatiotemporal reasoning is practically applied in the KG.

Clarify Evaluation of the KG Editor (Page 11, Lines 6–7):
You refer to prior evaluation results from paper [19] where participants performed 8 editing tasks with a 77% correctness rate. Please briefly highlight the nature of errors that occurred, and indicate whether those errors have informed subsequent interface improvements.

Explain URI Design Choices:
In the Updated URI Design section, please clarify why country-level identifiers are included in URIs for offices and organizations but not for people or places. A rationale for this asymmetry would improve transparency and consistency.

Instance Summary Table:
For clarity and completeness, please add a summary table listing the total number of unique instances for major entity types in the KG, including persons, places, offices, and organizations. This would give readers a clearer picture of the scope of the final dataset.

Review #2
By Jose Emilio Labra-Gayo submitted on 13/Aug/2025
Suggestion:
Minor Revision
Review Comment:

The paper describes the ongoing work being carried out in the Virtual Record Treasury of Ireland to develop a Knowledge Graph. The paper continues a line of work previously published at workshops on the topic, adding new content about the evolution of the knowledge graph, mainly the inclusion of geospatial information.

Overall, I found the paper well written, easy to follow and interesting. Although it may not contain advanced theoretical concepts, I think it presents a practical use case for knowledge graphs in digital humanities, including geospatial aspects.

About the main dimensions of review for a research contribution.

- Originality: As far as I can tell, the work presented is original. The authors have published some prior work related to this one in 2 workshops, but they cite that work properly and I think it is fine.
- Significance of the results: this paper describes a practical application of semantic technologies within the digital humanities domain rather than a new research contribution. The significance lies more in the lessons learnt and the potential benefits for the users of that information. The authors conducted some usability studies, which are fine, although it would be great if there were other kinds of significant results, such as historians having been able to discover something new using the knowledge graph…but I understand that it can be difficult to provide those results. Maybe the authors could include some details about the usage of the system…analyzing the logs of the system or the interaction of the users and trying to capture some patterns which could be used to measure the satisfaction of the users. Anyway, I think the authors provide a good description of a real system which is currently deployed and I think it can be enough.
- Quality of writing: I think the paper is well written and the contents are clearly explained including some examples.
- Long-term stable URIs of the resources: The authors include URIs to the deployed system which is currently at: https://kg.virtualtreasury.ie/, although that URI can be considered stable, maybe, they could also include other stable URIs like which could be redirected to that one. I think the authors don’t include references to other resources like the GitHub repo of the knowledge graph source code, or some other data.

Some minor comments or suggestions:
- I think the paper would benefit from a description of the general architecture of the system. For example, the authors use Morph-KGC to transform external data, OpenStreetMap, and Virtuoso as a triple store. Adding an architecture diagram and a paragraph discussing the design decisions and the alternatives considered would be relevant for readers interested in applying similar solutions to other domains. Although the authors included Figure 7, which explains the production and development servers, I think a cleaner picture could be presented as well.
- The authors indicate that the ontology is built upon E55_Type to classify persons, places, etc. and later indicate that two classifications have been added (era and place types). Are those classifications available in the ontology or in the user interface?
- The use of SKOS can be interesting for searching and navigating, is it available in the user interface?
- I was looking at the ontology available at: https://ont.virtualtreasury.ie/ontology/index-en.html and I found that some of the descriptions end with a number…for example the label for Barony ends with the number 281. In other cases they end with a URI, like Ballyboe, which contains the link to www.oed.com/view/Entry/269953. Maybe the authors would like to update those descriptions?
- I also noticed the discussion about the design of the URIs, which contain a mixture of descriptive labels and opaque ones. I would suggest that the authors relate those decisions to some of the multilingual linked data patterns discussed, for example, here: https://journals.sagepub.com/doi/10.3233/SW-140136. I noticed that the ontology contains language-tagged descriptions in English together with a plain string. Maybe the authors want to justify the reason for that?
- Page 7, “For example, Early Modern Places is a dataset includes a…”
- Page 8, after (Figure 4) and footnote 7, there is an extra whitespace.
- Page 11: I think the following sentences are not grammatically correct: “The participants were asked to completed 8 edit…”, “It is planned to conducted another evaluation on…”
- After reading section 5.1, it is not clear to me how the Knowledge Graph is updated, or whether there is some mechanism to keep the content updated when more information is found about, for example, some person, or when some information is found to be incorrect…are there any update policies or mechanisms? This question arose when I read this statement: “...providing access to a write-protected to the Virtuoso triplestore.” If it is write-protected, then how can those contents be updated?
- In my opinion, the combination of descriptive URIs with opaque ones, and especially the use of alphanumeric ones, makes those URIs a bit difficult to read…I would suggest using all-numeric IDs like those from Wikidata, or more descriptive ones…those alphanumeric identifiers seem a bit difficult to memorize (I noticed that the authors removed the “l” because otherwise it would be confused with a 1, and the vowels to avoid generating words), and handling those URIs can be a nightmare. Although I understand that it will be difficult to change that decision, maybe provide some justification?
- Subsection “Redirect policy” starts with “An updated redirect policy…”, updated with regards to what?
- Page 14: “...depending on whether the URI is…”
- I think the authors don’t provide a public/open SPARQL endpoint which could be used by potential developers and other applications that wanted to reuse the data portal contents in a programmatic way. Did the authors consider that possibility? Could you justify the pros and cons of that decision?
- In lessons learnt, the authors indicate that SHACL validation helped maintain data integrity when modifying or adding resources…I would suggest that the authors be more ambitious with the use of shapes and offer those shapes as part of the technical documentation of the knowledge graph, which could also be useful for consumers of the SPARQL endpoint…although I think the SPARQL endpoint is not public.
- I am not sure if the URIs employed in the Knowledge Graph follow the linked data principles. I tried to obtain an RDF Turtle representation for Richard Talbot using curl and I didn’t receive any response. I used this command:
curl -H "Accept: text/turtle" -L "https://kg.virtualtreasury.ie/person/Talbot_Richard_c17/v1xf6p1"
In my opinion, a proper knowledge graph in a semantic web context should follow the linked data principles offering RDF…and if it doesn’t at least, in the paper, the authors should indicate why not.
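
The content-negotiation check described in the last comment can be repeated programmatically. The following is a minimal sketch in Python using only the standard library; the helper names (`media_type`, `is_rdf_response`, `probe`) and the list of accepted RDF media types are illustrative assumptions, not part of the VRTI-KG or of the reviewer's test, and the sketch makes no claim about what the server actually returns.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumption: an illustrative (not exhaustive) set of RDF serialisation media types.
RDF_MEDIA_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/ld+json",
    "application/n-triples",
}


def media_type(content_type_header):
    """Return the bare, lower-cased media type from a Content-Type header value."""
    if not content_type_header:
        return ""
    return content_type_header.split(";", 1)[0].strip().lower()


def is_rdf_response(content_type_header):
    """True if the Content-Type header names one of the RDF media types above."""
    return media_type(content_type_header) in RDF_MEDIA_TYPES


def probe(uri, accept="text/turtle", timeout=10):
    """Request `uri` with the given Accept header; urlopen follows redirects.

    Returns (status, final_url, content_type). `status` is None when no HTTP
    response was received (for example a connection error or a timeout).
    """
    request = Request(uri, headers={"Accept": accept})
    try:
        with urlopen(request, timeout=timeout) as response:
            return (response.status, response.geturl(),
                    response.headers.get("Content-Type"))
    except HTTPError as err:
        # The server answered, but with an error status such as 404 or 406.
        content_type = err.headers.get("Content-Type") if err.headers else None
        return err.code, err.geturl(), content_type
    except OSError:
        # Covers URLError, socket timeouts and other connection failures.
        return None, uri, None
```

Calling `probe("https://kg.virtualtreasury.ie/person/Talbot_Richard_c17/v1xf6p1")` and passing the returned content type to `is_rdf_response` would show whether the final response after redirects is an RDF serialisation or, for instance, an HTML page.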

Review #3
Anonymous submitted on 27/Oct/2025
Suggestion:
Major Revision
Review Comment:

The paper discusses incremental results related to the development and extension of the Virtual Record Treasury of Ireland Knowledge Graph (VRTI-KG).
The graph was originally deployed in June 2022.
The paper discusses what the Authors describe as technical solutions to overcome challenges in the 2nd phase of the development of the VRTI-KG.
As such, I find that the paper documents technical work that has already been developed and deployed.
The paper is well related to the Special Issue by describing a specific "Semantic Web Technology and Application for Cultural Heritage".

As the “Long-term stable URL for resources”, the Authors link to the VRTI Ontology Revision: v1.2 Issued on: 2025-01-24.
In my opinion the description and presentation of the ontology is organized according to the standards.

First, the extensions that are discussed include extensions to the ontology.
Using the description in the text related to the Person Schema Expansion I was not able to trace these extensions in the ontology.
The vocabulary documentation available at https://www.w3id.org/virtual-treasury/vocabulary contains only the concept Era; the Place Types seem to be unavailable.

Two new requirements for the system are described.
The first one is the extension related to the geospatial Linked Data for Irish history.
This aspect seems to be clearly described.
The second is related to Search, Query and Visualisation.
This aspect is described based on the previous works of the Authors.
Next the refactoring of technical infrastructure is described, major part of which is URI redesign.
I found Fig. 8 not very informative.

My general impressions about the paper are mixed at this point.
I have no doubt it is relevant for the journal and the SI.
I also found the work truly interesting and important for the CH community.
Furthermore, I understand it might not be easy to capture the progress of a long-running project in a stand-alone paper.
However, for the paper to be easier to follow and self-contained, I very much encourage the Authors to rework it.
It would be most valuable if the projects supporting the initial and the current development phases around the VRTI were briefly introduced with clear timelines given.
Based on the above, the description of the progress and current state given in this paper could be organized around clearly defined requirements of the so-called "2nd phase". These requirements and challenges could be, if needed, categorized as functional, non-functional, technical, organizational, societal, etc. Based on these, actions and technical solutions could be grounded and justified. Moreover, the main scientific and technical contributions of this paper should be clearly stated and described. This should also lead to a restructuring of the paper and an improved narrative, as the current structure of sections and their connections is somewhat loose.

Typos and other editorial shortcomings
2:51 domain..Finally,
3:11 (CIDOC-CRM ).
4:45 vocabularly
6:23 listing 2
6:51 ^^ x s d : f l o a t ; (why bf?)
7:23 interlnks
15:45 , or were