The InTaVia Knowledge Graph – European National Biographical and Cultural Heritage Object Data

Tracking #: 4032-5246

Authors: 
Matthias Schlögl
Jouni Tuominen
Joonas Kesäniemi
Petri Leskinen
Go Sugimoto
Victor de Boer

Responsible editor: 
Guest Editors 2025 OD+CH

Submission type: 
Dataset Description
Abstract: 
The InTaVia Knowledge Graph (IKG) is a large Knowledge Graph containing heterogeneous multilingual data from four European national biographies, connected to related cultural heritage objects. This resource provides researchers, heritage professionals, and the informed public access to such biographical information. This paper describes the source data, the data model, the pipeline components for managing and harmonizing the data and the resulting knowledge graph. The data model combines domain standards CIDOC CRM and Bio CRM with elements to represent multiple perspectives on biographical information. The knowledge graph was consolidated from four prosopographical databases (PDBs) and enriched with links to Cultural Heritage Objects (CHOs) from Europeana and Wikidata. The resulting knowledge graph has information about 112,050 persons, described by 257,673 person proxies. In addition to the data model and the data itself, we also describe the infrastructure used to harmonize and maintain this heterogeneous knowledge graph.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Yannis Marketakis submitted on 17/Mar/2026
Suggestion:
Minor Revision
Review Comment:

This paper describes the process towards constructing the InTaVia Knowledge Graph, the results, and the user evaluation. The authors considered all the comments from the previous review. Below you can find some comments from the review of this version.

* Consider updating the abstract so that it describes the motivation and objectives of the work, rather than the process of constructing the IKG.
* In the last paragraph of "Related Work" section, the authors mention that the Amsterdam Museum dataset has been used as a benchmark for various Knowledge Graph Learning methods. It would be good to add references there.
* In the last paragraph of IKG statistics, the authors mention that an unexpected number of 548 sameAs links were found between persons from different datasets. Perhaps that is normal because of the different geographic orientation of the datasets? Is it the same with other entities (e.g., Places)?
* What are the entities that are used for reconciliation and enrichment? From the text, I understand it is applied only to persons. Is it correct? Is it (or can it be) applied to other entities, such as Places, Events?
* Consider increasing the size of Figure 2.b (there is space to enlarge it).

Review #2
Anonymous submitted on 23/Mar/2026
Suggestion:
Accept
Review Comment:

The authors have addressed all the issues I raised in my previous review, so I believe the paper is now ready for publication.
As a minor note, I still think it would have been useful to delve deeper into the presentation of certain modelling aspects, particularly the representation of 'proxies', given their importance to the overall proposal. For these, the authors point to paper [18] in their reference list. This paper appears to be available only through arXiv. I’m therefore wondering whether it underwent the peer review process that is common in research.
In terms of reasoning, I’m wondering whether it is possible to reason over alternative proxies related to the same entity in order to check whether they are mutually consistent. Also, assuming that there may be multiple proxies for a single entity, I’m wondering about the criteria for proxy identity, namely, what features two proxies x and y must have to be considered the same (or different) proxies.
If the authors consider these comments relevant, they could add some notes in the final version of the paper.

Review #3
By Michalis Sfakakis submitted on 27/Apr/2026
Suggestion:
Accept
Review Comment:

This is the second review of the revised manuscript on the InTaVia Knowledge Graph (IKG) and the infrastructure developed for its construction and maintenance. The authors have taken my previous comments into account, in particular by providing a more detailed example that illustrates the core model concepts and proxy patterns, as well as by including a preliminary assessment from a workshop with 11 humanities experts. These changes have further improved the presentation of the work. Given that my initial review raised only minor issues, I consider the paper suitable for publication.