Digital Humanities on the Semantic Web: Sampo Model and Portal Series

Eero Hyvonen

Cultural heritage (CH) contents are typically strongly interlinked, but published in heterogeneous, distributed local data silos, making it difficult to utilize the data on a global level. Furthermore, the content is usually available only for humans to read, and not as data for Digital Humanities (DH) analyses and application development. This application report addresses these problems by presenting a collaborative publication model for CH Linked Data and six design principles for creating shared data services and semantic portals for DH research and applications. This Sampo model has evolved gradually in 2002--2021 through lessons learned when developing the Sampo series of linked data services and semantic portals in use, including MuseumFinland (2004), CultureSampo (2009), BookSampo (2011), WarSampo (2015), Norssit Alumni (2017), U.S. Congress Prosopographer (2018), NameSampo (2019), BiographySampo (2019), WarVictimSampo 1914--1922 (2019), MMM (2020), AcademySampo (2021), FindSampo (2021), and WarMemoirSampo (2021). These Semantic Web applications surveyed in this paper cover a wide range of application domains in CH and have attracted up to millions of users on the Semantic Web, suggesting feasibility of the proposed Sampo model. This work shows a shift of focus in research on CH semantic portals from data aggregation and exploration systems (1. generation systems) to systems supporting DH research (2. generation systems) with data analytic tools, and finally to automatic knowledge discovery and Artificial Intelligence (3. generation systems).
This is a joint review by Kai Eckert and Benjamin Schnabel. Benjamin Schnabel is a PhD student in the field of Digital Humanities (Jewish Studies). This revised version of the paper is much improved over the previous version. All considerations have been taken into account. We therefore recommend accepting the paper for publication.

The author has submitted a thoroughly revised version of the original article, which addresses all the minor issues I had pointed out.

In particular, the description of the Sampo design principles has been improved considerably by adding further explanations and by changing the order of presentation. More detail is given on the technology generations, and it becomes much clearer how they relate to the design principles. Finally, the SemanticComputing GitHub, which provides the long-term stable link to the resources, has been updated as requested to meet the journal's Open Science Data policy.

I recommend the manuscript for publication.