WarSampo Knowledge Graph: Finland in the Second World War as Linked Open Data

Mikko Koho
Esko Ikkala
Petri Leskinen
Minna Tamper
Jouni Tuominen
Eero Hyvönen

Responsible editor: Christoph Schlieder

Submission type: Dataset Description
The Second World War (WW2) is arguably the most devastating catastrophe in human history, a topic of great interest not only to researchers but also to the general public. However, data about the Second World War is heterogeneous and distributed across various organizations and countries, making it hard to utilize. In order to create aggregated global views of the war, a shared ontology and data infrastructure are needed to harmonize information in various data silos. This makes it possible to share data between publishers and application developers, to support data analysis in Digital Humanities research, and to develop data-driven intelligent applications. As a first step towards these goals, this article presents the WarSampo knowledge graph (KG), a shared semantic infrastructure, and a Linked Open Data (LOD) service for publishing data about WW2, with a focus on Finnish military history. The shared semantic infrastructure is based on the idea of representing war as a spatio-temporal sequence of events that soldiers, military units, and other actors participate in. The metadata schema used is an extension of CIDOC CRM, supplemented by various military history domain ontologies. With an infrastructure containing shared ontologies, maintaining the interlinked data brings new challenges, as one change in an ontology can propagate across several datasets that use it. To support sustainability, a repeatable automatic data transformation and linking pipeline has been created for rebuilding the whole WarSampo KG from the individual source datasets. The WarSampo KG is hosted on a data service based on W3C Semantic Web standards and best practices, including content negotiation, a SPARQL API, download, automatic documentation, and other services supporting the reuse of the data. The WarSampo KG, a part of the international LOD Cloud and totalling ca. 14 million triples, is in use in nine end-user application views of the WarSampo portal, which has had over 690 000 end users since its opening in 2015.
Review by Laura Pandolfo submitted on 17/Jun/2020
The authors have successfully addressed my suggestions and comments. This work represents a solid dataset description paper and I recommend it for publication.

Anonymous review submitted on 28/Jun/2020
The described dataset is very interesting and the paper is very clear. The authors illustrate the process followed with intuitive figures, and all steps of the process are sufficiently described. All key choices related to the production of this integrated dataset are well justified. The selected conceptual backbone is a good choice, and I liked the discussion about the event-based conceptual modeling approach (it is informative and fair). Overall, the presentation is very good.