How to deal with massively heterogeneous cultural heritage data – lessons learned in CultureSampo

Paper Title: 
How to deal with massively heterogeneous cultural heritage data – lessons learned in CultureSampo
Eetu Mäkelä, Eero Hyvönen, and Tuukka Ruotsalo
This paper presents the CultureSampo system for publishing heterogeneous linked data as a service. Discussed are the problems of converting legacy data into linked data, as well as the challenge of making the massively heterogeneous yet interlinked cultural heritage content interoperable on a semantic level. Novel user interface concepts for then utilizing the content are also presented. In the approach described, the data is published not only for human use, but also as intelligent services for other computer systems that can then provide interfaces of their own for the linked data. As a concrete use case of using CultureSampo as a service, the BookSampo system for publishing Finnish fiction literature on the semantic web is presented.
Submission type: 
Full Paper
Responsible editor: 

Resubmission accepted after the changes have been reviewed by the editors.

Review 1 by Werner Kuhn

This is an excellent and very interesting paper, perfectly fitting the issue. It stands out by being very strongly based in real-world data that are far from trivial, by discussing longer-term experiences, and by offering very valuable and most interesting lessons learned. In particular, the observation on the user-level difficulties with event-based ontologies is very important and should lead to interesting new research challenges. But many other lessons are equally (or even more) valuable, covering many issues from data modeling through reasoning to user interaction.

The only overall suggestion I have is to address the issue of large (or huge) data quantities more explicitly, in particular in the latter parts of the paper (regarding reasoning and services).

Another, less important point is your claim to have used "language-independent" ontologies. I am not sure what you mean by this, as vocabularies are by definition language-dependent. At least at their leaves, your ontologies are language-dependent. More importantly, I would assume that Finnish and Swedish, for example, hold some rather different conceptualizations of the world, and it would be good to state, at least, whether you have faced any of these, or not yet paid attention to them.

The first part of the paper contains several typos (arcihives, catalogueing, variery) and on p.5 it should say manufacture*d*.
The phrase on p.11 starting with "shift focus from object location...." is not very clear to me.
At the end of p.19, a sentence starts with "But.." and then continues with "however", but these are redundant however ;-).

Review 2 by Andreas Andreou

This is a very well written paper presenting how to deal with heterogeneous cultural heritage data through the CultureSampo system. This is clearly the description of the end result of a funded research project. Overall the paper stands strong, with the only flaw being that it does not make clear how it differs from other studies published by the same group or from similar work. For example, references 21, 22, 23, and 24 describe parts very similar to those of the present paper. How do these studies differ from this paper? What are the new elements reported here compared to those articles? This should be clearly addressed and explained by the authors. There are some typing errors here and there (e.g. p.5, left column, 3 lines from the bottom, the word "one" should normally be "on"). These typos should be corrected.
Other than the above, the paper reports very interesting findings, so once the aforementioned comments are addressed the paper may proceed to publication.

Review 3 by anonymous reviewer

This paper summarizes the contributions and the lessons learned during the development of CultureSampo. It is well written and gives an overview of the whole system, the content creation and integration, the user interfaces, and the system's functionality exposed as cultural heritage semantic services.
It would be interesting to give more information on how the CultureSampo functionality was used in the BookSampo project (unless BookSampo is an ongoing project – this is not very clear from reading the text) or, in general, give more details on how the CultureSampo services/functionalities can be reused in other use cases (e.g. the steps/requirements needed).