The Rijksmuseum Collection as Linked Data

Tracking #: 864-2074

Authors: 
Chris Dijkshoorn
Wesley ter Weele
Lizzy Jongma
Lora Aroyo

Responsible editor: 
Harith Alani

Submission type: 
Dataset Description
Abstract: 
Many museums make their collections accessible online. To reuse concepts and ease the integration of collections it is beneficial for institution to release their datasets as Linked Data. In this paper we present the Linked Data version of the Rijksmuseum, accessible at http://sealinc.ops.few.vu.nl/rijksmuseum/. We describe and provide statistics about the collection, the links to structured vocabularies and the links to other collections. The data presented in this paper is used in multiple ways: to enable aggregation, by users to explore the collection and by scholars to use structured queries to answer complex research questions.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 12/Nov/2014
Suggestion:
Major Revision
Review Comment:

The paper describes the Linked Data version of the Rijksmuseum Collection with the use cases of data aggregation and querying for solving complex research questions.
The paper fits well in the Linked Data Descriptions category of the journal.

Section 1 provides an introduction to the topic in very generic terms without much substantive content, e.g., regarding the dataset or related work. Section 2 describes the digitization project underway in the museum. This section also contains little content regarding the actual case and should be made more concise.
Section 3 describes the dataset model and statistics about it. The model in use is EDM and is not described in detail but rather by giving a reference. I think more details would have been useful here, e.g., whether full EDM was used or whether additional properties were introduced. It would have been nice to know also, how well the EDM model fit the use case or other lessons learned in applying it, and why and how “handles” and purl.org URIs were used and experiences of using them.

Figure 1 illustrates the data model and related vocabularies. This is useful, but the figure should be explained in the text, which would be helpful to the reader.
The data set is “linked” because it makes use of the structured vocabularies AAT and Iconclass in RDF form. The links to other collections and data sets are therefore indirect, via these thesauri, and exist only if the other collections also use them. Collection items do not, for example, have direct links to the same objects described in DBpedia. It is not clear if links to related pictures etc. can be aggregated in Europeana using EDM. Linking (for getting the fifth star) is therefore not very “rich”, but it is useful nonetheless.
Figure 2 illustrates frequencies of concepts used in annotations. They could be discussed in more detail. Does 2a mean that only 4-5 AAT concepts are actually used? If so, linking to AAT is trivial.

After this, Section 5 describes applications of the dataset. Supporting multilingual access has been a driving force behind the project, as has the desire to establish compatibility with Europeana. It is not explained how well the data finally fits with Europeana; a discussion of data quality would be important in this category of papers. Multilingual access and data compatibility are important aspects of usefulness but not really “applications”, as the section title suggests. Next, “curser search” [11] is described briefly as an application. I tried the demo and it worked, but the claim that this “provides an ideal basis for users to explore the collection” is not substantiated without further explanation. It is not easy to see the benefits by looking at the clusters in the demo interface. Finally, using the RDF base as a research artifact to answer a research question about themes in bibles is discussed. It turns out that an additional dataset is needed for this, mapping collection objects to objects in a bibliographical dataset. This is described in Section 6.

The last section, Discussion, summarizes the work and opens some avenues for further development.

This paper presents an important and extensive linked dataset. The general approach seems quite appropriate. The paper does not present novel scientific results, which is acceptable in this category of data description papers. Instead, authors are advised to focus on 1) evidence of data quality, 2) usefulness, and 3) clarity/completeness of the descriptions. In my mind the presentation still needs more rigor and major revisions as explained above. In particular, the dataset should be described in more detail, the design choices should be justified explicitly, and lessons learned should be discussed. A footnote (14) gives a web address for a “description” of the data, but there is no detailed documentation about the data. These revisions require more space, but that can be obtained by, e.g., shortening sections 1-2, perhaps explaining concept usage in Fig. 2 only verbally, and leaving out or shortening several non-informative, speculative, or “museum political” paragraphs now present in the text, such as the last two paragraphs in Discussion. Some concrete evidence and discussion about the linked data quality is also needed, in addition to explaining the museum's process and goal for high quality.

Review #2
Anonymous submitted on 05/Dec/2014
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'Data Description' and should be reviewed along the following dimensions: (1) Quality of the dataset.
The dataset is taken from the Rijksmuseum collection and is of high quality. To get a better understanding of the data, it would be helpful to have a description of the methods that were used to create the Linked Data version of the museum collection.

(2) Usefulness (or potential usefulness) of the dataset.
The linked data described in the paper is valuable for many researchers who are working in the field of Semantic Web and also for researchers working with language technology. It is valuable for a variety of use cases.

(3) Clarity and completeness of the descriptions.
The authors provide a detailed description about linking the data to structured vocabularies but there is no description of the model(s) that have been used to present the data as linked data. What are the challenges involved in presenting the museum data collection that are modelled according to EDM as linked data? Are DC and SKOS sufficient models for this presentation? I would like to know more about the insight and the lessons learned from this work.

In section 4 you write "every thesaurus is limited by its scope ..." Do you mean scope in terms of concepts or in terms of coverage? Please elaborate.

By "hidden relations" (page 4), do you mean inferred relations?

In the discussion section the authors mention that CIDOC-CRM will be considered. The CRM was not mentioned before and it is unclear how exactly the authors plan to use it. Will the model fill some gaps, and if so, what are these gaps?

How does this work compare or improve upon previous work?

Other minor corrections:
Abstract:
> "integration of collections it is beneficial for institution" -> "integration of collections it is beneficial for institutions"

2. The Rijksmuseum collection in a digital age
> "has 8,000 object on display" -> "has 8,000 objects on display"

4. Links to structured vocabularies
> "Iconclass but are not a concepts" -> "Iconclass but are not concepts"
> "Examples are 61E(AMSTERDAM) and 61E(ITALY)" -> Is "61E" the concept name?

Europeana is first explained on page 4. It should be explained earlier in the paper, for example in the introduction or in section 3.

Review #3
By Mariana Damova submitted on 10/Dec/2014
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'Data Description' and should be reviewed along the following dimensions: (1) Quality of the dataset. (2) Usefulness (or potential usefulness) of the dataset. (3) Clarity and completeness of the descriptions.

====

The paper presents the digitized collection of the Rijksmuseum, which has subsequently been converted into 5-star Linked Data. Cultural heritage is one of the domains where Semantic Web technologies and Linked Data are expected to bring substantial benefits, so this dataset is important, because it pioneers and paves the way to publishing museum content as Linked Data. The paper includes a section “Applications”, where it outlines several cases of usage of the Linked Data dataset. Adopting the Europeana Data Model (EDM) as the basic underlying schema for describing the data naturally exposes the dataset to the entire Europeana ecosystem of 50 million cultural objects from all over Europe, and to the tools and best practices for dealing with them in a creative way, abiding by the Europeana principles of open and license-free access to metadata for re-use.
However, the paper is submitted as a dataset description, but it reads as a history of the creation of the dataset, in which the emphasis is put on the original data and their value rather than on the linked data dataset. For instance, Section 2 describes the Rijksmuseum collection and the digitization project, and Section 3 describes the data model and the dataset statistics, but there is no explicit statement about the connection between these two sections. Moreover, while Fig. 1 shows how the EDM model is applied to Rijksmuseum data, the paper does not discuss or present which properties describe a single museum object. For example, it would be interesting for the reader to learn how many triples describe a single object, and what the overall data model looks like. Further, it is mandatory that the authors publish the URI and the namespace of the dataset and provide open access to it, or explain why it is not provided. The links to the Art and Architecture Thesaurus and the Iconclass vocabulary should be explained, not only quantified, so that the reader understands the advantage of this linking.
While showcasing the use of the presented dataset in section 5 “Applications” is good, it seems to me that a lot of this section could be presented as motivation for creating this 5-star Linked Data dataset. Section 6 also describes links to an external dataset. It is not clear why this dataset is treated in a separate section, or how it relates to the linking described in section 4.
I would recommend moving part of section 5 to the beginning of the paper as motivation for the creation of the dataset, shortening the history of the Rijksmuseum digitization project, and focusing with more explicit statements on the advantages of the Linked Data dataset and on explaining its structure and characteristics. It would be easier for the reader if all dataset statistics and figures were presented in a table. Currently, they appear in several places in the text, and it is difficult to get an overall picture of them. The “Applications” section, on the other hand, could show how the linked data dataset makes it possible to obtain information that would otherwise be inaccessible.
Overall, the dataset is valuable and important, but the paper in its present version does not describe it as a dataset, but puts more emphasis on the process of creating it, and on the context surrounding this creation.
I think the work deserves to be published, but the paper needs to be reworked to fit the category of submission it is intended for and to give the outline of the sections a clear internal logic.