CEDAR: The Dutch Historical Censuses as Linked Open Data

Tracking #: 1234-2446

Albert Meroño-Peñuela
Ashkan Ashkpour
Christophe Guéret
Stefan Schlobach

Responsible editor: 
Pascal Hitzler

Submission type: 
Dataset Description
In this document we describe the CEDAR dataset, a five-star Linked Open Data representation of the Dutch historical censuses, conducted in the Netherlands once every 10 years from 1795 to 1971. We produce a linked dataset from a digitized sample of 2,288 tables. The dataset contains more than 6.8 million statistical observations about the demography, labour and housing of Dutch society in the 18th, 19th and 20th centuries. The dataset is modeled using the RDF Data Cube vocabulary for multidimensional data, uses Open Annotation to express rules of data harmonization, and keeps track of the provenance of every single data point and its transformations using PROV. We link these observations to well-known standard classification systems in social history, such as the Historical International Standard Classification of Occupations (HISCO) and the Amsterdamse Code (AC), which in turn link to DBpedia and GeoNames. The two main contributions of the dataset are the improvement of data integration and access for historical research, and the emergence of new historical data hubs, such as classifications of historical religions and historical house types, in the Linked Open Data cloud.

Solicited Reviews:
Review #1
By Eetu Mäkelä submitted on 07/Dec/2015
Review Comment:

The authors have implemented most, but not all, of my requests.

Still missing are an expanded treatment of Table 1 (what the dimension resources referenced there look like) and the pruning of the example queries on the dataset page (all examples under 'Other queries' still return empty result sets).

The corrections also read more as minimal patches than as proper reworkings, which still hinders understanding. Particularly problematic are the introduction of cedar-mini and the handling of mapping progress; treating either properly would require attention to the structure of the whole paper.

However, the paper is readable as it stands, and the reworking of the discussion part is a marked improvement, so, these reservations aside, I am still ready to accept the paper.