The debates of the European Parliament as Linked Open Data

Tracking #: 1229-2441

Astrid van Aggelen
Laura Hollink
Max Kemman
Martijn Kleppe
Henri Beunders

Responsible editor: 
Natasha Noy

Submission type: 
Dataset Description
The European Parliament represents the citizens of the member states of the European Union (EU). The accounts of its meetings and related documents are open data, promoting transparency and accountability, and are used as source data by researchers. However, the official portal of these documents provides limited search facilities. This paper presents LinkedEP, a Linked Open Data translation of the verbatim reports of the plenary meetings of the European Parliament. These data are integrated with a database of political affiliations of the Members of Parliament, and enriched with detected topics from the EU’s topic hierarchy as well as links to three other Linked Open Datasets. The results of this work are available through a SPARQL endpoint as well as a user interface with extensive browse and search facilities. It is now possible to combine in one query information about the time and topic of the debate, the spoken words - in any available translation - and information about the speaker uttering these, such as affiliations to countries, parties and committees. This paper discusses the design and creation of the vocabulary, data and links, as well as known use of the data.
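To make the "one query" idea concrete, the following is a minimal sketch of how such a combined SPARQL query might be assembled from a few form-like inputs. The `ex:` namespace, every property name, and the `build_query` helper are hypothetical placeholders chosen for illustration; they are not taken from the LinkedEP vocabulary, and the real dataset may model debates, speeches and speakers differently.

```python
import re
from datetime import date
from string import Template

# Hypothetical namespace and property names, for illustration only;
# they are NOT taken from the LinkedEP vocabulary.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/ep-vocab#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?speech ?text ?speakerName ?party
WHERE {
  ?debate ex:date ?date ;
          ex:topic ?topic ;
          ex:hasSpeech ?speech .
  ?speech ex:text ?text ;
          ex:speaker ?speaker .
  ?speaker ex:name ?speakerName ;
           ex:party ?party .
  FILTER (lang(?text) = "$lang")
  FILTER (?date >= "$start"^^xsd:date && ?date <= "$end"^^xsd:date)
  FILTER (CONTAINS(LCASE(STR(?topic)), "$topic"))
}
LIMIT $limit
""")


def build_query(topic, start, end, lang="en", limit=10):
    """Return a SPARQL query string combining time, topic, text and speaker.

    Inputs are validated so that user-supplied values cannot break out of
    the quoted literals in the template.
    """
    if not re.fullmatch(r"[\w \-]+", topic):
        raise ValueError("topic may contain only letters, digits, spaces and hyphens")
    if not re.fullmatch(r"[A-Za-z]{2,3}", lang):
        raise ValueError("lang must be a 2- or 3-letter language code")
    # date.fromisoformat raises ValueError on malformed dates.
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    if start_date > end_date:
        raise ValueError("start must not be after end")
    limit = int(limit)
    if limit <= 0:
        raise ValueError("limit must be positive")
    return QUERY_TEMPLATE.substitute(
        lang=lang.lower(),
        start=start_date.isoformat(),
        end=end_date.isoformat(),
        topic=topic.lower(),
        limit=limit,
    )


if __name__ == "__main__":
    print(build_query("agriculture", "2009-01-01", "2009-12-31"))
```

The resulting string could then be sent to a SPARQL endpoint by any HTTP client; the point of the sketch is only that time, topic, translation language and speaker affiliation can be constrained together in a single graph pattern.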

Minor Revision

Solicited Reviews:
Review #1
By Konrad Höffner submitted on 11/Nov/2015
Minor Revision
Review Comment:

This review refers to the revision of the dataset description of "The debates of the European Parliament as Linked Open Data" following an earlier review.

Most of the requested changes have been made, including:

- removing the graphs of the search logs and homepage visits
- compressing the analysis of page visits
- adding specific use cases
- adding the version date
- adding the version number
- adding update plans

I request minor revisions, however, because of two shortcomings:

(1) The conversion process is described very briefly and should be expanded.
(2) All figures have low quality after printing and also on some zoom levels in a PDF viewer. Please make sure that all images are high quality vector images and not bitmaps.

Corrections: Please unify the capitalization of "LinkedPolitics" vs. "Linkedpolitics".

Review #2
By Alvaro Graves submitted on 16/Dec/2015
Minor Revision
Review Comment:

The authors describe how they converted the transcripts of the debates of the European Parliament into RDF. They provide details on different aspects of the conversion process, from the creation of URIs to the vocabularies used and how the data has been published. The authors also report how many times the data was queried and provide use cases based on people who used the data.

There are a couple of issues with this paper. First, it is not clear how the RDF representation of the data makes it easier for interested parties (mostly non-semantic web experts) to consume and take advantage of this data. Second, the usage statistics only show that the data has been queried, but "7.5 thousand times" doesn't mean much on its own; I recommend giving some other measure to compare it with. Based on the web interface available, I would suggest providing a form that lets political scientists and other researchers query the data without requiring knowledge of SPARQL.

It is also worth adding a few lines on how this dataset is going to be maintained in the future. Making the code available is something very valuable indeed. A few minor issues are also indicated below:

"The content and provenance of the data and vocabulary are described using the void, prov and omv vocabularies."


"The metadata are collected in a single graph on the server and as a turtle file in the well-known directory."

Please add a citation for "well-known".

"over 5.5 thousand times and the dataset was queried through our service about 7.5 thousand times, of which 3,654 times"

Please be consistent in how you present numbers. Also, write something like "more than 5 thousand"; decimals look odd in that context.

"Dataset quality One way to describe the quality of a Linked Dataset is the star system by Berners-Lee [2]. LinkedEP is a five-star collection"

Please remove that. The 5-star classification does not describe the quality of the data itself, only the format and, possibly, the use of common vocabularies. People can still publish trash data using a 5-star scheme.

Review #3
By Adegboyega Ojo submitted on 21/Dec/2015
Review Comment:

In my first review, I pointed out three basic shortcomings of the work, which I encouraged the authors to address for completeness. The comments asked for elaboration on the specific patterns employed in the publishing process; information on the stability, updates, and maintenance of the dataset; and information on the shortcomings of the dataset. I also suggested providing concrete user stories for practical use of the dataset.
In the revised article, the authors provide information on the type of ontology pattern used (end of Section 4). They have also indicated the frequency of updates for the dataset in Section 5 and elaborated on the shortcomings of their work in Section 7. The suggestion to provide concrete user stories has also been addressed in Section 6 through the discussion of use case patterns 1 and 2. These use cases are more interesting than the ones given in the first version of the article. Given that the authors have satisfactorily addressed the issues raised about the earlier manuscript, I recommend that the article be accepted; it is publishable in its current form.