Linked European Television Heritage

Paper Title: 
Europe’s Television Heritage
Authors: 
Nikolaos Simou, Nasos Drosopoulos and Vassilis Tzouvaras
Abstract: 
The EUscreen project represents the European television archives and acts as a domain aggregator for Europeana, Europe’s digital library. The main motivation for its creation was to provide unified access to a representative collection of television programs, secondary sources and articles, and in this way to allow students, scholars and the general public to study the history of television in its wider context. In this paper, we present the methodology followed for publishing the EUscreen dataset as Linked Open Data.
Submission type: 
Dataset Description
Responsible editor: 
Decision/Status: 
Major Revision
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

This is a revision after a "reject and resubmit", now "conditionally accepted with major revisions". The original submission was entitled "Europe’s Television Heritage", and its reviews can be found beneath the second round reviews.

Solicited review by Aidan Hogan:

Thanks to the authors for the revision and the response letter.

My main concerns with the paper related to presentation issues, as well as issues with the dataset itself. Both concerns have been partly but not fully addressed. A notable improvement is that the dereferenceability of URIs has been fixed: the dataset could now be considered as Linked Data.

In terms of presentation issues, the authors have provided some examples of RDF data produced by the process, which help give an idea of the dataset. They have also addressed various other issues raised by myself and other reviewers. Still, however, the paper suffers from presentational issues:

* There are still about 25/30 typos. (Some of these could be fixed with an automatic spell-checker.)

* Figure 2 needs to be resized.

* With the exception of the itemized URIs in Section 4.1, I think all URLs should be given as footnotes to prevent interrupting the text so frequently.

* The spacing (both horizontal with respect to justification and vertical with respect to spacing around figures, etc.) is a bit ugly. Perhaps this can be improved for the camera-ready version?

* The references are underlined, ugly and difficult to read.

All of that said, the paper is understandable and reads okay. Problems are more minor issues wrt. sloppiness rather than a lack of legibility. Please pay more attention to these issues!

My main concerns still lie with the quality of the dataset. As I say, URIs now seem to dereference correctly to RDF, which is good. However, other issues are only partly addressed and I'm thus still concerned about the usability of the data:

* The overuse of literals is still present. Granted the authors mention this as a weakness of the data, but I don't buy their excuse for having literals like this. The problem that the authors encountered, as I understand it, is with extracting unique and unambiguous URIs for literals. But, for example, I see literals like "Stereo" for ebu:hasAudioFormat and "Colour" for ebu:hasVideoFormat. I don't see what the problem would be here: there's presumably a controlled set of literal values for these attributes, and converting them to URIs using the labels as suffixes would seem simple enough. In fact, the ontology *requires* that many of these properties are given URI values (discussed later) so it is erroneous to have literals in these positions.

* Relatedly, for me, sticking with the EBUCore ontology is really dragging down the quality of the dataset. The usability of the dataset is from the perspective of the consumer. As a generic Linked Data consumer, the EBUCore properties and classes used in the data currently mean nothing: (i) they cannot be dereferenced and (ii) they are not related to popular terms elsewhere.

~ First, the URIs are not dereferenceable. In their response letter, the authors acknowledge this problem and state that they have contacted the maintainers. However, as it stands at the moment (which is all I can evaluate the dataset on), the lack of dereferenceability means that the semantics of the class and property terms are lost: in a Linked Data setting, they're just URI strings.

~ Second, little re-use of *extremely* common existing terms is present. Thus, a Linked Data consumer that understands popular terms like "rdfs:label" as a name for something, or "dct:creator" as the person who created something, or "foaf:thumbnail" as a small image for something that can be displayed to users, can do very little with the current data. After dereferenceable URIs, a core tenet of Linked Data is to stop people from creating yet another "Document" or "Agent" class or yet another "title" or "name" or "latitude" or "longitude" property. In Linked Data, re-inventing such common terms *again* (and again and again…) is a capital offence.

To me, it seems that EBUCore was designed as a "traditional" ontology: to be self-contained (it defines a lot of terms already made available in FOAF, DC, DCTERMS, GEO, etc.), to be loaded manually (it's not dereferenceable), etc. Unless the ontology is improved (made dereferenceable, linked to legacy terms, etc.), it's not suitable for a high-quality Linked Data export. In terms of possible remedies, in order of preference:

* Use the W3C Recommended Media Resources 1.0 vocabulary directly. It at least dereferences and seems to cover what you need. (Unfortunately it also reinvents several common terms, but manual mappings are described in the specification.)

* Get the EBUCore ontology to dereference. Ideally get terms linked to their legacy counterparts. If not, use legacy terms directly in the data as opposed to EBUCore ontology (e.g., dct:creator). If you still need the EBUCore properties in there, provide redundancy with both legacy and EBUCore properties.

* At the very least, discuss the weaknesses of the EBUCore ontology as weaknesses of your dataset.
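
The second remedy (redundancy with both legacy and EBUCore properties) could be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: triples are modelled as plain tuples of prefixed names, the `LEGACY_EQUIVALENTS` table is a hypothetical mapping drawn from the re-use suggestions in the first-round review, and `add_legacy_terms` is an illustrative helper name.

```python
# Hypothetical mapping from EBUCore properties to widely used legacy terms,
# based on the re-use suggestions made in the first-round review.
LEGACY_EQUIVALENTS = {
    "ebucore:name": ["foaf:name", "rdfs:label"],
    "ebucore:summary": ["dcterms:description"],
    "ebucore:alternativeTitle": ["skos:altLabel"],
    "ebucore:dateCreated": ["dcterms:created"],
}


def add_legacy_terms(triples):
    """Return the input triples plus one redundant triple per legacy equivalent."""
    result = []
    for subject, predicate, obj in triples:
        # Always keep the original EBUCore statement.
        result.append((subject, predicate, obj))
        # Repeat the statement with each legacy property, if any is mapped.
        for legacy in LEGACY_EQUIVALENTS.get(predicate, []):
            result.append((subject, legacy, obj))
    return result


if __name__ == "__main__":
    example = [
        ("ex:programme1", "ebucore:summary", "A news report."),
        ("ex:programme1", "ebucore:hasGenre", "Factual"),
    ]
    for triple in add_legacy_terms(example):
        print(triple)
```

Properties without an entry in the table pass through unchanged, so the EBUCore view of the data is preserved while generic consumers gain terms they already understand.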

In any case, much of your RDF export is not compatible with EBUCore (apologies for not noticing this in the previous review). Looking at the example data from:

http://lod.euscreen.eu/data/EUS_55F569268ACA42B186682960875F862B.rdf

I find the following issues (probably not a complete list):

* The following properties are defined as ObjectProperty in the ontology, but given literal values in the data (relating again to the issue of overuse of literals):
~ hasSubject
~ hasKeyword
~ hasObjectType
~ hasFormat
~ hasGenre
~ hasLanguage
~ hasVideoFormat
~ hasPublicationChannel (aside: if a value is not given, omit the attribute ... don't just give it a blank literal)

* The following properties have a defined range incompatible with how they are used in the data:
~ locator (range anyURI, given plain literal)
~ identifier (range anyURI, given plain literal)

* The property ebucore:topic is used but not defined
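
Checks of this kind can be automated. The sketch below is a hypothetical illustration rather than a description of any existing validator: triples are plain tuples, a literal object is represented by a `Literal` named tuple, `OBJECT_PROPERTIES` is filled in by hand from the list above (in practice it would be read from the ontology), and `find_literal_violations` is an assumed helper name.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Stand-in for an RDF literal; URI objects are represented as plain strings."""
    value: str


# Properties the review reports as declared ObjectProperty in the ontology.
OBJECT_PROPERTIES = {
    "ebucore:hasSubject", "ebucore:hasKeyword", "ebucore:hasObjectType",
    "ebucore:hasFormat", "ebucore:hasGenre", "ebucore:hasLanguage",
    "ebucore:hasVideoFormat", "ebucore:hasPublicationChannel",
}


def find_literal_violations(triples):
    """Yield (triple, reason) for object properties that carry literal values."""
    for triple in triples:
        _, predicate, obj = triple
        if predicate in OBJECT_PROPERTIES and isinstance(obj, Literal):
            # Blank literals are reported separately: such triples should be omitted.
            reason = "empty literal" if not obj.value.strip() else "literal value"
            yield triple, reason


if __name__ == "__main__":
    data = [
        ("ex:programme1", "ebucore:hasGenre", Literal("Factual")),
        ("ex:programme1", "ebucore:hasPublicationChannel", Literal("")),
        ("ex:programme1", "ebucore:hasLanguage", "ex:language/en"),
    ]
    for triple, reason in find_literal_violations(data):
        print(reason, triple)
```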

The authors still have work to do on their dataset, and (to a lesser extent) on their presentation. Again, the dataset is interesting and the direction is encouraging, but *more attention to detail is needed* for both the paper and the data.

Solicited review by Michael Hausenblas:

The authors have addressed all the issues I've raised and made the paper much more readable; it is ready for publication now.

Solicited review by Emanuele Della Valle:

The paper has largely improved in content and form. I recommend that the authors address the following minor issues before it is accepted:
- in-line links in the text may be replaced by footnotes. For instance, instead of writing "European (http://www.europeana.eu/)" the authors may add a footnote of the form "See http://www.europeana.eu/ September 18, 2012."
- on the left column of page two, the authors refer to a project survey and some reports. They are not available on the project website (http://euscreen.eu/). The authors should either avoid referring to them or should make them available and add appropriate references.
- Figure 1 is not very informative. The authors may instead show a high-level representation of EBU Core (e.g., figure 2 in http://tech.ebu.ch/docs/tech/tech3293v1_3.pdf)
- a direct link to EBU Core (http://tech.ebu.ch/lang/en/MetadataEbuCore), EBU Core ontology (http://www.ebu.ch/metadata/ontologies/ebucore/) and MAWG (http://www.w3.org/2008/WebVideo/Annotations/) should be added; the readers should not have to dig them out of the Web on their own.
- in Figure 2, baseURI should be replaced by lod.euscreen.eu
- the link to the google doc should be replaced by a reference to a project deliverable (see also the first comment in this list)
- how many links to DBpedia were added? 1365 (as written in the left column of page 5) or 1490 (as written in Table 1)?
- the SPARQL endpoint may be placed under lod.euscreen.eu
- the example SPARQL query may be replaced by the following federated query that shows the value of the linking
PREFIX ebu:
PREFIX dbp:
PREFIX dbp-onto:
SELECT ?video ?actor
WHERE {
SERVICE
{ dbp:James_Bond dbp-onto:portrayer ?actor }
SERVICE
{ ?video ebu:mentionedPersonInSummary ?actor }
}
- in the conclusion section, the following three statements appear weak; consider rethinking them:
- "However having in mind that the type of content served by EUscreen is European television programmes, we can say that its size is significant." -- why?
- "The reason why we preferred to keep the original values from metadata creating literals was because we did not want to destroy or lose this information." -- why? Most of the values can be captured by ontological instances. For instance "Stereo" --> ":stereo", "Colour" --> ":colour", etc.
- "we intend to upgrade the existing triplestore with one that supports federated SPARQL queries" -- why?

First-round reviews:

Solicited review by Aidan Hogan:

This paper presents ongoing efforts to expose metadata about European television heritage as Linked Data. The metadata describe various programmes selected as being relevant to significant 20th Century European historical events. As part of the EUscreen project, content providers select relevant programmes and upload them along with metadata offering information about title, series title, language, genre, subject, etc. Metadata is uploaded to the "MINT" service in (standard) XML or CSV format, which allows for various editing and transformation steps. The current paper proposes taking these metadata in XML format and converting them to RDF using the EBUCore ontology to model the output. A Linked Data platform is then built using dereferenceable naming schemes. Countries mentioned in the metadata have been linked to DBpedia. In total, 22,190 programme resources are currently described, with a total of 114,142 including related resources. 4store is used to host the data.

The data sound interesting to have available online as Linked Data, esp. if integrated with Europeana. In general, the description is fairly well written, though some parts of the text could do with a more thorough proof-read. It does an adequate job of giving the reader an impression of what data is being exported and how. That said, the description does have some significant shortcomings that should be addressed:

* No overview of the model/vocabulary/ontology/schema is provided. The linked Google spreadsheet does not help to get an overall picture of the data being captured. A diagram showing the key classes, properties and their inter-relation would help a lot.

* Furthermore, instead of describing the resulting RDF data in prose, it would be better to give some example(s) of instance data created by the process and shorten the current text.

* I would like to see argumentation as to why making these metadata available as RDF is useful/important? What can people do with the data? Can they be combined with other datasets in a non-trivial way? If so, which datasets (DBpedia countries is not very convincing)? Can new questions be asked against it using SPARQL? This should be argued in the paper.

Without these details, the description falls short of communicating what kind of data is being exported and also falls short of arguing why the dataset is interesting for Linked Data consumers. These are important aspects of the evaluation of the submissions for this Special Issue.

The other important part of evaluating submissions is, of course, the dataset itself. I did manage to find the RDF linked in the paper online. However, I did find certain shortcomings in how it is published:

* The resource URIs do not seem to dereference correctly to their RDF/XML descriptions. This is of course the key aspect of Linked Data. If I look up:
- http://lod.euscreen.eu/resource/EUS_55F569268ACA42B186682960875F862B
(taken from the paper) with Accept: application/rdf+xml, I get a 303 redirect to:
- http://lod.euscreen.eu/data/EUS_55F5692.rdf
However, this URI gives a 404.

* The EBUcore vocabulary used also does not dereference. For example, if I look up
- http://www.ebu.ch/metadata/ontologies/ebucore#hasAffiliation
looking for RDF/XML, I get a 301 to a directory containing the relevant OWL description. This is no good for a software agent.

* I did find the HTML example and the RDF/XML example data at
- http://www.euscreen.eu/play.jsp?id=EUS_55F569268ACA42B186682960875F862B
- http://lod.euscreen.eu/data/EUS_55F569268ACA42B186682960875F862B
respectively. Having a ".rdf" extension on the latter would be welcome. Otherwise, I do have some comments about the data I found in the RDF file.

~ First, although there is some re-use of legacy terms for describing documents, there is the potential for a lot more re-use of existing vocabularies. One option is to use the external term directly. Another is to map to the external vocabularies from the EBUCore ontology. Some suggestions for re-use:
# ebucore:name -> foaf:name, rdfs:label, ...
# ebucore:summary -> rdfs:comment, dc[t]:description, ...
# ebucore:hasSubject/ebucore:topic -> dc:subject, ... preferable to use SKOS scheme
# ebucore:alternativeTitle -> skos:altLabel
# ebucore:dateCreated -> dcterms:created
# ebucore:rights -> better to try reuse cc: vocabulary and licence URIs where possible?
# ebucore:genre -> po:genre
...and so on. Also for linking, SKOS offers skos:exactMatch, skos:narrowMatch, skos:broadMatch and skos:relatedMatch. In general, the authors should look to either re-use or map to equivalent terms in DC(TERMS), RDFS, FOAF, SIOC, Music Ontology, Programme Ontology, SKOS, etc. (Note that re-use of vocabularies is a key feature of Linked Data and was one of the criteria mentioned in the evaluation of the submissions for this Special Issue.) Alternatively, the authors can look to re-use the "Ontology for Media Resources 1.0" as mentioned.

~ Second, there seems to be an overuse of literals. For example, formats like "Video" should be given a URI (possibly even a class), same for genres like "Factual", same for topics and subjects which should probably use SKOS, same for licences (though I note ebucore:rights does use URIs). Ideally keywords could also be given URIs (if they can be successfully disambiguated).
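
For values drawn from a controlled set, minting URIs from labels can be quite mechanical. The following is a minimal sketch under stated assumptions: the `BASE` namespace and the `kind/label` path layout are hypothetical choices made for illustration (the paper does not define URIs for these values), and `mint_uri` is an illustrative helper name.

```python
import re
from urllib.parse import quote

# Assumed namespace for illustration only; not defined by the paper or the dataset.
BASE = "http://lod.euscreen.eu/resource/"


def mint_uri(kind, label):
    """Build a URI from a controlled-vocabulary label, e.g. ("genre", "Factual")."""
    # Trim the label and collapse each run of whitespace into one underscore.
    slug = re.sub(r"\s+", "_", label.strip())
    if not slug:
        # A blank value should be omitted from the data rather than given a URI.
        raise ValueError("cannot mint a URI from an empty label")
    # Percent-encode anything that is not safe inside a URI path segment.
    return f"{BASE}{kind}/{quote(slug, safe='')}"


if __name__ == "__main__":
    print(mint_uri("genre", "Factual"))
    print(mint_uri("format", "Video"))
```

Free-text keywords are a different matter: as noted above, they only deserve URIs if they can be disambiguated, which a label-based scheme like this one does not attempt.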

Given the shortcomings in the description (esp. no overview of model or examples of data, no arguments as to why these data are good to have exposed as Linked Data) and the dataset (esp. problems with dereferenceability and lack of re-use of vocabularies), I cannot recommend an accept at this time.

Solicited review by Michael Hausenblas:

Overall the paper is a valid contribution and on-topic but has some presentation issues that should be addressed before it gets accepted.

The authors describe the publishing of the European television archives dataset through the EUscreen project at http://lod.euscreen.eu/ and provide insights into design decisions in the process. The dataset is relevant and of high quality, the potential usefulness is given (although could be extended beyond one use case). The dataset description seems complete but lacks clarity.

## Core DSD
Core questions concerning the DSD including licensing and availability are listed.

## Publishing and metrics
The authors clearly described the coverage and provided relevant metrics as well as discussed the access methods in Section 4. It appears to me that the authors performed a manual interlinking task ("the names of the local dataset countries were compared using SPARQL [7] to names of the countries resources served by DBpedia." in Section 4.2) - it would be good to highlight why this has been done and if semi-automatic approaches such as Silk or Limes could be useful.

## Examples, modeling patterns and shortcomings
Examples are provided (though one representative, complete example in RDF/Turtle syntax or as a graph figure might be beneficial to include) and the modeling process including the design decisions is present. I did not find a proper discussion about shortcomings of the dataset, though.

## What is missing
Besides a 'related work' section the authors have covered the relevant parts, content-wise. The main issue I have is with the presentation (see below).

## Editorial comments
Although the use of English is not too bad, the paper would benefit from another round of proof-reading, ideally by a native speaker. In addition, the article is somewhat wordy - I provide concrete suggestions on what could be cut down in the following.

Presentation:

Section 1 and Section 2 provide the background and should be dramatically shortened into one section. For example, the entire history (around EBUcore to MPEG7) can be removed as not directly relevant. Then, in Section 2 there is IMO no need to go into the details of the EUscreen project consortium and goals. Simply describe the topics in one paragraph (listing at the end of the section is core, I think).

Section 3 contains a number of irrelevant descriptions and can be cut down; for example, the entire paragraph "Registered users can start by uploading their metadata records in XML or CSV serialization, using the HTTP, FTP and OAI-PMH protocols. Users can also directly upload and validate records in a range of supported metadata standards (XSD). XML records are stored and indexed for statistics, previews, access from the mapping tool and subsequent services. Handling of metadata records includes indexing, retrieval, update and transformation of XML files and records. XML processors are used for validation and transformation tasks as well as for the visualization of XML and XSLT." can be stated in one short sentence.

In Section 4.1, the sentence "The complete set of properties and classes used for the mapping of all the harvesting schema's elements can be found at https://docs.google.com/spreadsheet/ccc?key=0Akruw5a0_oaLdEQyMl85NVQxZ2lmT00wcVU4ZVRJZ0E&hl=en_US#gid=3" is sort of poor in terms of presentation - can this be made available via a nicer location and in a more digestible format?

In Section 4.2, I suggest to remove "External RDF links are crucial for the Web of Data as they are the glue that connects data islands into a global, interconnected data space [5]." as it is a generic statement and doesn't add anything here.

In Section 4.3, I suggest to turn the paragraph "At the moment the pilot holds 22.190 programme resources while the total amount of resources is 114.142. Among the total resources, 13.158 are made for persons individuals referring to the contributor of the programme while 582 are made for countries - linked to 1439 externals- and 22 for languages – linked to 63 externals. In addition by using spotlight, 1490 person resources are extracted to which links are made from 1133 programmes' English summaries." into a table that makes it easier to understand.

Section 5 is again quite wordy and also introduces new facts: "In particular in total 2855 person resources were extracted and 1365 of them were wrong (manually filtered), despite the fact that the confidence value in the spotlight setup was set high." - I suggest to move this into Section 4.

Typos:

* Section 4.1: "that states the use of URIS for things" -> "that states the use of URIs for things"
* Section 4.1: "domain administered by the project (lod.euscreeen.eu)" -> "domain administered by the project (lod.euscreen.eu)"
* Section 4.2: "(info from google anytics)." -> "(info from Google Analytics)."

Solicited review by Emanuele Della Valle:

The paper presents in a well-written and correctly structured format an important dataset of the European Commission. The dataset is rather small, but it is externally connected to DBpedia, and Geonames. The vocabulary is presented at a level of details that allows readers to issue SPARQL queries against the dataset. An example of SPARQL query that bridges DBpedia is illustrated.

I only have minor comments:
- General
- is a VoID description of the dataset available?
- can the authors elaborate a bit more on the licensing? Why not use an Open Data Commons license?
- Page 1 column 2
- The RDF version of the dataset will eventually be hosted by the EU, i.e. the actual dataset owners, themselves, which ensures a long time availability of the data. -> The RDF version of the dataset will eventually be hosted by the EU, i.e. the actual dataset owner, itself, which ensures a long time availability of the data.
- Page 2 columns 2
- the IRI "fts-o:cofinancingRate" runs into the margin
- Page 3
- Figure 1 is difficult to read. The authors may consider redrawing it by hand. To save space they may want to remove instances.
- Page 3 column 1
- & -> and
- Page 3 column 2
- copmile -> compile
- consider adding a link to JAXB
- Page 4 column 1
- in table 1 the word "Commitments" touches 28114


Comments

We would like to thank all the reviewers for their constructive comments which have greatly assisted us in improving our manuscript.
In particular, our reply and action taken w.r.t. each comment is described below.

REVIEWER 1

COMMENT: No overview of the model/vocabulary/ontology/schema is provided. The linked Google spreadsheet does not help to get an overall picture of the data being captured. A diagram showing the key classes, properties and their inter-relation would help a lot.
RESPONSE: We have added a diagram in Section 4.1 that illustrates a programme's metadata in RDF, also presenting the main classes used and their inter-relation.

COMMENT: Furthermore, instead of describing the resulting RDF data in prose, it would be better to give some example(s) of instance data created by the process and shorten the current text.
RESPONSE: An RDF example from the EUscreen dataset is shown in Figure 2.

COMMENT: I would like to see argumentation as to why making these metadata available as RDF is useful/important? What can people do with the data? Can they be combined with other datasets in a non-trivial way? If so, which datasets (DBpedia countries is not very convincing)? Can new questions be asked against it using SPARQL? This should be argued in the paper.
RESPONSE: We have extended the existing discussion of the dataset in Section 5 by including a SPARQL example and a paragraph presenting the audiences for whom this dataset is useful and how it can be linked to other datasets.

COMMENT: The resource URIs do not seem to dereference correctly to their RDF/XML descriptions. This is of course the key aspect of Linked Data. If I look up:
- http://lod.euscreen.eu/resource/EUS_55F569268ACA42B186682960875F862B (taken from the paper) with Accept: application/rdf+xml, I get a 303 redirect to:
- http://lod.euscreen.eu/data/EUS_55F5692.rdf However, this URI gives a 404.
RESPONSE: We could not reproduce this error (we believe that something might have broken for a while during an update). We have examined our content negotiation implementation using the Linked Data validator Vapour; the following link can be used to verify the validity of our implementation: http://validator.linkeddata.org/vapour?uri=http%3A%2F%2Flod.euscreen.eu%...
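
A check along these lines can also be scripted. The sketch below assumes the URI pattern visible in the review (a `/resource/` URI redirecting to a `/data/` document with an `.rdf` extension); the helper names `expected_data_uri` and `fetch_rdf` are illustrative, and the server may of course behave differently or no longer be reachable.

```python
import urllib.request

# URI prefixes taken from the example URIs quoted in the review.
RESOURCE_PREFIX = "http://lod.euscreen.eu/resource/"
DATA_PREFIX = "http://lod.euscreen.eu/data/"


def expected_data_uri(resource_uri):
    """Map a resource URI to the RDF document URI it is expected to redirect to."""
    if not resource_uri.startswith(RESOURCE_PREFIX):
        raise ValueError(f"not a resource URI: {resource_uri}")
    return DATA_PREFIX + resource_uri[len(RESOURCE_PREFIX):] + ".rdf"


def fetch_rdf(resource_uri, timeout=10):
    """Request RDF/XML; urllib follows the 303 and raises HTTPError on a 404."""
    request = urllib.request.Request(
        resource_uri, headers={"Accept": "application/rdf+xml"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Return where the redirect chain ended and what media type was served.
        return response.geturl(), response.headers.get("Content-Type")
```

Calling `fetch_rdf(uri)` and comparing the returned URL with `expected_data_uri(uri)` would expose a truncated redirect target of the kind reported in the first-round review.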

COMMENT: The EBUcore vocabulary used also does not dereference. For example, if I look up
- http://www.ebu.ch/metadata/ontologies/ebucore#hasAffiliation looking for RDF/XML, I get a 301 to a directory containing the relevant OWL description. This is no good for a software agent.
RESPONSE: We are aware of that issue and we have mentioned it to the maintainer of the EBUcore ontology.

COMMENT: I did find the HTML example and the RDF/XML example data at
- http://www.euscreen.eu/play.jsp?id=EUS_55F569268ACA42B186682960875F862B
- http://lod.euscreen.eu/data/EUS_55F569268ACA42B186682960875F862B
respectively. Having a ".rdf" extension on the latter would be welcome. Otherwise, I do have some comments about the data I found in the RDF file.
RESPONSE: We have added the rdf extension.

COMMENT: ~ First, although there is some re-use of legacy terms for describing documents, there is the potential for a lot more re-use of existing vocabularies. One option is to use the external term directly. Another is to map to the external vocabularies from the EBUCore ontology. Some suggestions for re-use:
# ebucore:name -> foaf:name, rdfs:label, ...
# ebucore:summary -> rdfs:comment, dc[t]:description, ...
# ebucore:hasSubject/ebucore:topic -> dc:subject, ... preferable to use SKOS scheme
# ebucore:alternativeTitle -> skos:altLabel
# ebucore:dateCreated -> dcterms:created
# ebucore:rights -> better to try reuse cc: vocabulary and licence URIs where possible?
# ebucore:genre -> po:genre
...and so on. Also for linking, SKOS offers skos:exactMatch, skos:narrowMatch, skos:broadMatch and skos:relatedMatch. In general, the authors should look to either re-use or map to equivalent terms in DC(TERMS), RDFS, FOAF, SIOC, Music Ontology, Programme Ontology, SKOS, etc. (Note that re-use of vocabularies is a key feature of Linked Data and was one of the criteria mentioned in the evaluation of the submissions for this Special Issue.) Alternatively, the authors can look to re-use the "Ontology for Media Resources 1.0" as mentioned.
RESPONSE: Again this is a design issue related to the EBUcore ontology that we do not maintain. The EBUCore ontology was selected as the most appropriate vocabulary for the representation of the EUscreen content in RDF by audiovisual experts.

COMMENT: Second, there seems to be an overuse of literals. For example, formats like "Video" should be given a URI (possibly even a class), same for genres like "Factual", same for topics and subjects which should probably use SKOS, same for licences (though I note ebucore:rights does use URIs). Ideally keywords could also be given URIs (if they can be successfully disambiguated).
RESPONSE: We have added a paragraph in Section 5 commenting on the reasons that led us to this decision.

REVIEWER 2

COMMENT: The authors clearly described the coverage and provided relevant metrics as well as discussed the access methods in Section 4. It appears to me that the authors performed a manual interlinking task ("the names of the local dataset countries were compared using SPARQL [7] to names of the countries resources served by DBpedia." in Section 4.2) - it would be good to highlight why this has been done and if semi-automatic approaches such as Silk or Limes could be useful.
RESPONSE: We have added an explanation of this choice in Section 4.2.

COMMENT: Examples are provided (though one representative, complete example in RDF/Turtle syntax or as a graph figure might be beneficial to include) and the modeling process including the design decisions is present. I did not find a proper discussion about shortcomings of the dataset, though.
RESPONSE: We have added a diagram in Section 4.1 that illustrates a programme's metadata in RDF, also presenting the main classes used and their inter-relation.

COMMENT: Besides a 'related work' section the authors have covered the relevant parts, content-wise. The main issue I have is with the presentation (see below).
RESPONSE: We have included related work in Section 1.

COMMENT: Section 1 and Section 2 provide the background and should be dramatically shortened into one section. For example, the entire history (around EBUcore to MPEG7) can be removed as not directly relevant. Then, in Section 2 there is IMO no need to go into the details of the EUscreen project consortium and goals. Simply describe the topics in one paragraph (listing at the end of the section is core, I think).
RESPONSE: We have shortened this section as indicated and changed its title to "EUscreen content".

COMMENT: Section 3 contains a number of not relevant descriptions, can be cut down, for example the entire paragraph "Registered users can start by uploading their metadata records in XML or CSV serialization, using the HTTP, FTP and OAI-PMH protocols. Users can also directly upload and validate records in a range of supported metadata standards (XSD). XML records are stored and indexed for statistics, previews, access from the mapping tool and subsequent services. Handling of metadata records includes indexing, retrieval, update and transformation of XML files and records. XML processors are used for validation and transformation tasks as well as for the visualization of XML and XSLT." can be stated in one short sentence.
RESPONSE: We have shortened this Section as indicated.

COMMENT: In Section 4.1, the sentence "The complete set of properties and classes used for the mapping of all the harvesting schema's elements can be found at https://docs.google.com/spreadsheet/ccc?key=0Akruw5a0_oaLdEQyMl85NVQxZ2l... " is sort of poor in terms of presentation - can this be made available via a nicer location and in a better digestible format?
RESPONSE: We have included a figure (Figure 2) illustrating an excerpt of a programme in RDF (an excerpt only, due to space limitations) and a URI for the complete example; we have also left the Google doc as a reference for the complete mapping from XML to RDF.

COMMENT: In Section 4.2, I suggest to remove "External RDF links are crucial for the Web of Data as they are the glue that connects data islands into a global, interconnected data space [5]." as it is a generic statement and doesn't add anything here.
RESPONSE: We have removed the sentence.

COMMENT: In Section 4.3, I suggest to turn the paragraph "At the moment the pilot holds 22.190 programme resources while the total amount of resources is 114.142. Among the total resources, 13.158 are made for persons individuals referring to the contributor of the programme while 582 are made for countries - linked to 1439 externals- and 22 for languages – linked to 63 externals. In addition by using spotlight, 1490 person resources are extracted to which links are made from 1133 programmes' English summaries." into a table that makes it easier to understand.
RESPONSE: We have replaced the paragraph with a table as indicated.

COMMENT: The Section 5 is again quite wordy and also introduces new facts: "In particular in total 2855 person resources were extracted and 1365 of them were wrong (manually filtered), despite the fact that the confidence value in the spotlight setup was set high." - I suggest to move this into Section 4.
RESPONSE: We have moved this information to Section 4.2.

REVIEWER 3

COMMENT: The paper presents in a well-written and correctly structured format an important dataset of the European Commission. The dataset is rather small, but it is externally connected to DBpedia, and Geonames. The vocabulary is presented at a level of details that allows readers to issue SPARQL queries against the dataset. An example of SPARQL query that bridges DBpedia is illustrated.
RESPONSE: We are not sure if this review corresponds to our manuscript.

COMMENT: I only have minor comments:
- General
- is a VoID description of the dataset available?
- can the authors elaborate a bit more on the licensing? Why not use an Open Data Commons license?
RESPONSE: There is a paragraph in Section 4.1 that discusses the provenance metadata included, as well as the license used for the publication of the data.

COMMENT: - Page 1 column 2
- The RDF version of the dataset will eventually be hosted by the EU, i.e. the actual dataset owners, themselves, which ensures a long time availability of the data. -> The RDF version of the dataset will eventually be hosted by the EU, i.e. the actual dataset owner, itself, which ensures a long time availability of the data.
- Page 2 columns 2
- the IRI "fts-o:cofinancingRate" runs into the margin
- Page 3
- Figure 1 is difficult to read. The authors may consider redrawing it by hand. To save space they may want to remove instances.
- Page 3 column 1
- & -> and
- Page 3 column 2
- copmile -> compile
- consider adding a link to JAXB
- Page 4 column 1
- in table 1 the word "Commitments" touches 28114
RESPONSE: The above comments do not correspond to our manuscript.