EventMedia: a LOD Dataset of Events Illustrated with Media

Houda Khrouf, Raphael Troncy
An ever-increasing amount of generated event-centric knowledge is spread over multiple social services, either materialized as calendars of past and upcoming events or illustrated by cross-media items. This opens an opportunity to create an infrastructure that unifies event-centered information derived from event directories, media platforms and social networks using the RDF data model. EventMedia aims to create such an infrastructure, which requires the seamless aggregation and integration of disparate data sources, some of which overlap in their coverage. In this paper, we present the EventMedia knowledge base, composed of event descriptions together with descriptions of the media associated with these events, interlinked with the larger Linked Open Data cloud. We describe how the data has been extracted, converted, interlinked and published following the best practices of the Semantic Web community.
Submission type: Dataset Description
Minor Revision

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Solicited review by Erik Wilde:

I like the paper in general, but it should spend a little more time explaining the most important design decisions. Right now, the paper very factually states what was done, but it does not really reflect on why it was done that way, or on what other possibilities were investigated. It would be very valuable for readers to learn more about the design decisions at the various stages (dataset selection, unified model selection, mapping strategies, envisioned use cases, ...): which choices the authors made, and why they made them.

Language is OK, but the paper could use a final round of proofreading and editing.

Solicited review by Amit Joshi:

In this paper, the authors introduce the EventMedia dataset, which contains media-related event information in RDF format. Data is extracted from popular public event directories such as last.fm and Upcoming, and access to the data is provided through both a REST API and a SPARQL endpoint. The importance of such an aggregated dataset is undoubtedly high.

The paper is well written, with a clear description of the dataset and a clear explanation of the approach taken to build it. The importance of such a dataset is further augmented by the real-time aggregation of posts shared on social networks. Finding the overlap in metadata across multiple directories through tag-based mapping is a neat implementation.

The RDF modeling includes sufficient detail, and the triple generation follows standard procedure. The authors have reused existing ontologies (mediaOnt and SIOC) in addition to the main event ontology, LODE, and have also used properties from FOAF, SIOC and Dublin Core. The dataset has been further enriched by discovering links to additional datasets such as MusicBrainz, DBpedia, Freebase and Uberblic. The site (http://eventmedia.eurecom.fr) was working properly and provides sufficient detail for most events.

I am pleased with the details provided in the paper. A few things to update:
1. I could not access the taxonomy description: http://data.linkedevents.org/category/ returns a Forbidden page.
2. Please provide footnotes for each site mentioned in Section 3.3 (e.g., Zevents, LinkedIn, Eventbrite and Ticketmaster).
3. The paper would be even stronger if it reported statistics about the applications/sites using the EventMedia dataset.

Solicited review by Christophe Gueret:

This paper presents EventMedia, a dataset about events created by collecting and integrating data from different sources.

I warmly recommend publishing it as is, but I still have a few remarks:
* On page 2, it is said that the data is accessible via a REST API and a SPARQL endpoint. What about the dereferencing of the resources? Is it considered part of the REST API?
* It is somewhat unclear what exactly is stored in EventMedia and what stays with the data sources that are aggregated. For instance, Figure 3 shows part of the description of the entity from Flickr, whereas Figure 2 does not. The knowledge base contains a lot of data harvested from different providers: do you import everything they provide, or do you just point to the resources of those providers that publish Linked Data?
* IMHO, Community/Professional/Family are all subtypes of SocialGathering (cf. page 4, right column).
* What is exactly the relation between EventMedia and LinkedEvents? It sounds like LinkedEvents is the actual data set and that EventMedia is an application using it. If so, the title of the paper should be revised.
* In Section 5.1, what gold standard was used to compute the precision and recall scores?

Quality of the dataset: good
Usefulness (or potential usefulness) of the dataset: good (although there is no indication of external applications making use of the data being described)
Clarity and completeness of the descriptions: good