EventKG - the Hub of Event Knowledge on the Web - and Biographical Timeline Generation

Tracking #: 1951-3164

Authors: 
Simon Gottschalk
Elena Demidova

Responsible editor: 
Guest Editors Knowledge Graphs 2018

Submission type: 
Full Paper
Abstract: 
One of the key requirements to facilitate the semantic analytics of information regarding contemporary and historical events on the Web, in the news and in social media is the availability of reference knowledge repositories containing comprehensive representations of events, entities and temporal relations. Existing knowledge graphs, with popular examples including DBpedia, YAGO and Wikidata, focus mostly on entity-centric information and are insufficient in terms of their coverage and completeness with respect to events and temporal relations. In this article we address this limitation, formalize the concept of a temporal knowledge graph and present its instantiation - EventKG. EventKG is a multilingual event-centric temporal knowledge graph that incorporates over 690 thousand contemporary and historical events and over 2.3 million temporal relations extracted from several large-scale knowledge graphs and semi-structured sources, and makes them available through a canonical RDF representation. Since popular entities often possess hundreds of relations within a temporal knowledge graph such as EventKG, generating a concise overview of the most important temporal relations for a given entity is a challenging task. In this article we demonstrate an application of EventKG to biographical timeline generation, where we adopt a distant supervision method to identify the relations most relevant for an entity biography. Our evaluation results provide insights into the characteristics of EventKG and demonstrate the effectiveness of the proposed biographical timeline generation method.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 09/Sep/2018
Suggestion:
Minor Revision
Review Comment:

This paper describes the details of constructing an event-centric temporal knowledge graph and shows how EventKG can be applied to generate biographical timelines. Most of the details are described clearly. The paper also evaluates each individual step/component and demonstrates the quality of EventKG.

Weakness:
In many sections, the paper claims to extract events and temporal relations. However, I found that most of the events and relations are simply taken from structured and semi-structured KBs, which is different from extraction from unstructured text. This difference should be described clearly in the introduction. In the information extraction area, several studies have addressed the automatic construction of entity-centric and event-centric knowledge graphs from unstructured news, e.g., [TinkerBell: Cross-lingual Cold-Start Knowledge Base Construction] for KBP. These studies should be included in the related work. In addition, the introduction should provide a clear definition of the events/temporal relations that this paper focuses on.

During the construction of EventKG, especially during the integration step, how are the events/temporal relations from multiple structured sources linked together? For example, many entity names/event names/relation names that refer to the same concept differ between Wikidata and YAGO.

In the related work, the authors missed many relevant studies on the extraction of events from news. [6] is not the state-of-the-art extraction approach for news.

Review #2
By Pedro Szekely submitted on 03/Oct/2018
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

Overall impression

This paper is very interesting. The authors describe an approach and system for creating a knowledge graph of events by integrating and fusing information from large structured sources (Wikidata, DBpedia) and semi-structured sources (Wikipedia event lists and the Wikipedia Current Events portal). The integrated knowledge graph has better coverage and more roles defined for each event than the constituent sources (good evaluations). In addition, the authors show how to use the KG to produce biographical timelines for entities. The evaluation of this part of the work shows that the timelines produced are better than those of an existing baseline.

The main strength of the paper is that the resulting KG is unique, interesting and of high quality. The first part of the paper focuses on the construction of the event KG. This part of the work uses existing techniques to do the extraction, integration and fusion of data. The originality lies not in the techniques, but rather in how the techniques are combined and configured to obtain the end result. Many readers will find this part of the paper interesting, but lacking in specific details. The description is high level and uses an example to illustrate individual cases. However, it is too high level to enable even an experienced researcher to replicate the results. While not a showstopper for publishing the paper, one additional page of details would be very helpful here.

The second part of the paper focuses on the creation of biographical timelines. The challenge here is that for popular entities, the number of relations that could be included in the biographical timeline is very large. The goal is to identify the most salient ones. This part of the paper is more novel than the first part. The authors use a machine learning approach to train a classifier to distinguish interesting from uninteresting relations. There is no annotated corpus, so the authors use a semi-supervised approach to generate training data by cleverly extracting event features from existing biographical corpora. The approach is interesting and the paper describes it well.

The evaluation is excellent. The authors evaluate the event KG, showing how it improves over the event information of its constituent sources. The paper has many interesting statistics and tables that evaluate the contribution of the different ideas in the approach. The evaluation of the timelines is also very good, using a combination of statistics and a user study.

A significant aspect of the work is that it is the first high quality event KG built by combining multiple sources. The work on producing biographical timelines illustrates the potential that the KG has for building new applications.

The quality of the writing is very good. My only reservation is the lack of details in section 4.3, which would benefit from an additional page.

The related work is OK. It cites most work in the area, but misses citations to important work on event ontologies and event extraction from text. The CAMEO ontology and the ICEWS and GDELT event extraction systems should be cited.

Additional comments are available in a marked-up PDF file.

I recommend accepting the paper with minor revisions.

Review #3
By Enrico Daga submitted on 02/Nov/2018
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

Review #4
By Francesco Osborne submitted on 05/Nov/2018
Suggestion:
Minor Revision
Review Comment:

The paper presents a definition of temporal knowledge graph and introduces EventKG, a knowledge base which includes 690K events and over 2.3 million temporal relations. It applies EventKG to the task of generating biographical timelines, showing that it performs better than an alternative approach.

The paper is generally clear and well written. However, it is sometimes too verbose and redundant. In particular, certain concepts are repeated again and again in different sections. I suggest that the authors prune some of these instances to make the paper easier to read.

The generation of knowledge graphs of events is a significant topic and it is well addressed in this paper. The research contributions are clear and significant. In particular, EventKG appears to be a useful and potentially influential knowledge base. The approach used for the extraction of EventKG does not appear particularly innovative, but it seems to yield good results. The evaluation could use some improvements (see below), but overall it is well done. In particular, I commend the authors for making publicly available all the relevant data.

I think this is a good paper that should be accepted by the journal. However, it contains a number of unclear parts that need to be addressed in the final version. For this reason, I suggest a minor revision.

In the following, I will focus on a number of issues that need to be fixed in the next version.

“Step Ib: Using additional event identification heuristics to increase recall”
This would typically also yield a decrease in precision. Please elaborate on this trade-off.

Section 5.
Some important components of the approach for generating timelines (biographical sources and STM settings) are briefly mentioned here and then fully explained in Sections 7.1 and 7.2. I find this arrangement quite confusing. I would suggest moving Sections 7.1 and 7.2 to Section 5.

Section 5.3.
I am puzzled by the choice of using a binary classifier for deciding the relevance of timeline entries, rather than an approach that would produce a ranked list. What if the classifier returns hundreds or thousands of relevant timeline entries? How would an application decide what to show in a user interface? The paper needs to elaborate more on the rationale behind this choice and its advantages and drawbacks.

Section 6.
Please include relevant statistical tests to prove that the results are statistically significant.

Section 7.
It is not clear to me how the authors selected the timeline entries that composed the timelines for TM and for their approach. Did they show the users all the timeline entries that were classified as pertinent by the approach described in Section 5.3? What setting did they use for the TM approach? Please add more details about this comparison and include the relevant statistical tests.

Minor typos
“reference sources:” > “reference sources.”