Review Comment:
The paper presents a definition of a temporal knowledge graph and introduces EventKG, a knowledge base which includes 690K events and over 2.3 million temporal relations. It applies EventKG to the task of generating biographical timelines, showing that it performs better than an alternative approach.
The paper is generally clear and well written. However, it is sometimes too verbose and redundant. In particular, certain concepts are repeated again and again in different sections. I suggest that the authors prune some of these instances to make the paper easier to read.
The generation of knowledge graphs of events is a significant topic and it is well addressed in this paper. The research contributions are clear and significant. In particular, EventKG appears to be a useful and potentially influential knowledge base. The approach used for the extraction of EventKG does not appear particularly innovative, but it seems to yield good results. The evaluation could use some improvements (see below), but overall it is well done. In particular, I commend the authors for making all the relevant data publicly available.
I think this is a good paper that should be accepted by the journal. However, it contains a number of unclear parts that need to be addressed in the final version. For this reason, I suggest a minor revision.
In the following, I will focus on a number of issues that need to be fixed in the next version.
“Step Ib: Using additional event identification heuristics to increase recall”
This would typically also yield a decrease in precision. Please elaborate on this trade-off.
Section 5.
Some important components of the approach for generating timelines (biographical sources and STM settings) are briefly mentioned here and then fully explained in Sections 7.1 and 7.2. I find this solution quite confusing. I would suggest moving Sections 7.1 and 7.2 to Section 5.
Section 5.3.
I am puzzled by the choice of a binary classifier for deciding the relevance of timeline entries, rather than an approach that would produce a ranked list. What if the classifier returns hundreds or thousands of relevant timeline entries? How would an application decide what to show in a user interface? The paper needs to elaborate more on the rationale behind this choice and on its advantages and drawbacks.
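To illustrate the kind of alternative I have in mind, here is a minimal sketch. It assumes the classifier can expose a relevance score per timeline entry; the function name, the threshold, and the example entries and scores are all hypothetical and not taken from the paper.

```python
from typing import List, Tuple


def top_k_entries(
    scored_entries: List[Tuple[str, float]],
    k: int,
    threshold: float = 0.5,  # assumed decision threshold, not from the paper
) -> List[Tuple[str, float]]:
    """Keep entries scored as relevant, rank them by score, and return the best k."""
    # Binary decision: keep only entries at or above the threshold.
    relevant = [(entry, score) for entry, score in scored_entries if score >= threshold]
    # Ranking step: highest score first, so a UI can show a bounded shortlist.
    relevant.sort(key=lambda pair: pair[1], reverse=True)
    return relevant[:k]


# Hypothetical timeline entries with hypothetical classifier scores.
example = [
    ("born in", 0.97),
    ("attended event", 0.55),
    ("award received", 0.91),
    ("mentioned with", 0.30),
]
shortlist = top_k_entries(example, k=2)
```

With such a ranking, an application could cap the number of displayed entries regardless of how many the binary decision accepts.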
Section 6.
Please include relevant statistical tests to prove that the results are statistically significant.
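As one possible option, a paired permutation (sign-flip) test could be used when paired scores are available for the two systems being compared. The sketch below is only an illustration under that assumption; the function name and the per-fold scores are hypothetical, and the authors may well prefer a different test.

```python
from itertools import product
from typing import Sequence


def paired_permutation_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact two-sided paired sign-flip permutation test on the summed differences.

    Enumerates all 2^n sign assignments, so it is only practical for small n.
    """
    if len(a) != len(b):
        raise ValueError("both systems need the same number of paired scores")
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs))
    at_least_as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point noise in ties.
        if stat >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


# Hypothetical paired scores (e.g. one per fold) for two systems.
system_a = [0.82, 0.85, 0.80, 0.88, 0.84]
system_b = [0.78, 0.80, 0.79, 0.83, 0.80]
p_value = paired_permutation_test(system_a, system_b)
```

A test of this kind makes no normality assumption, which may be convenient when only a handful of paired measurements exist.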
Section 7.
It is not clear to me how the authors selected the timeline entries that composed the timelines for TM and for their own approach. Did they show the users all the timeline entries that were classified as pertinent by the approach described in Section 5.3? What settings did they use for the TM approach? Please add more details about this comparison and include the relevant statistical tests.
Minor typos
“reference sources:” > “reference sources.”