ChronoGrapher: Event-centric Knowledge Graph Construction via Informed Graph Traversal

Tracking #: 3848-5062

Authors: 
Ines Blin
Ilaria Tiddi
Remi van Trijp
Annette ten Teije

Responsible editor: 
Guest Editors KG Construction 2024

Submission type: 
Full Paper
Abstract: 
Event-centric knowledge graphs bring coherence to otherwise fragmented and overwhelming data by establishing causal and temporal connections between relevant entities. We address the challenge of automatically constructing event-centric knowledge graphs from generic ones. We present ChronoGrapher, a two-step system that builds an event-centric knowledge graph for grand events such as the French Revolution. First, a pruned, semantically informed best-first search traversal retrieves a subgraph from large, open-domain knowledge graphs; we define event-centric filters to prune the search space and a heuristic ranking to prioritise nodes such as events. Second, we combine a structured triple enrichment method with a text-based triple enrichment method to build the event-centric knowledge graph. ChronoGrapher demonstrates adaptability across datasets such as DBpedia and Wikidata, outperforming approaches from the literature. Furthermore, it is designed to be flexible and to operate over any knowledge graph accessible through HDT dumps or SPARQL endpoints. To evaluate the utility of the constructed graphs, we conduct a preliminary user study comparing different prompting techniques for event-centric question answering. Our results show that prompts enriched with event-centric knowledge graph triples yield more factual answers, measured by how well answers are grounded in source information, than those enriched with generic triples or base prompts, while preserving succinctness and relevance.
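As a rough illustration of the traversal idea described in the abstract, the sketch below shows a pruned best-first search over a knowledge graph reachable through a SPARQL endpoint. The endpoint, the `is_event_related` filter, and the `score` heuristic are placeholders chosen for the example; they do not reproduce ChronoGrapher's actual event-centric filters or heuristic ranking.

```python
import heapq
from SPARQLWrapper import SPARQLWrapper, JSON

# Illustrative only: endpoint, filter and scoring are assumptions,
# not ChronoGrapher's actual event-centric filters or ranking.
ENDPOINT = "https://dbpedia.org/sparql"

def neighbours(node: str) -> list[tuple[str, str]]:
    """Return (predicate, object) pairs for outgoing triples of `node`."""
    sparql = SPARQLWrapper(ENDPOINT)
    sparql.setQuery(
        f"SELECT ?p ?o WHERE {{ <{node}> ?p ?o . FILTER(isIRI(?o)) }} LIMIT 200"
    )
    sparql.setReturnFormat(JSON)
    rows = sparql.query().convert()["results"]["bindings"]
    return [(r["p"]["value"], r["o"]["value"]) for r in rows]

def is_event_related(predicate: str, obj: str) -> bool:
    """Placeholder pruning filter: keep neighbours that look event-related."""
    return "event" in predicate.lower() or "Revolution" in obj

def score(node: str) -> float:
    """Placeholder heuristic: stand-in for a real node ranking."""
    return 1.0 / (1 + len(node))

def best_first_subgraph(start: str, max_nodes: int = 50) -> set[tuple[str, str, str]]:
    """Pruned best-first traversal collecting a small subgraph around `start`."""
    frontier = [(-score(start), start)]
    visited, triples = set(), set()
    while frontier and len(visited) < max_nodes:
        _, node = heapq.heappop(frontier)
        if node in visited:
            continue
        visited.add(node)
        for p, o in neighbours(node):
            if not is_event_related(p, o):
                continue  # prune branches that do not pass the event-centric filter
            triples.add((node, p, o))
            if o not in visited:
                heapq.heappush(frontier, (-score(o), o))
    return triples

# Example: collect a small subgraph around the French Revolution (DBpedia IRI).
# print(len(best_first_subgraph("http://dbpedia.org/resource/French_Revolution")))
```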
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Dylan Van Assche submitted on 26/May/2025
Suggestion:
Accept
Review Comment:

The authors incorporated all my suggestions and addressed my questions in this new revision. Therefore, I advise the editor to proceed with accepting this work. The PDF of the paper needs to be re-uploaded, but I could review this work through the diff and revision letter provided in the supplementary materials.

Review #2
By Simon Gottschalk submitted on 02/Jun/2025
Suggestion:
Minor Revision
Review Comment:

I thank the authors for their extensive revision and comments. The article and the repository material have been significantly improved, and while I still have a few comments/questions, these are rather minor:

- Section 3.1 / my Comment 3: While 3.1 has been revised, I still somewhat miss a very clear objective of Event-centric Subgraph Extraction. I suggest starting Section 3.1 with a sentence such as "Given a target event of interest and a knowledge graph $KG$, our goal is to extract an event-centric knowledge graph from $KG$ which covers relevant information regarding this event. To do so, we propose a link traversal-based method..."

- Definition 7: Two questions: (i) I assume that $s'$ is always an event if $subevent_of(s', n_{start})$ holds, i.e. $type(s') = event$ is redundant? (analogously for $o'$) (ii) By this definition, $n_{start}$ itself is missing from the graph (unless $subevent_of(n_{start}, n_{start})$ holds), which is not intuitive. This is also confirmed by https://github.com/SonyCSLParis/graph_search_framework/blob/main/kg-exam... where dbr:The_French_Revolution is not defined.

- p17, l17: As you confirmed in your comment, you are reusing the method in [8]. This is totally okay but could be made even more explicit, e.g., "...using the transformer-based model by Chanin et al. [8]".

- Fig. 5 shows the result before applying entity linking, correct (that's why "the French Revolution" is a literal)? Can this be clarified in the caption?

- Table 5: I struggle to immediately understand this table. An updated caption should help. I assume the cell numbers refer to the number of events in the KGs, so make that clear (not just "Statistics on all events"). Also clarify what "Final" means here.

- 4.2.3: I still think that the study design is not detailed enough: how were the questions created (based on the six question types, which presumably served as templates)? How many questions were investigated in your study? I also don't clearly understand how you map a question to the event. I assume that your input consists of the question and the KG created about the event in the question? Or was there any kind of automated matching to the event involved?

- Table 15: For several question types, there is frame-based information provided. However, we now know that there are on average 28k triples extracted from the text. What part of these triples goes into the prompts?

- Code: The new folder (https://github.com/SonyCSLParis/graph_search_framework/tree/main/kg-example) is good, but it should definitely get a README file explaining what each of these files is.

Minor:
- My old comment 3: I was asking whether non-sub-event relations, for example preceding events (e.g., WWI -> WWII), are part of the event-centric KG or not, and if so, whether they are considered sub-events.
- Algorithm 1: "see Section 3" is a bit rough.
- p10, l46: Consider briefly extending "candidate nodes are scored and ranked [using/by/through ...] to guide the next iteration"
- p13, l27: "p, p \in R"?
- p22, l51: "DBO:ABSTRACT" (and why is it even upper-cased)?

Typos/very minor:
- p3, l45: The indents of the RQs are somewhat off.
- p7, l37: Dot missing at the end.
- p10, l49: "highlight"
- p15, l10+12: text overflow to the right
- p16, l44: The correct wd prefix is http://www.wikidata.org/entity/ (it is then forwarded to "wiki/")
- Footnote 19 is split over two pages

Review #3
By Luis-Daniel Ibáñez submitted on 25/Jun/2025
Suggestion:
Accept
Review Comment:

This is a resubmission. After examining the authors' answers I am satisfied that the new version is up to the acceptance standards.

I have just a couple of minor comments to the added text:

"the nature of our tasks, which are not memory- or CPU-bound" --> I believe you mean intensive, not bound. If your algorithms don't use neither memory nor cpu, theya re free.... It would be good to comment on what is more intensive in your approach. Assuming the memory cost of hosting the KG is always there, it seems your approach makes multiple (low-cost) calls to the query interface, making it CPU intensive.

In Table 3 you use the term "unexplored", while in earlier parts of the paper you use "unvisited". Merge them or explain the difference.