Editorial Board

Editor-in-Chief
Krzysztof Janowicz

Managing Editors
Cogan Shimizu
Eva Blomqvist

Editorial Board
Mehwish Alam
Claudia d’Amato
Stefano Borgo
Boyan Brodaric
Philipp Cimiano
Oscar Corcho
Bernardo Cuenca-Grau
Elena Demidova
Jerome Euzenat
Mark Gahegan
Aldo Gangemi
Anna Lisa Gentile
Rafael Goncalves
Dagmar Gromann
Armin Haller
Pascal Hitzler
Aidan Hogan
Katja Hose
Eero Hyvönen
Sabrina Kirrane
Agnieszka Lawrynowicz
Freddy Lecue
Maria Maleshkova
Raghava Mutharaju
Axel Polleres
Guilin Qi
Marta Sabou
Harald Sack
Christoph Schlieder
Stefan Schlobach
Oshani Seneviratne
Cogan Shimizu
Ruben Verborgh
GQ Zhang

Former Editors-in-Chief
Pascal Hitzler

Editorial Assistants
Michael McCain

Syndicate

Diversified Top-k query research on temporal knowledge graph

Submitted by Wang Zhe on 07/21/2023 - 03:47

Tracking #: 3504-4718

Authors:

Li Song

Wang Zhe

Zhang Liping

Responsible editor:

Elena Demidova

Submission type:

Full Paper

Abstract:

In order to quickly query the results with high weight and low redundancy on the temporal knowledge graph, this paper proposes a diversified Top-k query algorithm for the temporal knowledge graph. Firstly, the storage scheme and index structure of the temporal knowledge graph combining snapshot and log mode are given. A unit matching algorithm and an overall matching algorithm based on snapshot mode are proposed to obtain a set of snapshot matching results. A matching algorithm based on log modes is proposed to obtain a set of log matching results. Then, the TgTop algorithm, a diversified Top-k query algorithm, is proposed, and the results with high weight and low redundancy are obtained. In addition, an efficient pruning algorithm, TgTop-Pruning algorithm, is proposed to reduce the time overhead of diversified Top-k processing. Finally, experimental results on four real datasets show the effectiveness and efficiency of the proposed query algorithm.

Full PDF Version:

swj3504.pdf

Tags:

Reviewed

Decision/Status:

Reject

Solicited Reviews:

Click to Expand/Collapse

Review #1

By Simon Gottschalk submitted on 24/Aug/2023

Suggestion:
Reject

Review Comment:

The authors propose a set of index structures and algorithms for the task of diversified Top-k querying in temporal knowledge graphs. There are several issues with writing/paper style, redundancies as well as formalism, intuition and evaluation, plus a missing attribution to a re-used figure.

=== Intuition ===
- Abstract: The abstract mentions several concepts (snapshot and log mode, unit matching) and algorithm names (TgTop, TgTop-Pruning) which are not explained in a way that the reader gets an intuition about the approach while reading the abstract. Similar for the description of the second contribution in Section 2.
- Example: Fig. 1 as an example should be larger, and node and edge labels should be added directly to the figure.
- Snapshot and log mode: In abstract and introduction, the terms “snapshot and log mode” are introduced but not properly explained. In Section 4, authors mention “RDF graphs include snapshots or incremental logs” which still does not well explain these concepts. Formally, the term “snapshot” should be mentioned in Definition 2. Logs are denoted as “Lt_i^j” in Section 4 but not introduced before.
- Node weights: One of the main objectives of the approach is to return results with high weights. However, the idea behind weights is only insufficiently introduced. Authors mention “high-weight results”, “importance of the results” and node weight is later computed based on node degree, but given how important this concept is, I would like to see more discussion about node weights.

=== Formalism ===
- Members of a set should be in lower case letters (e.g., in Definition 5: “nodes {V_1, V_2, …}”).
- Weight matrix: As the weights of nodes change over time, W shoud be defined to map from VxT to R.
- Temporal Matching (Definition 4 and Section 6.4): I do not understand the need for introducing temporal matching in such details. According to the definition, query time and graph time match if they overlap or have the inclusion relationship. However, if one time interval includes another, they also overlap. The resulting time interval “[max(t_s,t_a),min(t_e ,t_b)]” is valid for all cases. Consequently, I do not see the need for Definition 4 (specifically, for Table 1) and for some computations in Section 6.2.
- The adjacency table in Section 4 mentions IDs and labels of nodes that should be introduced earlier. The sentence “We first classify each Label according to its type” is unclear.
- Problem Statement: Equation (8) itself is not the problem; the problem is to find the set M that maximises F(M). The equation is taken from [31] which needs to be mentioned.

=== Approach ===
- Overview: Since the approach introduces 10 algorithms and three types of matching (unit, overall and log), I would appreciate to get an overview of these components first.
- Node types: Section 5 mentions “there are not many node types”. What does node type refer to here? Is this a characteristic of the dataset?
- Labels: The different indexes use the notion of “labels” but it is not clear how these labels are used.
- Redundancy: Section 6.2.1 is highly redundant: The texts before and after Fig. 9 are repetitive, and Algorithms 1-6 are highly similar. I would suggest only to have one algorithms where only specific lines are changed w.r.t. the pattern.
- Is the (S,P,O) a SPARQL “ASK” query?
- Contributions: The paper solves the same problem as [31], but on a temporal knowledge graph. Since results are similar (except for the proposed pruning approach which is clearly faster), a more detailed comparison to [31] should be provided. This way, the authors should make clear the novelty and contributions of this paper, which seem low given the text repetitions and the similarity to [31].

=== Evaluation ===
- Datasets: It is not clear how exactly the evaluation datasets are created. For example, Wikidata only has 1.1M nodes, less than YAGO, although Wikidata has many more nodes. It is also unclear how annual update interval is created and which years are considered. Are the annual Wikidata snapshots historic versions of Wikidata or are they created w.r.t. the temporal properties in a recent Wikidata snapshot?
- Queries: No information is given how the queries in Section 7.3 were created and no examples are given. For the overall matching algorithm in Section 7.4, more details are given (“randomly”) but not sufficient.
- Linearity: The text states that the proposed algorithm “does not increase the query time linearly”. But Fig. 13 shows a linear trend.

=== Presentation ===
- Redundancy in text: Several parts of the article are rather repetitive. An example are the repetitions of “In this paper, … datasets is selected for the study of temporal knowledge graphs” in Section 7.2 and the sentence “In this paper, the diversified Top-k query problem of the temporal knowledge graph is referred to as the diversified Top-k query (DTQ) problem.” in Definition 8.”
- Chinese characters in diagrams: The diagram labels in Fig. 12 and the caption in Fig. 12 f) are Chinese.
- Text style: The font and formatting changes regularly throughout the paper (e.g., in the last paragraph of page 1 and in Section 7.5).
- Citations: References to literature should be formatted properly. Not “Zheng W[5] et al”, but “Zheng et al. [5]”. Also not “Literature [22]”.
- There are several sentences which are hard to read and thus it is hard to understand some core messages of the article. Examples include “A temporal knowledge graph matching Q is called if and only if the following conditions hold. denoted as M_i [t_s,t_e ](t_s

Review #2

Anonymous submitted on 08/Oct/2023

Suggestion:
Reject

Review Comment:

In this paper, the authors have proposed various algorithms for indexing and matching for diversified top-k querying in the temporal knowledge graphs.
The related work mentions current methods, but none of them are used as baselines to compare the work.
One of the major issues is the citations. The work is based on the baseline paper ('Diversified Top-k Querying in Knowledge Graphs') but it is not referred to in the problem statement section. Figure 1 is a replica of the figure from the baseline paper with some minor changes. There are many claims that have not been cited.
The writing is not up to par with the standards of the journal (uneven font, unclear figures, uneven spacing).
Some of the information is in Chinese and cannot be comprehended.
On the technical level, the contributions, such as storage, indexing, and matching algorithms, are not compared to any state-of-the-art methods.
No supplementary material is provided.
In its current state, I would suggest rejecting the paper.

Log in or register to post comments
1670 reads

Main menu

Editorial Board

Syndicate

Diversified Top-k query research on temporal knowledge graph

Tracking #: 3504-4718

Reviewed Articles

Authors & Reviewers

Links

Recent blog posts

Accepted Articles

Search form

Main menu

Login

Editorial Board

Syndicate

Diversified Top-k query research on temporal knowledge graph

Tracking #: 3504-4718

Reviewed Articles

Authors & Reviewers

Links

Recent blog posts

Accepted Articles