Review Comment:
The authors propose a set of index structures and algorithms for the task of diversified Top-k querying in temporal knowledge graphs. There are several issues with writing/paper style, redundancies as well as formalism, intuition and evaluation, plus a missing attribution to a re-used figure.
=== Intuition ===
- Abstract: The abstract mentions several concepts (snapshot and log mode, unit matching) and algorithm names (TgTop, TgTop-Pruning) which are not explained in a way that the reader gets an intuition about the approach while reading the abstract. Similar for the description of the second contribution in Section 2.
- Example: Fig. 1 as an example should be larger, and node and edge labels should be added directly to the figure.
- Snapshot and log mode: In abstract and introduction, the terms “snapshot and log mode” are introduced but not properly explained. In Section 4, authors mention “RDF graphs include snapshots or incremental logs” which still does not well explain these concepts. Formally, the term “snapshot” should be mentioned in Definition 2. Logs are denoted as “Lt_i^j” in Section 4 but not introduced before.
- Node weights: One of the main objectives of the approach is to return results with high weights. However, the idea behind weights is only insufficiently introduced. Authors mention “high-weight results”, “importance of the results” and node weight is later computed based on node degree, but given how important this concept is, I would like to see more discussion about node weights.
=== Formalism ===
- Members of a set should be in lower case letters (e.g., in Definition 5: “nodes {V_1, V_2, …}”).
- Weight matrix: As the weights of nodes change over time, W shoud be defined to map from VxT to R.
- Temporal Matching (Definition 4 and Section 6.4): I do not understand the need for introducing temporal matching in such details. According to the definition, query time and graph time match if they overlap or have the inclusion relationship. However, if one time interval includes another, they also overlap. The resulting time interval “[max(t_s,t_a),min(t_e ,t_b)]” is valid for all cases. Consequently, I do not see the need for Definition 4 (specifically, for Table 1) and for some computations in Section 6.2.
- The adjacency table in Section 4 mentions IDs and labels of nodes that should be introduced earlier. The sentence “We first classify each Label according to its type” is unclear.
- Problem Statement: Equation (8) itself is not the problem; the problem is to find the set M that maximises F(M). The equation is taken from [31] which needs to be mentioned.
=== Approach ===
- Overview: Since the approach introduces 10 algorithms and three types of matching (unit, overall and log), I would appreciate to get an overview of these components first.
- Node types: Section 5 mentions “there are not many node types”. What does node type refer to here? Is this a characteristic of the dataset?
- Labels: The different indexes use the notion of “labels” but it is not clear how these labels are used.
- Redundancy: Section 6.2.1 is highly redundant: The texts before and after Fig. 9 are repetitive, and Algorithms 1-6 are highly similar. I would suggest only to have one algorithms where only specific lines are changed w.r.t. the pattern.
- Is the (S,P,O) a SPARQL “ASK” query?
- Contributions: The paper solves the same problem as [31], but on a temporal knowledge graph. Since results are similar (except for the proposed pruning approach which is clearly faster), a more detailed comparison to [31] should be provided. This way, the authors should make clear the novelty and contributions of this paper, which seem low given the text repetitions and the similarity to [31].
=== Evaluation ===
- Datasets: It is not clear how exactly the evaluation datasets are created. For example, Wikidata only has 1.1M nodes, less than YAGO, although Wikidata has many more nodes. It is also unclear how annual update interval is created and which years are considered. Are the annual Wikidata snapshots historic versions of Wikidata or are they created w.r.t. the temporal properties in a recent Wikidata snapshot?
- Queries: No information is given how the queries in Section 7.3 were created and no examples are given. For the overall matching algorithm in Section 7.4, more details are given (“randomly”) but not sufficient.
- Linearity: The text states that the proposed algorithm “does not increase the query time linearly”. But Fig. 13 shows a linear trend.
=== Presentation ===
- Redundancy in text: Several parts of the article are rather repetitive. An example are the repetitions of “In this paper, … datasets is selected for the study of temporal knowledge graphs” in Section 7.2 and the sentence “In this paper, the diversified Top-k query problem of the temporal knowledge graph is referred to as the diversified Top-k query (DTQ) problem.” in Definition 8.”
- Chinese characters in diagrams: The diagram labels in Fig. 12 and the caption in Fig. 12 f) are Chinese.
- Text style: The font and formatting changes regularly throughout the paper (e.g., in the last paragraph of page 1 and in Section 7.5).
- Citations: References to literature should be formatted properly. Not “Zheng W[5] et al”, but “Zheng et al. [5]”. Also not “Literature [22]”.
- There are several sentences which are hard to read and thus it is hard to understand some core messages of the article. Examples include “A temporal knowledge graph matching Q is called if and only if the following conditions hold. denoted as M_i [t_s,t_e ](t_s
|