iSummary: Workload-based Selective Summaries for Knowledge Graph Exploration

Tracking #: 3815-5029

Authors: 
Giannis Vassiliou
Nikolaos Papadakis
Haridimos Kondylakis

Responsible editor: 
Katja Hose

Submission type: 
Full Paper
Abstract: 
The rapid growth in size and complexity of Knowledge Graphs available on the web has created a pressing need for efficient and effective methods to facilitate their understanding and exploration. Recently, semantic summaries have emerged as a means to quickly comprehend and explore them. However, most existing approaches are static, failing to adapt to user needs and preferences, and often struggle to scale. In this paper, we introduce iSummary, a novel and scalable approach for constructing summaries given specific user requests in terms of nodes for the summary to be based on. Given that the size and complexity of Knowledge Graphs pose challenges to efficient summary construction, our approach leverages query logs. The core idea is to harness the knowledge embedded in existing user queries to identify the most relevant resources and establish meaningful connections, thereby generating high-quality summaries. We propose an algorithm with theoretical guarantees on summary quality, operating linearly with respect to the number of queries in the log. To assess our method, we conduct experiments using two real-world datasets and multiple baselines, demonstrating that iSummary consistently outperforms existing techniques in both quality and efficiency.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 10/Jun/2025
Suggestion:
Reject
Review Comment:

The paper presents an algorithm, iSummary, to compute a form of non-quotient, entity-centric summary over a query log, based on a user "query" S of Lambda-many nodes of interest. The idea of personalized summaries is interesting. The submission is based on two papers at a Semantic Web conference, which are cited.

Overall, I cannot recommend accepting the paper in its current form. The reasons lie in a lack of clarity about assumptions, details of the algorithm, the experimentation (comparability), and other issues. As the feedback is detailed, it is provided page by page.

Abstract: It makes many claims, but scalable and incremental (i.e., non-static) summarization approaches already exist in abundance. The presented paper also does not address this, as the summary has to be computed for each user request over the entire query log. That does not scale with large logs and/or high workloads (many queries).

The statement on "lineary complexity with respect to the number of queries in the log" is wrong, as each processing of a query is essentially a NP-hard problem as the authors themselves state.

Page 2

Lines 4+5: unclear; of course the KG is needed to compute the summary. That is the point; here, in the paper, the role of the KG is played by the query log.

The list of contributions needs to be made clearer.

The parameters Lambda and Kappa need to be explained.

Bullet 4 is confusing at this point. It should be explained that the KG actually does play a role here, even if, as one learns later, only a top-1 result is chosen (where the top-1 is not a ranked answer but depends on the implementation of the SPARQL endpoint) -> which raises an issue of determinism.

Line 30: "propose new algorithms" (plural) -> the main algorithm is the same; at the very least, it is only one algorithm. The difference from the Semantic Web conference paper is unclear.

Page 3

Can a user really specify Lambda? This assumption is weak and should be better motivated and discussed. What kinds of queries are supported and which are not? This should also be reflected in the Limitations.

Example 2.1: what happens if there are no x hops between the selected resources in Lambda, but the user specifies Kappa = x + 1?

Why is Lambda just the number of resources, and not the set of resources (which is later introduced as S)? Actually, the example suggests that Lambda is the set.

Page 4

Definition 1 is problematic. It is not based on triples, as done initially. How is E defined? There is a common issue in defining a Semantic Web graph: either the definition goes through a set of triples or through labeling functions. Combining them generally does not work. Here, the KG is a plain unlabeled graph G = (V, E).

This makes the paper hard to read and follow, as it contains further sloppiness in the definitions and the algorithm.

Page 5: the weights have to be stored per user or per mined query -> this should be mentioned.

Lines 38ff: is it a reasonable assumption that users can define their information need in terms of Lambda / Kappa?

This makes strong assumptions. The queries need to be both syntactically and semantically correct. Also, it is assumed that a query returns a non-empty result set, which may or may not be true for a given query.

But if, for any reason, a query's result set is empty, or the query is syntactically or semantically incorrect, it cannot and should not be used in these considerations. Those cases, and how to respond to them, need to be discussed.

Also, we do not really know whether certain connections expressed in the query patterns are "useful nodes" to the user, as these might be queries that were abandoned and later modified to yield the desired result, etc.

Page 6

Line 4: Again, this assumes that the queries written actually match the input graph, which is an assumption one cannot generally make.

Line 8: What are the advantages and drawbacks of using frequency alone as the criterion for selecting which nodes to expand / add? (see Example 3.2)

Two important things have to be clarified. When talking about Kappa, i.e., the nodes to be included, the paper does not distinguish between types (as in RDF) and resources (entities). So a type edge is treated like every other property. This needs to be stated explicitly, and doing so needs to be discussed (what are the advantages and disadvantages from the system and user perspectives?).

The example suggests that line 15 is a shortest path of length 2. But it is not. This should be clarified.

Def. 1 defines a summary as the smallest maximum-weight tree G'. But the example shows neither. I would like to see the weighted tree explicitly, for a specific user u and perhaps a different user v. Also, where are the weights? How are they defined for a set of entities S?

Sec 3.3: Going through the queries only is a very limited approach. It means that the method never uses "new" nodes that are rare or not yet discovered by users. Why not a mixture of both sources?

Especially when the matching is limited to named entities in the log, i.e., "In our case here (?x, a, Person) can be mapped to (Vassiliou, a, Person) from Q6 and". This will be very incomplete.

Line 30: sentences like this one need to be explained and motivated. Take a position: either having variables is good or it is not. What can you do with them / without them? Why might we want to have variables replaced?

Page 7

Line 8: OK, querying the SPARQL endpoint, in case one exists. But this again would be very expensive. And with "LIMIT 1", again a very incomplete picture is drawn. Also see the comment on the introduction.

Line 37: add an explanation for the input nodes S = user query.

Line 41: Wrong notation in (κ − λ)/λ) -> what does that actually mean?

Page 8

The iSummary has to be computed over the set of queries Q for each set S of user nodes. So it is basically not an index. For every new request, it needs to be recomputed.

Line 13: Why is visited initialized with nodes[0]? This means that, in the subsequent for-all loop, the variable x will never be assigned the first node.

Line 14: the notation is strange. First, the ( ... ) should not be there. Second, the authors mean "nodes \ visited"; to make it clearer that this is a set operation, one should use capitalized variables like Top, Nodes \ Visited, etc.
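
A minimal sketch (with hypothetical names, not the paper's actual Algorithm 1) of the iteration pattern these two comments refer to; it shows why the first node is never assigned to x:

nodes = ["Vassiliou", "Person", "Professor"]   # hypothetical seed/candidate nodes

visited = {nodes[0]}                  # line 13: visited initialized with nodes[0]
for x in set(nodes) - visited:        # line 14: "nodes \ visited" as a set operation
    # x only ranges over "Person" and "Professor"; "Vassiliou" is never assigned to x.
    print(x)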

More questions come up with the pseudo-code. One is:

findMostFrequent( ... ) <-- what are the parameters; which ones are included, which ones not?

The queries q in Q need to be defined as graphs. See also the comment below.

The pseudo-code requires significant rework.

Sec. 3.5: Make it deterministic by design (in the pseudo-code), not based on the implementation. Better: argue why it is deterministic.

Is the algorithm also permutation invariant? One cannot tell, as ties are broken based on "first choice". What does that mean?

Page 9

One does not see where the algorithm computes an "approximate solution".

The notion of weight in this paper is not clear. Weight appears to be the frequency of certain edges in the query set that is relevant to the user's Lambda-sized node set S. But this is not explained or formally defined.

Having said this, Theorem 3 is very confusing. What does the bound mean here, and what is the constant l? It is not discussed or picked up in the proof.

Furthermore, it is written: "Theorem 1. The λ/κ-Selective Summary problem is NP-complete", with a reference to Steiner trees. Those are computed over the input graph.

Here, the input is a set of query logs, where each log entry is itself a graph.

This relationship between Theorem 3 and 1 must be clarified.

Also, clarify the relationship between iSummary and CHINS, as the authors themselves suggest that they are essentially the same: "iSummary replicates the CHINS approximation algorithm which has been proved to have the aforementioned worst-case bound [20]".

Page 10

The Limitations are not properly discussed. See above on the assumptions.

Page 11

Def. 2: in this formula it becomes clear that, in the paper, the queries Q are not understood as graphs. A formal definition of a query in Q would help, also in the context of the pseudo-code.

Sec 4.4.1: The description of the procedure needs to be improved. What does the train/test split mean: what is computed during training? iSummaries for the 80% of the queries in the log? Based on which user set S? It is confusing that there is no connection to the pseudo-code.

Sec 4.4.3: Why is CHAMPS not used as a baseline? And is it actually a reasonable comparison between "Random, which uses query logs, and PPR and GLIMPSE, which run directly on the KG"? It is not clear how the baselines operate, as this is not described. Also, this statement should be placed in the description of the models, not in the results.

Sect 4.4.4: It is a bogus comparison to argue that "approaches relying entirely on the KG for calculating the summary (PPR and GLIMPSE) require one order of magnitude more time than the ones relying on query logs." Basically, the logs will be smaller than the KG. If, inversely, the KG is very small compared to the log, the inverse relation may be observed.

More problematically: it really does not compare anything. The KG is the data on everything deemed relevant to model in, e.g., Wikidata, while the log reflects what users frequently ask for, whether or not it is in the graph, and whether or not it makes semantic sense.

Review #2
By Blerina Spahiu submitted on 14/Jun/2025
Suggestion:
Major Revision
Review Comment:

The paper proposes iSummary, which leverages existing SPARQL query logs to infer user preferences and identify relevant resources, thereby addressing the challenges of adapting to diverse user needs and the computational expense of large KGs. The proposed algorithm offers theoretical guarantees on summary “quality” and operates efficiently, demonstrating superior performance against existing techniques on real-world datasets such as Wikidata and DBpedia. Experiments on DBpedia and Wikidata demonstrate that iSummary delivers 18–135% more query-fragment coverage while running 14–40× faster than baselines such as Personalized PageRank and GLIMPSE.

Positive Points:

1. The paper introduces a novel approach to selective semantic summarization by leveraging generic query logs, which is a practical way to infer user preferences without requiring explicit user-defined weights.
2. It addresses the computational challenges associated with large KGs by operating linearly with respect to the number of queries in the log, and demonstrates superior efficiency compared to existing methods.
3. The approach is evaluated using two real-world datasets (Wikidata and DBpedia) and 3 baselines.

Negative Points:
1. The approach heavily relies on the availability and quality of generic query logs. If query logs are sparse, biased, or do not represent user interests, the generated summaries might be less accurate or comprehensive.
2. While the paper mentions an alternative algorithm for cases where the query log is not adequate, the specifics of its performance and limitations, especially for entirely new domains or very niche user interests not well-represented in logs, could be further elaborated.
3. The algorithm selects the most frequent shortest path. However, the paper does not discuss how ties in frequency are handled beyond a deterministic first choice. Are there scenarios where a slightly longer but semantically richer path might be more desirable?
4. Moreover, the two methods for resolving variables (mining query logs or querying the SPARQL endpoint) might introduce bias depending on the log content or the endpoint's default result ordering. The paper acknowledges that the summaries produced by the two approaches might not always be the same, but the authors do not quantify the potential divergence or its impact on summary quality.
5. While the paper states that iSummary provides "theoretical guarantees on summary quality" and "consistently outperforms existing techniques in both quality and efficiency", the specific metrics for "summary quality" beyond coverage and efficiency could be more explicitly detailed. The quality of the summary is not actually evaluated. Perhaps the authors should use a more specific term and simply call it coverage.
6. The examples used (e.g., Fig. 1 and accompanying SPARQL queries) are relatively simple. While illustrative, it would strengthen the paper to discuss how these examples scale or become more complex with larger, real-world query patterns.
7. The use of generic query logs is highlighted as an advantage, yet there is no discussion of the potential privacy implications of using such logs, even if they are "generic". A brief acknowledgment of this aspect, or of how potential privacy concerns are mitigated, would be beneficial.
8. The authors should also acknowledge that this approach could work only when "enough" (how much?) logs are available.
9. There are far more entries in the references section than are cited in the paper. All papers in the reference list are important, so they should be cited in the text.
10. The Random approach is only mentioned in the experiments section (and its citation is also missing); however, it is not even described in the related work. The related work section is very superficial and does not provide a thorough review of the state of the art, nor does it clearly explain how iSummary differs and what its advantages are.

Questions:
1. As the approach presented relies on query logs, what mechanisms or strategies do you employ to handle "cold start" scenarios, for example where relevant query logs are minimal or non-existent for a particular set of seed nodes or a newly emerging domain within the KG?
2. Could you elaborate on the criteria and potential impact of the "first choice" tie-breaking mechanism when selecting the "most frequent shortest path" or resolving variables, especially in cases where multiple paths or instantiations have identical frequencies or relevance? Have you explored alternative tie-breaking strategies or a probabilistic approach?
3. The paper mentions two variable resolution algorithms: one relying on query logs and another on directly querying the SPARQL endpoint. Could you provide a more in-depth analysis or empirical comparison of when one approach is preferred over the other? The analysis should consider factors beyond just efficiency versus accuracy, such as data freshness or the semantic completeness of the resolved variables.
4. Beyond the objective metrics of coverage and efficiency, what qualitative measures or user studies were conducted to assess the "quality"?
5. In my opinion there should be a user study to measure user satisfaction with the selective summaries generated by the proposed approach, particularly regarding their usefulness for exploration and understanding.
6. How sensitive is iSummary's performance to noise, irrelevant, or malicious queries within the generic query logs, and what preprocessing or filtering steps, if any, are implemented to mitigate their impact on the generated summaries?
7. Still, my main concern remains the clarity and depth of the usability assessment. The evaluation primarily focuses on quantitative metrics rather than a qualitative understanding of the approach's practical utility. For which target group is the approach useful? For instance, new users might find the summaries useful for initial exploration; however, the paper does not concretely demonstrate this through user studies. On the other hand, for experienced users who already know how to query the KG and understand its structure, the precise value of these workload-based summaries requires further elaboration and analysis. Under what specific usage scenarios does it provide a tangible advantage over direct querying or other exploration tools?

Minor issues:
There are some English typos and some editorial issues. Please find a list below:

1. exploration [4], [5], -> exploration [4,5]
2. In this paper, we extend the previous work in many ways -> several or different ways
3. Definition 1: 1. A knowledge graph G=(V,E); -> state V and E
4. Page 4 line 25, 35 -> there are these little squares on the right of the row.
5. Page 9, line 44, again square
6. In the preliminaries section, use the same notation for KGs as in the algorithms. The notations in this section are not used in the paper, thus they can either be removed or adapted.
7. As such, we keep a random 20% of the queries for testing and we use the remaining for the training -> what ML method are you using? It is still not clear what kind of algorithm you use for training and testing.
8. λ/κk=(…) -> λ/κ where k=(…)
9. we compare with Random -> citation is needed
10. we present to methods -> we present two methods

Review #3
Anonymous submitted on 15/Jul/2025
Suggestion:
Reject
Review Comment:

SUMMARY
The paper presents an algorithm for summarizing information contained
in knowledge graphs, and defines a class of such summaries, the
"lambda/kappa-selective summaries". The summaries are computationally
intractable to compute, so the authors produce approximations. The
authors' experiments investigate the coverage of the approximate
summaries and their algorithm's running time.

There are a number of problems with the paper. The definition of the
summaries restricts them to be trees, which doesn't seem
representative of general data. It also seems to imply that a summary
must be a maximum-weight spanning tree of the knowledge graph, which
can't be what was intended. It seems that Algorithm 1 does not produce
summaries of the graph but, rather, it produces summaries of the
queries that users made of the graph, which is a different
thing. There is a lack of mathematical rigour in the theoretical parts
of the work (e.g., the argument that computation of the summaries is NP-hard
seems incorrect, and the argument that Algorithm 1 produces an
approximation is just an analogy to another algorithm that uses
similar techniques on a different input).

The experimental part is also weak. First, more experiments are
needed. The data in Figure 2 shows that the algorithms are already
about as good as they get when using only 10% of the data for
training, but they only consider values of 10% and up. You need to
investigate the range 0-10% to find out how much is actually
required. And I think that more comment is needed about this
phenomenon: you're summarizing based on query logs, but the quality of
the summary seems largely independent of how big the log is. Doesn't
that seem strange? To me, it suggests that there's either something
wrong with the algorithm or with the testing methodology, or perhaps
that the queries in the test set are so homogeneous that there's no
benefit in using lots of them. Second, the statistical analysis of the
experiments is weak: essentially, means are calculated but no standard
deviations, so those means can't be compared. No regression is
performed on the data in Figure 2 to determine if the lines are flat,
or if they're trending up or down. Third, there are no experiments
comparing the output of the approximation algorithm to the optimum.
The authors say that this is because the graphs they use are large and
it's not possible to compute the optimum summaries for them. That's
fine, but they should compare against the optimum on smaller graphs,
where the optimum can be computed.

Given these many issues, I recommend rejecting the paper.

DETAILED COMMENTS

p2, l5. You criticize [9] because it "still depend[s] on the KG for
summary computation." How can computing the summary _of a graph_ not
depend on that graph? If you claim that your algorithms do not depend
on the graph that is being summarized (and your running time bound on
p10 includes no terms relating to the size of the graph), then you
cannot be summarizing the graph. It seems that, instead, you're
summarizing the queries made on it, but that could be a very different
thing, as outlined above.

Definition 1. I assume that by "the smallest maximum-weight tree", you
mean that I should consider all subtrees of the knowledge graph, and
find the set of maximum weight trees, and then choose the tree from
that set with the least number of vertices. But that means that the
answer is always maximum-cost spanning tree (forest if the KG isn't
connected). The answer can't fail to be spanning because, if it
doesn't include some vertex, adding an edge to that vertex can only
increase the tree's weight.

Also, why is the summary forced to be a tree? Why can't a summary of a
bibliographic database include the three edges "A is a coauthor of B",
"A works at university X" and "B works at university X"?

I don't think lambda/kappa-selective summary is good notation. The
slash looks like it denotes division, which it doesn't, and it has bad
side effects like
* A 3/4-summary sounds like it's bigger than a 3/5 summary, but it's
actually smaller
* things get awkward if the parameters are, themselves, arithmetic
expressions, and you'd have to bracket, e.g., an (x/2)/y summary,
where now one of the slashes means division and the other doesn't.
I suggest that (lambda,kappa)-selective summary is better notation:
it's just a pair of numbers, and that's the normal notation for a pair
of numbers.

Theorem 1 needs a full proof. It may well be fine, but the details
need to be written out. The issue is when you switch between a
minimization problem (Steiner trees) and a maximization problem
(summary computation) by inverting the weights. First, there's the
minor problem that, if all the vertices in your graph have the same
weight, then normalizing the weights to the interval [0,1] and
subtracting from 1 means that every vertex has weight zero in the
resulting graph, which can't be right. So let's say instead that we
normalize to [0,1] and then use weight 2-(normalized weight).

Now, this reduction doesn't turn Dijkstra's algorithm into an
algorithm for computing maximum-weight paths between vertices. This is
because Dijkstra's algorithm with inverted weights is still trying to
find a path with a small number of edges, whereas a maximum-weight
path will have a large number of edges. For example, consider the
graph with edges AB, AC and BC, and all vertices have weight 1. The
longest path from A to C is ABC with weight 3. But a shortest path in
the graph made by the reduction is AC with weight 2, so Dijkstra has
given the wrong answer.
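
To make this counterexample concrete, here is a minimal brute-force sketch (plain Python, hypothetical helper names) of the vertex-weighted triangle described above; it shows that the path minimizing the inverted weights is not the maximum-weight path:

adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B"}}   # triangle AB, AC, BC
weight = {"A": 1, "B": 1, "C": 1}                           # original vertex weights
inverted = {v: 2 - w for v, w in weight.items()}            # reduction: 2 - (normalized weight)

def simple_paths(src, dst, path=()):
    # Enumerate all simple paths from src to dst.
    path = path + (src,)
    if src == dst:
        yield path
        return
    for nxt in adj[src]:
        if nxt not in path:
            yield from simple_paths(nxt, dst, path)

paths = list(simple_paths("A", "C"))
max_weight_path = max(paths, key=lambda p: sum(weight[v] for v in p))
dijkstra_like_path = min(paths, key=lambda p: sum(inverted[v] for v in p))
print(max_weight_path)      # ('A', 'B', 'C'): weight 3, the path we actually want
print(dijkstra_like_path)   # ('A', 'C'): inverted weight 2, a different path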

Is the assumption (p5, l11) that "the most useful related nodes will
be the ones that appear more frequently" in queries really valid? What
if there's a high-volume source of unusual queries? For example,
suppose you have a KG about actors and movies and a gossip magazine
makes a huge number of queries about who's dating whom, because they
want to be able to report to their readers as soon as the graph
changes. Won't your algorithms now think that the most important thing
about an actor is who they're dating, rather than what movies they've
been in?

In Example 3.2, you claim that a certain summary is 1/2-selective, but
you don't establish the necessary criteria for being a 1/2-selective
summary.

Section 3.4 and subsequently repeatedly claim that Algorithm 1
produces a lambda/kappa selective summary. It doesn't: by your
argument, it produces an _approximation_ of one.

Theorem 3 needs a proper proof. It's not enough to just say that
you've used the same techniques as CHINS so you get the same
approximation ratio. That's an argument by analogy, not mathematics. A
specific problem here is that Algorithm 1 works on the queries that
users happened to run on the graph, and I don't see how that can
possibly guarantee an approximate summary of the underlying graph. To
take a simple example, suppose that the only queries made of the graph
in Figure 1 are to repeatedly ask "Tell me all the professors": how
can you possibly compute a summary of Vassiliou from those queries?
Also, k isn't defined in the theorem statement.

p10, l40 "We repeat the experiment 10 times (10-fold cross-
validation)" and a similar statement at p12, l39. Those are two
different things -- which one did you do? 10-fold cross-validation
would seem to contradict using 20% of the data as testing. 10-fold
cross-validation means randomly partitioning the data into parts 1-10
and then performing 10 experiments: the i-th experiment uses part i as
the testing set and the other parts as the training set. Thus, 10-fold
cross-validation necessarily means using 10% of the data as the
testing set, not 20%.
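
A minimal sketch of the distinction drawn here (hypothetical 100-query log, scikit-learn assumed available): 10-fold cross-validation necessarily holds out 10% per fold, while a repeated 80/20 split holds out 20% per repetition.

from sklearn.model_selection import KFold, train_test_split

queries = [f"q{i}" for i in range(100)]    # hypothetical query log

# 10-fold cross-validation: each of the 10 folds uses 10 queries (10%) for testing.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(queries):
    assert len(test_idx) == 10

# A random 80/20 split repeated 10 times: each repetition uses 20 queries (20%) for testing.
for seed in range(10):
    train, test = train_test_split(queries, test_size=0.2, random_state=seed)
    assert len(test) == 20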

Figure 2 shows that the coverage is roughly constant as the training
set ranges from 10-80% of the data. In other words, the algorithm is
already as good as it gets when using 10% of the data for training.
This means that the interesting part of the graph is between 0 and
10%, and experiments therefore need to be done in that range. As noted
above, I think you need to explain this: it seems very strange to me
that a summary method based on query logs doesn't show a clear trend
of getting better as you give it bigger logs. It doesn't even look
like a diminishing returns situation. I find that very worrying.

The statistical analysis in Section 4 is very weak: it's based only on
the mean. At the very least, you need to calculate standard
deviations, so we can see if the differences between the means are
significant -- e.g., to answer questions like "does k=12 actually give
better coverage than k=6?" in Figure 2, and "Does method X actually
outperform method Y?" in Figure 4. I mentioned above that coverage
looks roughly constant in Figure 2: I suggest doing a regression
analysis to see what is the actual correlation between coverage and
training set size.
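
A minimal sketch of the suggested analysis (NumPy/SciPy assumed; the coverage numbers below are made-up placeholders, not the paper's results):

import numpy as np
from scipy.stats import linregress

train_fraction = np.array([0.1, 0.2, 0.4, 0.6, 0.8])
# One row per training-set size, one column per repetition (placeholder values).
coverage = np.array([
    [0.61, 0.63, 0.60, 0.62, 0.64],
    [0.62, 0.61, 0.63, 0.62, 0.63],
    [0.63, 0.62, 0.64, 0.61, 0.63],
    [0.62, 0.64, 0.63, 0.62, 0.63],
    [0.63, 0.63, 0.62, 0.64, 0.63],
])

means = coverage.mean(axis=1)
stds = coverage.std(axis=1, ddof=1)   # report mean +/- sample standard deviation
print(means)
print(stds)

# Regression of mean coverage on training fraction: is the line actually flat?
fit = linregress(train_fraction, means)
print(fit.slope, fit.pvalue)          # slope near 0 with a high p-value => no clear trend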

In Section 4.2, you say that computing optimal summaries for your big
graphs is infeasible because it takes too long. That's completely
fair, but why can't you compare against optimal summaries for smaller
graphs, where you can compute the optimum?

When I read in Section 4.3 that you were using random selection as a
baseline, my gut reaction was that it looked like a strawman. I was very
surprised, then, to see in Figure 4 that random selection outperforms
both GLIMPSE and PPR. Are you _sure_ about that? That's a very
surprising result which I think needs significant explanation.

At p12, l30, you say that your results are "[stable] among different
datasets". This is overstating your work -- anybody reading just that
sentence would infer that you've tested on several datasets. You mean
"stable across two different datasets", which is the smallest possible
improvement over just using one.

As far as I can see, references [26]-[72] (two thirds of the
bibliography!) are not actually cited in the paper. If they're not
cited, they shouldn't be there.

MINOR MISTAKES, etc.

p4, l3 says it's impractical to ask the user to weight even some of
the nodes; but line 16 says that the user adds weights to a subset
of the nodes.

p4, l17 "not unique" should be "not necessarily unique".

p4, l28 "more relevant information" is ambiguous: unclear whether it
means "a greater amount of relevant information" or "information
of greater relevance".

p6, l1; p8, l44
You say "random"/"randomly" but I think you mean "arbitrary"/
"arbitrarily". If you really mean "random", you need to state what
probability distribution is being used, and take that into account
in your analysis.

p7, l6 and l27. Is it deliberate that both of these examples include
the triple "Vassiliou, a, Person" twice?

p7, l41 You should explicitly say that lambda=|S|

p9, l49 Not sure what you mean by saying that parts of the graph are
connected "gradually". "Gradually" means to an increasing degree
over time, but things are either connected to a graph or they're
not. Do you mean "one at a time"?

p10, l10 "increasing by O(log T)" -- additively or multiplicatively?

p10, l30 You should say what JVM was used, as well as the hardware.

p11, l1 "provide answers to bigger and more query fragments" -- I
don't understand what you mean

p11, l22 and subsequently. "train queries" should be "training
queries"

TYPOS, etc.
No need to address these points in your response letter.

I assume the authors are not native English speakers. Your English is
excellent. Some of the points below are language corrections, but
please take them as tips to make your English even better, and not
criticism. I'm very grateful for my privileged position in which
people from other countries learn my language to communicate with me.

Capitalization: terms such as "knowledge graph" and "selective
summary" shouldn't be capitalized -- they're just ordinary nouns.
"Steiner Tree" should be "Steiner tree".

p1, l43 There seem to be extra spaces before [3] and [4].

p2, l14 "weights assignments" should be "weight assignments" (or
"assignments of weights")

p3, l1 "allowing to [verb]" is a common mistake. "Allow[ing]" in
English is always followed either by a noun (e.g., simple cases
like "The referee allowed the goal" or more complex cases with a
gerundive, such as "Allowing the support of unknown URI/literal
tokens") or by a subject-infinitive-object combination (e.g.,
"allowing it to support unknown URI/literal tokens").

p3, l4 "the SPARQL queries" --> "SPARQL queries" (you're referring to
SPARQL queries in general, rather than a specific group of them,
so no "the")

p3, l5 There's an extra ) after U union X

p3, l5 No need to say "BGP" again, as you already introduced this
abbreviation on the previous line.

p3, l10 "size of the summary (in terms of nodes to be included)" is a
long-winded way of saying "number of nodes to be included in the
summary". If you want to introduce the term "size", then I suggest
replacing the parenthesis with "(number of nodes to be included)"

p3, l12 and elsewhere. The typesetting of G' is odd. It looks like
maybe you wrote $G\prime$ in the LaTeX source, but you need to
superscript the prime. Even easier, just write $G'$ in the source
and it'll typeset correctly.

Figure 1
I'm not sure what you mean by saying that "Institute" is a
subclass of "Publication" -- an institute is an organization, so I
think you must mean something else, there. "affiliatedOf" should
be "affiliatedWith" or "affiliatedTo". And the edge should be in
the opposite direction: one typically talks of a person being
affiliated with an organization, rather than the other way round.
But changing the direction of that edge might mess up your
examples, so maybe change it from "affiliatedOf" to "employs" or
something like that? (Maybe it's not technically correct to say
that a grad student is an employee -- they are in Germany, I
believe! -- but I don't think that's a big issue.) The arrow for
the edge Kondylakis->Professor coincides with the tail of the edge
Professor->Faculty, making it look like a bidirectional edge.

Example 2.1
"Example 2.1. Consider as an example, the KG shown in Figure 1,
which includes..." is a lot of words to say "Example 2.1. The KG
in Figure 1 includes..." No need to say that Example 2.1 is an
example, no need to tell me to look at it, etc.

p4, l3 I'd say that the 62 billion triples in the Linked Open Data
Cloud is a much better indication that we can't expect users to
weight all the nodes! :-)

p4, l23 "set something to be equal to zero" is just three extra words
for "set something to zero".

p4, l27 "as the kappa increases" --> "as kappa increases"

p4, l39 (and elsewhere) We don't normally write explicit
multiplication signs in mathematical expressions, and "log" should
be "\log" in the LaTeX source.

p4, l50 and elsewhere, you describe your own work as "elegant". That's
for others to judge.

p5, l3 The three dots should be at the level of the commas. Use \dots
to get dots at the right level (e.g., at the bottom between
commas, but in the middle between +'s).

p5, l50 In most cases, "in order to" is just two extra words to say
"to", so style guides like Strunk and White recommend not using
it. Here, though, it's wrong. Saying "we provide a procedure in
order to link nodes" suggests that "to link nodes" is related to
"we provide", rather than to "procedure".

p5, l51 "We will stick to the ideas of..." is too informal. I think
you mean something like "We use ideas from the CHINS
approximation".

p6, l1 "adding one additional node" is redundant -- anything that's
added is additional.

p6, l11 "shortest path for linking X and Y" --> "shortest path linking
X and Y"

p6, l31 "try and search" is common in informal writing but "try to
search" is better. "We will try and search" literally means "We will
try [some unspecified thing], and also we will search", whereas
you really mean that the thing you'll try is searching.

p7, l41 and l42 there's an extra ) after (kappa-lambda)/lambda (twice)

p7, l45 "nodes-visited" shouldn't be typeset as if it's "nodes minus
visited".

p8, l49 "as, in the case of ties, these are broken" is a long-winded
way of saying "as ties are broken"

p9, l38 and subsequently. It's better to use $\ell$ instead of $l$ --
much easier to read as a lower-case L, rather than a 1.

p10, l38 "like... and more" is redundant: "like" already means you're
not listing everything, so you don't need to say again that the
list isn't complete. Also, it's usually better to use "such as" to
introduce examples, unless you want the specific meaning that the
other examples resemble the ones you've listed. Indeed, saying
"like" doesn't even make it clear that the listed things _are_
examples. For example, "I have visited cities such as Paris" means
that I've been to Paris and other places; "I have visited cities
like Paris" means I've been to places that resemble Paris, but may
or may not include Paris.

p11, l6 delete "i.e.". That's used to present explanations (i.e., to
clarify points, like this) but, here, you're trying to use it to
present the thing that's being explained.

p11, l14, l39 etc. There are several places where you typeset an
expression like "k=5" in text mode -- they should be in math mode.

p14, l19 "Queiroz-Sousaet al. [7]" -- missing space before "et" but,
more importantly, [7] is by Alzogbi and Lausen, and I don't think
you cite any papers by Queiroz-sousa.

Review #4
Anonymous submitted on 18/Jul/2025
Suggestion:
Accept
Review Comment:

Disregard this review, it was entered for testing purposes only.

Review #5
Anonymous submitted on 18/Jul/2025
Suggestion:
Minor Revision
Review Comment:

Summary & Strengths:

The paper addresses an important problem in knowledge graph exploration, specifically how to generate concise summaries tailored to a user's interests. By combining query-log-driven analysis with selective summary construction, the approach demonstrates novelty in a manner not previously observed in prior work. The paper's contributions are well-supported by a thorough theoretical analysis, which includes formal definitions, proofs, and quality lemmas, as well as extensive experiments conducted on real-world data. The results show significant improvements over baseline methods in both summary quality and runtime performance, highlighting the importance of this work. The manuscript is generally well-written and organized, making it easy to follow the reasoning and reproduce the methodology. The scalability achieved by the approach is especially impressive, demonstrating that useful summaries can be constructed from a small fraction of query logs. These results are an excellent takeaway for practitioners. The provided artifact (GitHub repository) is also a positive aspect, as it enables verification of the claims (at least on the DBpedia dataset) and reflects a commitment to open science.

Weaknesses & Things to improve:
- A significant amount of text is copied and pasted from the ESWC paper. While reusing text from one's own earlier work is permissible in journal extensions, best practices are to (1) Minimize verbatim reuse—rewrite background and method descriptions so the journal article stands on its own and reflects the new insights gained since the conference version. (2) Explicitly mark the reused portions or at least summarize them more concisely to avoid the impression of self-plagiarism. (3) Provide fresh illustrative material—even updating the example graph or adding a second, more complex example would help demonstrate the generalization to multiple seed nodes and reduce déjà-vu for readers who know the ESWC paper.
- Clarification needed on KG formalization: In Section 2.1, the authors define an RDF Knowledge Graph in the standard triple-centric way, i.e., a set of RDF triples (s,p,o) with separate domains for U, B and L. Just a page later, in Section 2.3 (Definition 1), the authors switch to a graph-theoretic abstraction G=(V,E) without explaining how the earlier RDF model is mapped to this vertex/edge view. The coexistence of these two incompatible formalizations (triples vs. V, E) may confuse readers. Besides, the paper does not utilize the richer RDF machinery (blank nodes, literals, etc.) in the subsequent proofs. Choose one canonical representation and adhere to it consistently throughout, or provide an explicit transformation (one possible form of such a transformation is sketched after this list).
- Another point is the evaluation scope: all quantitative experiments focus on the coverage metric. While coverage (i.e., how well the summary answers queries) is a sensible measure, an additional evaluation of summary quality (perhaps via a user study or qualitative examples) could strengthen the paper. For instance, do the summaries produced make intuitive sense to end-users? Are they easy to interpret? The paper hints at these aspects but does not directly evaluate them.
- Additionally, the method currently guarantees a tree-structured summary connecting the seeds. In some cases, important information might not fit neatly into a tree (there could be relevant cross-connections or cycles); the choice of a tree could be a limiting assumption (though it is understandable given the NP-hardness of more general subgraph selection).
- Regarding the artifact, the omission of the Wikidata query log and baseline scripts means that a reader cannot fully replicate every experiment without additional effort. It would be nice if the authors could include instructions or data for the Wikidata portion to complete the replication package.
- The GitHub repository contains complementary artifacts for replicating the work; unfortunately, the commit comments do not adhere to "clean code" principles, making it difficult to assess the approach.
- Finally, the writing could be polished to fix minor grammatical issues and to clarify a few algorithmic details (for example, explicitly stating how ties or fractional node allocations are handled in Algorithm 1).
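
For illustration only, one explicit transformation of the kind asked for in the KG-formalization point above could map a triple set T (using the U, B, L domains from Section 2.1) to a labeled graph, e.g.:

V = \{ s \mid (s,p,o) \in T \} \cup \{ o \mid (s,p,o) \in T \}
E = \{ (s,o) \mid \exists p : (s,p,o) \in T \}
\ell(s,o) = \{ p \mid (s,p,o) \in T \}

with T \subseteq (U \cup B) \times U \times (U \cup B \cup L); whether literals and blank nodes become vertices would then have to be stated explicitly.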

Review #6
Anonymous submitted on 23/Jul/2025
Suggestion:
Minor Revision
Review Comment:

The paper proposes a method for generating dynamic summaries in knowledge graphs by identifying the most important elements based on previously logged queries, which is a relevant and timely problem. However, the paper would benefit from a concrete, practical application scenario that clearly illustrates how this summarization method could be used in a real-world setting. The paper includes experiments on multiple datasets, which is a strength. The comparison with another summarization method also strengthens the contribution. The "Long-term stable URL for resources" is on GitHub and appears to be valid.

On the other hand, several clarifications and structural improvements are necessary. It would be helpful to provide a more detailed explanation—perhaps using an additional dataset—of how the number of triple patterns relates to coverage. The definition of key concepts, especially what the authors consider a "maximum weight tree," should be introduced. Additionally, references to colors in figures or examples are not useful for printed versions and should be reconsidered. The paper should also explain whether and in what situations the weights w_a and w_p used in the coverage metric are expected to differ. Finally, the authors should discuss how the method performs on smaller graphs: Would the same number of logged queries be sufficient, or would the approach need to be adapted for graphs of significantly smaller size?