Review Comment:
SUMMARY
The paper presents an algorithm for summarizing information contained
in knowledge graphs, and defines a class of such summaries, the
"lambda/kappa-selective summaries". The summaries are computationally
intractable to compute, so the authors produce approximations. The
authors' experiments investigate the coverage of the approximate
summaries and their algorithm's running time.
There are a number of problems with the paper. The definition of the
summaries restricts them to be trees, which doesn't seem
representative of general data. It also seems to imply that a summary
must be a maximum-weight spanning tree of the knowledge graph, which
can't be what was intended. It seems that Algorithm 1 does not produce
summaries of the graph but, rather, it produces summaries of the
queries that users made of the graph, which is a different
thing. There is a lack of mathematical rigour in the theoretical parts
of the work (e.g., the argument that computation of the summaries is NP-hard
seems incorrect, and the argument that Algorithm 1 produces an
approximation is just an analogy to another algorithm that uses
similar techniques on a different input).
The experimental part is also weak. First, more experiments are
needed. The data in Figure 2 shows that the algorithms are already
about as good as they get when using only 10% of the data for
training, but they only consider values of 10% and up. You need to
investigate the range 0-10% to find out how much is actually
required. And I think that more comment is needed about this
phenomenon: you're summarizing based on query logs, but the quality of
the summary seems largely independent of how big the log is. Doesn't
that seem strange? To me, it suggests that there's either something
wrong with the algorithm or with the testing methodology, or perhaps
that the queries in the test set are so homogeneous that there's no
benefit in using lots of them. Second, the statistical analysis of the
experiments is weak: essentially, means are calculated but no standard
deviations, so those means can't be compared. No regression is
performed on the data in Figure 2 to determine if the lines are flat,
or if they're trending up or down. Third, there are no experiments
comparing the output of the approximation algorithm to the optimum.
The authors say that this is because the graphs they use are large and
it's not possible to compute the optimum summaries for them. That's
fine, but they should compare against the optimum on smaller graphs,
where the optimum can be computed.
Given these many issues, I recommend rejecting the paper.
DETAILED COMMENTS
p2, l5. You criticize [9] because it "still depend[s] on the KG for
summary computation." How can computing the summary _of a graph_ not
depend on that graph? If you claim that your algorithms do not depend
on the graph that is being summarized (and your running time bound on
p10 includes no terms relating to the size of the graph), then you
cannot be summarizing the graph. It seems that, instead, you're
summarizing the queries made on it, but that could be a very different
thing, as outlined above.
Definition 1. I assume that by "the smallest maximum-weight tree", you
mean that I should consider all subtrees of the knowledge graph, and
find the set of maximum-weight trees, and then choose the tree from
that set with the least number of vertices. But that means that the
answer is always a maximum-weight spanning tree (or a forest, if the KG
isn't connected). The answer can't fail to be spanning because, if it
doesn't include some vertex, adding an edge to that vertex can only
increase the tree's weight (assuming node weights are positive).
Also, why is the summary forced to be a tree? Why can't a summary of a
bibliographic database include the three edges "A is a coauthor of B",
"A works at university X" and "B works at university X"?
I don't think lambda/kappa-selective summary is good notation. The
slash looks like it denotes division, which it doesn't, and it has bad
side effects like
* A 3/4-summary sounds like it's bigger than a 3/5-summary, but it's
actually smaller.
* Things get awkward if the parameters are, themselves, arithmetic
expressions, and you'd have to bracket, e.g., an (x/2)/y summary,
where now one of the slashes means division and the other doesn't.
I suggest that (lambda,kappa)-selective summary is better notation:
it's just a pair of numbers, and that's the normal notation for a pair
of numbers.
Theorem 1 needs a full proof. It may well be fine, but the details
need to be written out. The issue is when you switch between a
minimization problem (Steiner trees) and a maximization problem
(summary computation) by inverting the weights. First, there's the
minor problem that, if all the vertices in your graph have the same
weight, then normalizing the weights to the interval [0,1] and
subtracting from 1 means that every vertex has weight zero in the
resulting graph, which can't be right. So let's say instead that we
normalize to [0,1] and then use weight 2-(normalized weight).
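For concreteness, the transformation I have in mind is, for each node v,

    w'(v) = 2 - (w(v) - w_min) / (w_max - w_min),

which puts every transformed weight in the interval [1,2] and, in
particular, keeps every weight strictly positive.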
Now, this reduction doesn't turn Dijkstra's algorithm into an
algorithm for computing maximum-weight paths between vertices. This is
because Dijkstra's algorithm with inverted (positive) weights still
favours paths that visit few vertices, whereas a maximum-weight path
will typically visit many vertices. For example, consider the
graph with edges AB, AC and BC, and all vertices have weight 1. The
longest path from A to C is ABC with weight 3. But a shortest path in
the graph made by the reduction is AC with weight 2, so Dijkstra has
given the wrong answer.
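If it helps to see this concretely, here is a small Python sketch
(purely illustrative, using networkx; it is not based on anything in the
paper) that brute-forces all simple paths in that three-node example:

    # Illustrative only: the A-B-C counterexample from above.
    # All original node weights are 1, so the proposed inversion
    # (2 - normalized weight) also gives every node weight 1.
    import networkx as nx

    G = nx.Graph()
    G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C")])
    orig_w = {v: 1 for v in G}   # original node weights
    inv_w = {v: 1 for v in G}    # weights after the proposed inversion

    def path_weight(path, w):
        return sum(w[v] for v in path)   # cost of a path = sum of its node weights

    paths = list(nx.all_simple_paths(G, "A", "C"))
    longest = max(paths, key=lambda p: path_weight(p, orig_w))      # maximum-weight path
    shortest_inv = min(paths, key=lambda p: path_weight(p, inv_w))  # what Dijkstra would return

    print(longest)        # ['A', 'B', 'C'], original weight 3
    print(shortest_inv)   # ['A', 'C'], inverted weight 2 -- a different path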
Is the assumption (p5, l11) that "the most useful related nodes will
be the ones that appear more frequently" in queries really valid? What
if there's a high-volume source of unusual queries? For example,
suppose you have a KG about actors and movies and a gossip magazine
makes a huge number of queries about who's dating whom, because they
want to be able to report to their readers as soon as the graph
changes. Won't your algorithms now think that the most important thing
about an actor is who they're dating, rather than what movies they've
been in?
In Example 3.2, you claim that a certain summary is 1/2-selective, but
you don't establish the necessary criteria for being a 1/2-selective
summary.
Section 3.4 and subsequent sections repeatedly claim that Algorithm 1
produces a lambda/kappa-selective summary. It doesn't: by your
argument, it produces an _approximation_ of one.
Theorem 3 needs a proper proof. It's not enough to just say that
you've used the same techniques as CHINS so you get the same
approximation ratio. That's an argument by analogy, not mathematics. A
specific problem here is that Algorithm 1 works on the queries that
users happened to run on the graph, and I don't see how that can
possibly guarantee an approximate summary of the underlying graph. To
take a simple example, suppose that the only queries made of the graph
in Figure 1 are to repeatedly ask "Tell me all the professors": how
can you possibly compute a summary of Vassiliou from those queries?
Also, k isn't defined in the theorem statement.
p10, l40 "We repeat the experiment 10 times (10-fold cross-
validation)" and a similar statement at p12, l39. Those are two
different things -- which one did you do? 10-fold cross-validation
would seem to contradict using 20% of the data as testing. 10-fold
cross-validation means randomly partitioning the data into parts 1-10
and then performing 10 experiments: the i-th experiment uses part i as
the testing set and the other parts as the training set. Thus, 10-fold
cross-validation necessarily means using 10% of the data as the
testing set, not 20%.
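To spell out the difference, here is a sketch of what 10-fold
cross-validation means in code (scikit-learn is used purely for
illustration; I am not suggesting this is how your pipeline is
implemented):

    # Illustrative only: standard 10-fold cross-validation.
    # Each of the 10 runs holds out a disjoint 10% of the data for testing.
    import numpy as np
    from sklearn.model_selection import KFold

    queries = np.arange(100)                 # stand-in for a log of 100 queries
    kf = KFold(n_splits=10, shuffle=True, random_state=0)
    for train_idx, test_idx in kf.split(queries):
        assert len(test_idx) == 10           # 10% of the data, never 20%
        # ... train on queries[train_idx], evaluate on queries[test_idx] ...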
Figure 2 shows that the coverage is roughly constant as the training
set ranges from 10-80% of the data. In other words, the algorithm is
already as good as it gets when using 10% of the data for training.
This means that the interesting part of the graph is between 0 and
10%, and experiments therefore need to be done in that range. As noted
above, I think you need to explain this: it seems very strange to me
that a summary method based on query logs doesn't show a clear trend
of getting better as you give it bigger logs. It doesn't even look
like a diminishing returns situation. I find that very worrying.
The statistical analysis in Section 4 is very weak: it's based only on
the mean. At the very least, you need to calculate standard
deviations, so we can see if the differences between the means are
significant -- e.g., to answer questions like "does k=12 actually give
better coverage than k=6?" in Figure 2, and "Does method X actually
outperform method Y?" in Figure 4. I mentioned above that coverage
looks roughly constant in Figure 2: I suggest doing a regression
analysis to determine the actual relationship between coverage and
training-set size.
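To illustrate the kind of analysis I have in mind (Python with
numpy/scipy, and entirely hypothetical numbers rather than your data):

    # Illustrative only, with made-up numbers: report mean +/- standard
    # deviation per training-set size, and regress coverage on training-set
    # size to check whether the apparent flat trend in Figure 2 is real.
    import numpy as np
    from scipy import stats

    train_frac = np.array([0.1, 0.2, 0.4, 0.6, 0.8])
    # coverage for each fraction over 10 repetitions (hypothetical values)
    coverage = np.random.default_rng(0).normal(0.7, 0.02, size=(5, 10))

    means = coverage.mean(axis=1)
    stds = coverage.std(axis=1, ddof=1)      # error bars that Figure 2 should show

    res = stats.linregress(np.repeat(train_frac, 10), coverage.ravel())
    print(res.slope, res.pvalue)             # slope ~ 0 and large p-value => flat line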
In Section 4.2, you say that computing optimal summaries for your big
graphs is infeasible because it takes too long. That's completely
fair, but why can't you compare against optimal summaries for smaller
graphs, where you can compute the optimum?
When I read in Section 4.3 that you were using random selection as a
baseline, my gut reaction was that it looked like a strawman. I was very
surprised, then, to see in Figure 4 that random selection outperforms
both GLIMPSE and PRR. Are you _sure_ about that? That's a very
surprising result which I think needs significant explanation.
At p12, l30, you say that your results are "[stable] among different
datasets". This is overstating your work -- anybody reading just that
sentence would infer that you've tested on several datasets. You mean
"stable across two different datasets", which is the smallest possible
improvement over just using one.
As far as I can see, references [26]-[72] (two thirds of the
bibliography!) are not actually cited in the paper. If they're not
cited, they shouldn't be there.
MINOR MISTAKES, etc.
p4, l3 says it's impractical to ask the user to weight even some of
the nodes; but line 16 says that the user adds weights to a subset
of the nodes.
p4, l17 "not unique" should be "not necessarily unique".
p4, l28 "more relevant information" is ambiguous: unclear whether it
means "a greater amount of relevant information" or "information
of greater relevance".
p6, l1; p8, l44
You say "random"/"randomly" but I think you mean "arbitrary"/
"arbitrarily". If you really mean "random", you need to state what
probability distribution is being used, and take that into account
in your analysis.
p7, l6 and l27. Is it deliberate that both of these examples include
the triple "Vassiliou, a, Person" twice?
p7, l41 You should explicitly say that lambda=|S|
p9, l49 Not sure what you mean by saying that parts of the graph are
connected "gradually". "Gradually" means to an increasing degree
over time, but things are either connected to a graph or they're
not. Do you mean "one at a time"?
p10, l10 "increasing by O(log T)" -- additively or multiplicatively?
p10, l30 You should say what JVM was used, as well as the hardware.
p11, l1 "provide answers to bigger and more query fragments" -- I
don't understand what you mean
p11, l22 and subsequently. "train queries" should be "training
queries"
TYPOS, etc.
No need to address these points in your response letter.
I assume the authors are not native English speakers. Your English is
excellent. Some of the points below are language corrections, but
please take them as tips to make your English even better, and not
criticism. I'm very grateful for my privileged position in which
people from other countries learn my language to communicate with me.
Capitalization: terms such as "knowledge graph" and "selective
summary" shouldn't be capitalized -- they're just ordinary nouns.
"Steiner Tree" should be "Steiner tree".
p1, l43 There seem to be extra spaces before [3] and [4].
p2, l14 "weights assignments" should be "weight assignments" (or
"assignments of weights")
p3, l1 "allowing to [verb]" is a common mistake. "Allow[ing]" in
English is always followed either by a noun (e.g., simple cases
like "The referee allowed the goal" or more complex cases with a
gerundive, such as "Allowing the support of unknown URI/literal
tokens") or by a subject-infinitive-object combination (e.g.,
"allowing it to support unknown URI/literal tokens").
p3, l4 "the SPARQL queries" --> "SPARQL queries" (you're referring to
SPARQL queries in general, rather than a specific group of them,
so no "the")
p3, l5 There's an extra ) after U union X
p3, l5 No need to say "BGP" again, as you already introduced this
abbreviation on the previous line.
p3, l10 "size of the summary (in terms of nodes to be included)" is a
long-winded way of saying "number of nodes to be included in the
summary". If you want to introduce the term "size", then I suggest
replacing the parenthesis with "(number of nodes to be included)"
p3, l12 and elsewhere. The typesetting of G' is odd. It looks like
maybe you wrote $G\prime$ in the LaTeX source, but you need to
superscript the prime. Even easier, just write $G'$ in the source
and it'll typeset correctly.
Figure 1
I'm not sure what you mean by saying that "Institute" is a
subclass of "Publication" -- an institute is an organization, so I
think you must mean something else, there. "affiliatedOf" should
be "affiliatedWith" or "affiliatedTo". And the edge should be in
the opposite direction: one typically talks of a person being
affiliated with an organization, rather than the other way round.
But changing the direction of that edge might mess up your
examples, so maybe change it from "affiliatedOf" to "employs" or
something like that? (Maybe it's not technically correct to say
that a grad student is an employee -- they are in Germany, I
believe! -- but I don't think that's a big issue.) The arrow for
the edge Kondylakis->Professor coincides with the tail of the edge
Professor->Faculty, making it look like a bidirectional edge.
Example 2.1
"Example 2.1. Consider as an example, the KG shown in Figure 1,
which includes..." is a lot of words to say "Example 2.1. The KG
in Figure 1 includes..." No need to say that Example 2.1 is an
example, no need to tell me to look at it, etc.
p4, l3 I'd say that the 62 billion triples in the Linked Open Data
Cloud are a much better indication that we can't expect users to
weight all the nodes! :-)
p4, l23 "set something to be equal to zero" is just three extra words
for "set something to zero".
p4, l27 "as the kappa increases" --> "as kappa increases"
p4, l39 (and elsewhere) We don't normally write explicit
multiplication signs in mathematical expressions, and "log" should
be "\log" in the LaTeX source.
p4, l50 and elsewhere, you describe your own work as "elegant". That's
for others to judge.
p5, l3 The three dots should be at the level of the commas. Use \dots
to get dots at the right level (e.g., at the bottom between
commas, but in the middle between +'s).
p5, l50 In most cases, "in order to" is just two extra words to say
"to", so style guides like Strunk and White recommend not using
it. Here, though, it's wrong. Saying "we provide a procedure in
order to link nodes" suggests that "to link nodes" is related to
"we provide", rather than to "procedure".
p5, l51 "We will stick to the ideas of..." is too informal. I think
you mean something like "We use ideas from the CHINS
approximation".
p6, l1 "adding one additional node" is redundant -- anything that's
added is additional.
p6, l11 "shortest path for linking X and Y" --> "shortest path linking
X and Y"
p6, l31 "try and search" is common in informal writing but "try to
search" is better. "We will try and search" literally means "We will
try [some unspecified thing], and also we will search", whereas
you really mean that the thing you'll try is searching.
p7, l41 and l42 there's an extra ) after (kappa-lambda)/lambda (twice)
p7, l45 "nodes-visited" shouldn't be typeset as if it's "nodes minus
visited".
p8, l49 "as, in the case of ties, these are broken" is a long-winded
way of saying "as ties are broken"
p9, l38 and subsequently. It's better to use $\ell$ instead of $l$ --
much easier to read as a lower-case L, rather than a 1.
p10, l38 "like... and more" is redundant: "like" already means you're
not listing everything, so you don't need to say again that the
list isn't complete. Also, it's usually better to use "such as" to
introduce examples, unless you want the specific meaning that the
other examples resemble the ones you've listed. Indeed, saying
"like" doesn't even make it clear that the listed things _are_
examples. For example, "I have visited cities such as Paris" means
that I've been to Paris and other places; "I have visited cities
like Paris" means I've been to places that resemble Paris, but may
or may not include Paris.
p11, l6 delete "i.e.". That's used to present explanations (i.e., to
clarify points, like this) but, here, you're trying to use it to
present the thing that's being explained.
p11, l14, l39 etc. There are several places where you typeset an
expression like "k=5" in text mode -- they should be in math mode.
p14, l19 "Queiroz-Sousaet al. [7]" -- missing space before "et" but,
more importantly, [7] is by Alzogbi and Lausen, and I don't think
you cite any papers by Queiroz-Sousa.