Review Comment:
The paper is engaging and well-written.
It provides a thorough theoretical analysis, implementation, and experiments, making the related artefacts publicly available.
Moreover, the described research results may be useful in practical systems in various applications.
I have some remarks, though, mainly concerning the following:
1) the novelty with respect to the paper published at ISWC (reference [24]),
2) the date of the last release of the code,
3) making the paper more understandable for a broader audience.
*** Novelty ***
Please discuss clearly what is novel in the current paper with respect to [24].
For instance, Table 3 is a copy of Table 1 from [24], and there are other overlaps.
*** Source code ***
Page 14: Evaluation
The code appears to be 3-4 years old. Is that correct?
*** Other remarks ***
Page 6, line 5: Please also mention grid-based, distribution-based, and hierarchical clustering.
Page 7, line 9: Range-based semantics at this point needs to be clarified.
Page 8, Definition 4.7: I recommend adding information on the meaning of the r symbol to make the definition more self-contained and understandable.
Page 8, Definition 4.8: What is r'? (It is only explained on the next page.) What is k? Is k related to k-nn?
The last line on page 8 needs to be clarified. What does it mean that k' may equal infinity?
Why does one need the term r', referring to the distance required to return at least k nearest neighbours? In the original k-nn formulation, there is no need for such an additional parameter.
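To illustrate this question, here is a minimal sketch of my own (not taken from the paper or its code). It assumes Euclidean distance and points given as coordinate tuples. The function names `knn`, `range_query` and `knn_radius` are hypothetical, and returning infinity when fewer than k points exist is only a convention chosen for this sketch. Plain k-nn takes only k, and a radius can be derived from its result afterwards, so it would help if the paper explained why r' must appear in the definition itself.

```python
import math


def knn(points, q, k):
    """Plain k-nn: the k points closest to q; no distance threshold is needed."""
    return sorted(points, key=lambda p: math.dist(p, q))[:k]


def range_query(points, q, r):
    """Range semantics: every point within distance r of q."""
    return [p for p in points if math.dist(p, q) <= r]


def knn_radius(points, q, k):
    """Smallest radius whose range query returns at least k points.

    Returns infinity when fewer than k points exist (a convention of this
    sketch, not a claim about the paper's definition).
    """
    if k > len(points):
        return math.inf
    return math.dist(knn(points, q, k)[-1], q)


points = [(0, 0), (1, 0), (3, 0), (6, 0)]
q = (0, 0)
r_prime = knn_radius(points, q, 3)
# The range query with the derived radius contains the plain k-nn result.
print(knn(points, q, 3), r_prime, range_query(points, q, r_prime))
```

In this reading, r' is a by-product of the k-nn result rather than an input. If the paper intends it as an independent parameter, that should be stated explicitly.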
Page 15, Fig. 8: The country codes are barely visible. When printed in black and white, the figure does not show clusters, only (unclustered) points. I would use additional means to distinguish clusters (besides colors), such as different marker shapes.
Page 18, subsection 6.1.2: From what I understand, the authors compare the running times of their method with those of DBSImJoin, which was run on different hardware. This can make sense when comparing implementations rather than methods, but testing on the same hardware would be better. I understand the challenges concerning reproducibility, but why, then, are the results copied from [24]?
Page 23, Proofs:
To make the paper more accessible to a broader set of readers, I recommend the following:
1) To provide some examples illustrating the proofs.
2) To discuss why algebraic properties such as commutativity, distributivity, etc., are essential and for what purposes. Please explain this here or in Section 4.2 (Semantics).
*** Further remarks ***
Check the spelling of all author names, as there are typos.
Consider numbering the equations.
Page 11, Fig. 3: The example might be perceived as provocative. I would avoid such examples in already tense times. However, I leave the decision on whether to keep the example to the authors.