Review Comment:
With their submission "A simple method for inducing class taxonomies in knowledge graphs" the authors apply methods from the field of tag taxonomy induction to the problem of constructing a class hierarchy from an RDF knowledge graph, given type assertions of resources. The main content was already published in the proceedings of the European Semantic Web Conference 2020. The submission is a copy of the ESWC publication with the following additions:
1. Introduction: Three sentences
2. Related work: One subsection (2.3.) with two paragraphs covering methods for hierarchical clustering
4. Approach: One paragraph + One subsection (4.2.) with two paragraphs and one algorithm description about the hierarchical clustering procedure
5. Evaluation: One paragraph introducing additional measures (Doc-F1, Tag-F1) + one subsection (5.1.4.) explaining the IIMB dataset (not used for evaluation in the ESWC publication) + one paragraph discussing results for Doc-F1 and Tag-F1 + one subsection presenting the performance of the authors' approach w.r.t. Doc-F1 and Tag-F1 + one paragraph touching on the computational complexity of Doc-F1 and Tag-F1
6. Conclusions: Two sentences
In terms of overall content, the additions amount to roughly 1.5 pages of the total 13 pages. (Just in case this is of interest -- I could not find any information/requirements for this ESWC submission call.)
The overall topic of inducing terminological knowledge from RDF data seems relevant to me, especially for crowdsourced/user-generated data, which usually comprises mainly assertional knowledge. Regarding the newly introduced content on hierarchically clustering RDF resources based on their types and the induced class taxonomy, in my opinion the authors do not convincingly motivate why this is of interest and what its applications might be. It would be good if they could further provide some discussion and intuition on when and why it makes sense to distinguish between the induced class hierarchy and the underlying hierarchical clustering ``with strong inheritance properties'' obtained by means of this class hierarchy. Regarding the newly introduced F1 scores, Doc-F1 and Tag-F1, I would also appreciate more detail on what they actually measure and why it makes sense to consider them in the evaluation of an induced class taxonomy. My intuition for Doc-F1, for example, is that whenever resources are `tagged' with the most specific class and only one more general class in the hierarchy, this would always lead to imperfect Doc-F1 scores (given that the class hierarchy depth is greater than 2 and there are no empty classes that were pruned out). This would imply that the Doc-F1 score depends highly on the input data and is not a good measure of the induced class taxonomy/hierarchical clustering. Especially if this intuition is wrong, I would ask the authors to provide a more in-depth discussion of the Doc-F1 and Tag-F1 scores.
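To make this intuition concrete, here is a minimal Python sketch. It assumes (my assumption, since the submission's formal definition is not reproduced here) that Doc-F1 is a set-based F1 between the classes a resource is tagged with and the classes of all clusters it falls into along the induced hierarchy:

```python
def f1(assigned, induced):
    """Set-based F1 between a resource's assigned tags and the classes
    it belongs to in the induced hierarchy (including all ancestors).
    NOTE: this is my reading of Doc-F1, not the authors' definition."""
    inter = len(assigned & induced)
    if not assigned or not induced or inter == 0:
        return 0.0
    precision = inter / len(assigned)
    recall = inter / len(induced)
    return 2 * precision * recall / (precision + recall)

# A depth-3 chain dbo:Person > dbo:Artist > dbo:Painter: a resource is
# tagged only with its most specific class and one more general class ...
assigned = {"dbo:Painter", "dbo:Person"}
# ... but in the hierarchy it is a member of every ancestor's cluster:
induced = {"dbo:Painter", "dbo:Artist", "dbo:Person"}

print(round(f1(assigned, induced), 4))  # 0.8
```

Under this reading, the score is imperfect (0.8) even when the induced taxonomy exactly matches the gold standard, which would support the point that Doc-F1 measures properties of the input tagging rather than of the induced taxonomy.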
The overall approach of using a resource's class as a tag and applying tag taxonomy induction methods is easy to understand and well described in the submission. However, since I have not followed the most recent developments in the field of hierarchical clustering and taxonomy induction on tags and folksonomies, I cannot reliably judge the originality of the submission. This also concerns the overview of related work, which does not cover more recent publications on the topic.
I want to point out two notational issues that I found confusing:
(P1) p.4, l.11, l.13, l.16: s, s_i, d_i
As far as I understand, s, s_i, as well as d_i all refer to the same thing, i.e. resources in the subject position of a triple that have a type assigned (via rdf:type). If I'm not mistaken, s and s_i are never used again in the submission, and I think it would be less confusing if they were dropped from the problem description.
(P2) p.4, l.39: ``the subsumption axioms {dbo:Person --> dbo:Artist} and {dbo:Artist --> dbo:Painter} imply that dbo:Painter is a dbo:Artist and that dbo:Artist is a dbo:Person''
To me this notation of {superclass --> subclass} is counterintuitive and should be the other way around, i.e. {subclass --> superclass}. From a logical perspective, reading this as an implication would then make more sense. Furthermore, graphical notations of RDF graphs or UML also use arrows pointing to the superclasses, not to the subclasses. The same holds for the notation on p.5, l.31.
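To illustrate the suggested direction (my own sketch, reusing the class names from the submission's example): with edges pointing from subclass to superclass, simply following the edges coincides with the intended logical implication "x is a Painter implies x is an Artist implies x is a Person".

```python
# Subsumption edges in the suggested subclass -> superclass direction,
# mirroring rdfs:subClassOf. Class names taken from the submission's example.
SUBCLASS_OF = {
    "dbo:Painter": "dbo:Artist",
    "dbo:Artist": "dbo:Person",
}

def ancestors(cls, edges=SUBCLASS_OF):
    """All superclasses reachable by following subclass -> superclass
    edges; reachability here coincides with logical implication."""
    out = []
    while cls in edges:
        cls = edges[cls]
        out.append(cls)
    return out

print(ancestors("dbo:Painter"))  # ['dbo:Artist', 'dbo:Person']
```

With the notation reversed as in the submission, the arrow would point against the direction of the entailment, which is what makes it hard to read.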
In terms of the correctness of the submission I have a few concerns detailed below:
(C1) p.1, l.22: ``we propose a simple method for inducing class taxonomies from knowledge graphs that is scalable to large datasets''; p.2, l.36: ``we construct a novel approach to inducing class taxonomies which outperforms existing tag hierarchy induction methods both in terms [of] scalability and quality of induced taxonomies''
I don't see the scalability claim substantiated in the submission. Experimental results are only given for the F1 score quality measures. Further, on p.9, l.19, it is said that the approach of ``Heymann and Garcia-Molina was not able to terminate sufficiently fast enough for us to obtain results'', without providing any numbers or discussion apart from the authors' judgement of it being `not sufficiently fast enough'.
(C2) p.2, l.11: ``automated methods are not able to induce class taxonomies of the quality necessary to reliably apply to complex knowledge bases''
This statement seems far too general to me and comes without any discussion of which methods are meant and why they fail to induce class taxonomies. There is also no literature reference to back this assertion.
(C3) p.3, l.37: ``Each of these rules has the relationship of premise and consequence which the authors treat as that of class and subclass.''
As far as I understood the paper referenced as [16], premises refer to subclasses and consequences to superclasses, so `class' and `subclass' should be swapped in l.37.
The evaluation is divided into two parts: one examining the quality of the induced taxonomies against the respective gold standard taxonomies, and one touching on the quality of the hierarchical clustering outcomes. Whereas the authors compare their solution with methods from other publications in the former part, they only provide numbers for their own solution in the latter. This raises the question why this wasn't also done at least for the other methods re-implemented by the authors. The source code and datasets to re-run the first part of the evaluation are provided on GitHub. However, the code is tied to the preprocessed datasets used in the evaluation and thus cannot be run on arbitrary RDF knowledge bases. I could not find source code for the second part of the evaluation.
Given the open points mentioned above I would like to ask the authors for feedback and to revise the respective parts of the submission if possible and considered meaningful.
Minor comments and typos:
p.2, l.36: ``both in terms scalability and quality'' --> `in terms of'
p.3, l.35: ``the Aprioir algorithm'' --> `Apriori algorithm'
p.3, l.47: ``that have a high frequencies'' --> `have a high frequency'/`have high frequencies'
p.4, l.3: ``A knowledge graph, K, is repository'' --> `is a repository'
p.5, eq.(1): D_b can be 0
p.6, alg.2: decay factor \alpha mentioned as input but not used in the algorithm
p.6, alg.2, l.5: ``for c_a \in T* do'' --> T* does not have clusters c_a as elements but subsumption axioms
p.6, l.45: ``tags from in the vocabulary'' --> `from the vocabulary'?
p.7, l.13: ``followed by a comparison our method'' --> `of our method'
p.7, l.23: There should be a comma between the footnotes 3 and 4
p.7, l.43ff (just a note): Taking 'is-a' as the `type tag' isn't really in line with what was said in Sec. 3; 'is-a' rather translates to rdfs:subClassOf, and it seems classes/concepts are clustered here instead of individuals having a `type tag'.
p.8, l.15: number formatting; a thousands delimiter was used in previous numbers
p.9, Table 1: Stdev. value for Heymann and Garcia-Molina/DBpedia differs from original ESWC publication (0.0149 vs. 0.0159)
p.12, l.25: O(|V^2|) vs. O(|V|^2)
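Regarding the D_b remark on p.5, eq.(1): a small defensive sketch of the guard I would expect (generic names, since the equation is not reproduced here; the fallback value is a design choice for the authors to make):

```python
def generality_ratio(d_a: int, d_b: int) -> float:
    """Hypothetical stand-in for a ratio like eq.(1), whose denominator
    D_b may be zero, e.g. when tag b occurs in no documents. Without a
    guard this raises ZeroDivisionError; what value the guarded case
    should return (0, infinity, or skipping the pair) depends on the
    intended semantics and should be stated in the paper."""
    if d_b == 0:
        return 0.0  # placeholder fallback; other semantics are possible
    return d_a / d_b

print(generality_ratio(3, 2))  # 1.5
print(generality_ratio(3, 0))  # 0.0 instead of a ZeroDivisionError
```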