Review Comment:
In their submission, 'Hierarchical Blockmodelling for Knowledge Graphs', the authors present a novel approach to capturing a knowledge graph's inherent structural properties and performing hierarchical clustering based on probabilistic graphical models.
I consider the topic relevant to the scope of the Semantic Web Journal. Furthermore, I think the authors' work is a valuable scientific contribution. The submitted manuscript is well-written and self-contained. However, making it self-contained required introducing quite a few mathematical foundations in detail, which makes the manuscript lengthy and technical.
In terms of correctness, the manuscript reads plausibly. The applied techniques are well-motivated, and related work is discussed. However, I am not a mathematician and, thus, cannot reliably verify the correctness of every detail in the more extensive formulas, such as those from Eqn. 16 onwards. The scalability issues of the proposed approach are discussed, and the authors plan to investigate modifications to the presented approach to remedy them. To give a better idea of its practicability, the evaluation results should report the overall runtimes and hardware specifications.
The Matlab/C++ source code to reproduce the experiments is provided. However, I could not test it in depth due to the lack of a proper Matlab/C++ development environment.
Minor issues and typos:
- Throughout the paper, 'WikiData' is used (camel case) instead of 'Wikidata'
- p. 3, l. 3: "The learning process is then infer" -> 'to infer'
- p. 5, l. 35: "Henry Ford's occupation is an engineer" -> 'is being an engineer'
- p. 6, l. 33: "the chance observing" -> 'chance of observing'
- p. 8, Caption Fig. 3: "dashed lines indicate indicate" -> remove the duplicated 'indicate'
- p. 8, l. 25: "This principle become relevant" -> 'becomes'
- p. 8, l. 25/26: "when controlling for the branching factor" -> 'when controlling the branching factor'
- p. 9, l. 20: "hasn't" -> 'has not'
- p. 9, l. 25: "in the tree , the probability" -> remove space before comma
- p. 11, l. 19: "these community relations are modelled with respect to a predicate in the knowledge graph": I think this needs a bit more explanation at that part of the paper.
- p. 11, l. 21: "in order generate" -> 'in order to generate'
- p. 11, l. 41/42: "empty communities and removed" -> 'are removed'
- p. 12, l. 8: "it's" -> 'it is'
- p. 12, l. 41: "it's" -> 'it is'
- p. 16, l. 7: "Firtly" -> 'Firstly'
- p. 17, l. 50: "which a constant" -> 'which are constant'
- p. 18, l. 45: "it's" -> 'it is'
- p. 19, l. 44: "it's" -> 'it is'
- p. 21, l. 35: "Gamma forms the Beta function" -> 'of the Beta function'
- p. 21, l. 37: "it's" -> 'it is'
- p. 22, l. 15: "effect on model likelihood" -> 'on the model likelihood'
- p. 22, l. 26: "it necessary" -> 'it is necessary'
- p. 22, l. 51: should be a proper URL with https:// instead of the www.
- p. 25, l. 33: "it's" -> 'it is'
- p. 25, l. 43: "we note a dips": mix of singular and plural
- p. 26: Fig. 9 appears on p. 26 but is referenced on p. 28, after the references of Fig. 10 and Fig. 11; maybe reorder figures
- p. 27, l. 31: "isn't reflected" -> 'is not'
- p. 28, l. 32: "obtained by out method" -> 'our method'
- p. 28, l. 40: "there is no prior constraint on the this structure" -> remove either 'the' or 'this'
- p. 28, l. 46ff: maybe use \emph{} to highlight predicates and classes
- p. 28, l. 47: "advantage of our mour method" -> 'our method'
- p. 28, l. 49: "Footballers" -> 'footballers'
- p. 28, l. 50: "than athlete" -> 'athletes'
- p. 28, l. 50: "Nations" -> 'nations'
- p. 29, l. 17: "presenting a novel and principled for qualitative": word(s) missing
- p. 29ff: Not all acronyms in the references are written in capital letters (Icml, Rdf, ...)
- p. 31, l. 36/37: "Lecture notes for..."