Hierarchical Blockmodelling for Knowledge Graphs

Tracking #: 3850-5064

Authors: 
Marcin Pietrasik
Marek Reformat
Anna Wilbik

Responsible editor: 
Maria Maleshkova

Submission type: 
Full Paper
Abstract: 
In this paper, we investigate the use of probabilistic graphical models, specifically stochastic blockmodels, for the purpose of hierarchical entity clustering on knowledge graphs. These models, seldom used in the Semantic Web community, decompose a graph into a set of probability distributions. The parameters of these distributions are then inferred, allowing for their subsequent sampling to generate a random graph. In a non-parametric setting, this allows for the induction of hierarchical clusterings without prior constraints on the hierarchy's structure. Specifically, this is achieved by the integration of the Nested Chinese Restaurant Process and the Stick Breaking Process into the generative model. In this regard, we propose a model leveraging such integration and derive a collapsed Gibbs sampling scheme for its inference. To aid in understanding, we describe the steps in this derivation and provide an implementation for the sampler. We evaluate our model on synthetic and real-world datasets and quantitatively compare against benchmark models. We further evaluate our results qualitatively and find that our model is capable of inducing coherent cluster hierarchies in small-scale settings. The work presented in this paper provides the first step for the further application of stochastic blockmodels for knowledge graphs on a larger scale. We conclude the paper with potential avenues for future work on more scalable inference schemes.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Yujia Zhang submitted on 27/Jun/2025
Suggestion:
Accept
Review Comment:

The authors have effectively incorporated previous feedback into this revised manuscript.

The paper has a solid foundation, clearly articulating the motivation for entity clustering and the contributions of using a generative model to induce hierarchical clusters.

The model and its latent variable inference process are explained in detail.

The authors present a strong quantitative and qualitative evaluation, along with a well-outlined analysis of the induced hierarchical tree.

Despite the limitation of a small-scale setting, the paper provides valuable insights for future work.

Review #2
By Anonymous submitted on 14/Jul/2025
Suggestion:
Accept
Review Comment:

This first revision addresses the concerns raised in the initial submission. There are only a few minor issues:

- p. 3, l. 26: "as they don't capture" -> 'do not'
- p. 5, l. 44: "Where A_ijr and" -> I am not sure whether it was intended that this subordinate clause has now become a sentence of its own.
- p. 6, l. 41: "the chance observing" -> 'chance of observing'
- p. 22, l. 35: "Such as formulation" -> 'Such a formulation'
- p. 29, l. 11ff: For consistency, I would suggest also highlighting the properties 'lived in', 'nationality', 'athlete', and 'place of birth'