Bottom-up anytime discovery of generalised multimodal graph patterns for knowledge graphs

Tracking #: 3767-4981

Authors: 
Xander Wilcke
Rick Mourits
Auke Rijpma
Richard Zijdeman

Responsible editor: 
Aldo Gangemi

Submission type: 
Full Paper
Abstract: 
Vast amounts of heterogeneous knowledge are becoming publicly available in the form of knowledge graphs, often linking multiple sources of data that have never been together before, and thereby enabling scholars to ask and answer many new research questions. It is often not known beforehand, however, which questions the data might have the answers to, potentially leaving many interesting and novel insights to remain undiscovered. To support scholars during this scientific workflow, we introduce an anytime algorithm for the bottom-up discovery of generalised multimodal graph patterns in knowledge graphs. Each pattern is a conjunction of binary statements with (data-) type variables, constants, and/or value patterns. Upon discovery, the patterns are converted to SPARQL queries and presented in an interactive facet browser together with metadata and provenance information, enabling scholars to explore, analyse, and share queries. We evaluate our method from a user perspective, with the help of domain experts in the humanities.
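To illustrate the abstract's description of a pattern as a conjunction of binary statements that is converted to a SPARQL query, here is a minimal sketch. It is not the authors' implementation: the `Var` and `Const` classes, the `pattern_to_sparql` function, the `http://example.org/` vocabulary, and the choice to express type and value constraints as `a` triples and `FILTER` clauses are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Var:
    """Hypothetical (data-)type variable: ranges over a class or a datatype."""
    name: str
    type_iri: str
    is_datatype: bool = False      # True when type_iri is a literal datatype
    regex: Optional[str] = None    # optional value pattern on the literal


@dataclass(frozen=True)
class Const:
    """Hypothetical constant: a fixed IRI in the pattern."""
    iri: str


def _term(t):
    # Variables become ?name, constants become <iri>.
    return f"?{t.name}" if isinstance(t, Var) else f"<{t.iri}>"


def _constraints(v):
    # Assumed encoding: class variables via rdf:type, datatype variables via FILTER.
    out = []
    if v.is_datatype:
        out.append(f"  FILTER(DATATYPE(?{v.name}) = <{v.type_iri}>)")
    else:
        out.append(f"  ?{v.name} a <{v.type_iri}> .")
    if v.regex is not None:
        # Escape backslashes and quotes for a SPARQL string literal.
        escaped = v.regex.replace("\\", "\\\\").replace('"', '\\"')
        out.append(f'  FILTER(REGEX(STR(?{v.name}), "{escaped}"))')
    return out


def pattern_to_sparql(statements):
    """Turn a list of (subject, predicate_iri, object) statements into a query."""
    triples, constraints, seen = [], [], set()
    for s, p, o in statements:
        triples.append(f"  {_term(s)} <{p}> {_term(o)} .")
        for t in (s, o):
            # Emit each variable's constraints only once.
            if isinstance(t, Var) and t.name not in seen:
                seen.add(t.name)
                constraints.extend(_constraints(t))
    body = "\n".join(triples + constraints)
    return "SELECT * WHERE {\n" + body + "\n}"


# Illustrative vocabulary; these IRIs are made up for the example.
EX = "http://example.org/"
XSD = "http://www.w3.org/2001/XMLSchema#"

person = Var("person", EX + "Person")
year = Var("year", XSD + "gYear", is_datatype=True, regex="^18[0-9]{2}$")
pattern = [
    (person, EX + "birthYear", year),
    (person, EX + "bornIn", Const(EX + "Amsterdam")),
]
query = pattern_to_sparql(pattern)
print(query)
```

The example pattern reads as "persons born in Amsterdam whose birth year is a 19th-century `xsd:gYear`", combining a class variable, a datatype variable with a value pattern, and a constant in one conjunction.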
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 14/Nov/2024
Suggestion:
Major Revision
Review Comment:

Here are the strengths and weaknesses of the paper:

Strengths
The paper introduces an anytime algorithm that performs bottom-up discovery of generalized multimodal graph patterns. This method allows for incremental results, giving users the flexibility to stop the process and still obtain meaningful patterns, a valuable feature in exploratory data analysis.
The paper addresses a common limitation in knowledge graph pattern discovery, providing a more nuanced and comprehensive analysis tool.
The evaluation includes feedback from domain experts, specifically in humanities, assessing the relevance and interpretability of the discovered patterns. This approach helps validate the practical applicability and user acceptance of the proposed method.

Weaknesses
1. While the study provides insights from a select group of humanities experts, the small sample size (13 out of 42 contacted experts) may limit the generalizability of the results, especially to fields outside the humanities.
2. The patterns are hard to interpret, especially without context or natural language explanations. The reliance on SPARQL and graph visualizations might limit accessibility for users unfamiliar with these formats, as the study suggests a preference for natural language descriptions.
3. The perceived utility and novelty of the patterns received mixed feedback, with some experts finding the patterns useful while others found them less applicable. This discrepancy may suggest a gap between the algorithm's output and domain experts' needs.

Review #2
Anonymous submitted on 23/Jun/2025
Suggestion:
Minor Revision
Review Comment:

The work introduces an anytime algorithm for the bottom-up discovery of multimodal graph patterns in existing knowledge graphs. The authors evaluate the method from a user perspective with a task-based questionnaire. The aim is to support scholars in finding potentially interesting patterns in the data that can spark new research questions.

It uses generalised graph patterns and pays particular attention to multimodal knowledge graphs, while also mitigating the curse of dimensionality.

Patterns are organised according to depth, length, width and support. A facet browser assists scholars, and it’s very interesting.

Strengths
- Aim and scope fit the journal; very interesting idea and implementation, though perhaps not the most original one
- Good technical quality
- The facet browser proposal is helpful to reduce barriers to usage
- Good writing, clarity also in figures and listings

Weaknesses
- Lack of extensive literature on patterns. For instance, there is a connection to earlier KG-exploration UIs such as Aemoo (Nuzzolese et al., 2016) that already let users navigate DBpedia via automatically learned patterns and contextual panels.
- Exploratory analysis is missing; e.g. how many patterns were found for which type, etc., would be interesting.
- Evaluation through a user-based study is good in theory, but some details are missing. The paper does not say how the users were selected or invited, who they are (broadly), or even how many there are until Section 6.3; this should be introduced earlier. Participants' self-reported experience might raise problems connected to self-awareness. People might also have biases towards automated pattern discovery for research, especially if they are not from technical backgrounds. Likewise, familiarity with the domain is unclear. Indeed, the results on the usefulness of the interface or its clarity depend on those backgrounds.
- Pattern browser not thoroughly explored, e.g. design principles.
- The approach does not capture and highlight infrequent yet semantically important regularities that may otherwise be overlooked due to low support.

Minor comments:
- Section 3: Perquisites is a typo for Prerequisites.
- Quotation marks in Listing 2 are incorrect, and also on line 30 of p. 13.
- Modelling interestingness as future work: "Palma, Cosimo, et al. Modelling Interestingness: Stories as L-Systems and Magic Squares. In: Text2Story @ ECIR. 2023. pp. 127-133" could be a good starting point.
- The framework seems to have a name in the Github repository, HypoDisc, but it is never cited in the paper.

Overall, this paper tackles a timely and practically relevant problem—helping scholars surface meaningful structures in large, heterogeneous knowledge graphs—through a well-engineered, anytime pattern-mining pipeline and a thoughtfully designed facet browser. The technical core is solid and the presentation clear. Nonetheless, the manuscript would benefit from a deeper situating in prior KG-exploration literature, fuller reporting of the exploratory statistics behind the discovered patterns, and a more transparent description of the user-study protocol and participant profile. Addressing these points, along with the minor corrections listed, should be straightforward and will significantly strengthen both the empirical credibility and the broader impact of the work. I therefore recommend minor revision.

Review #3
Anonymous submitted on 05/Jul/2025
Suggestion:
Minor Revision
Review Comment:

The paper is of good quality and is well written. Overall, I believe it advances the state of the art in the field of knowledge graphs. The writing is fluent and enjoyable. The background material provided is sufficient, even for less experienced readers. I believe the paper will be of interest to readers of the journal. My main criticism concerns the section on related literature, which I find overly condensed. It would be interesting to consider other approaches and demonstrate how the proposed approach fits into the current state of knowledge and improves it. To date, there are multimodal knowledge graph completion/management approaches based on LLMs. What are the advantages and disadvantages of the proposed approach compared to an LLM-based one?