Schema-Miner Pro: Agentic AI for Ontology Grounding over LLM-Discovered Scientific Schemas in a Human-in-the-Loop Workflow

Tracking #: 3871-5085

Authors: 
Sameer Sadruddin
Jennifer D'Souza
Eleni Poupaki
Alex Watkins
Bora Karasulu
Sören Auer
Adrie Mackus
Erwin Kessels

Responsible editor: 
Guest Editors 2025 LLM GenAI KGs

Submission type: 
Full Paper
Abstract: 
Scientific processes are often described in free text, making it difficult to represent and reason over them computationally. We present Schema-Miner Pro, a human-in-the-loop framework that automatically extracts and grounds structured schemas from scientific literature. Our approach combines large language models for schema extraction with an agent-based system that aligns extracted elements to external ontologies through interpretable, multi-step reasoning. The agent leverages lexical heuristics, semantic similarity, and expert feedback to ensure accurate grounding. We demonstrate the framework on two semiconductor manufacturing workflows—Atomic Layer Deposition (ALD) and Atomic Layer Etching (ALE)—mapping process parameters and outputs to the QUDT ontology. By producing ontology-aligned, semantically precise schemas, Schema-Miner Pro lays the groundwork for machine-actionable scientific knowledge and automated reasoning across disciplines.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Andrea Mannocci submitted on 18/Jun/2025
Suggestion:
Minor Revision
Review Comment:

The paper introduces Schema-MinerPro, a customisable framework that enables schema discovery from curated scientific texts and agent-based ontology grounding. The entire approach leverages multi-round interactions with LLMs.
The paper extends a previous approach presented by the authors; I reckon the novel contributions introduced here are sufficient.
The study is evaluated in two application scenarios under an extensive and comprehensive experimental setup. The approach is interesting, relevant, and timely, and the literature review is competently executed. The problem statement is clear and well described. The language is generally understandable.

Comments and Suggestions:

- A GitHub Repo is shared; however, it could be structured and documented better. It takes a bit of time to get a sense of what is in there, which is a lot! A little more guidance could be beneficial. Without a proper, extensive description, it will be harder to adapt the framework to another use case for a potential user.
- It is my understanding that part of the `src` contains framework components that you provide and should not be touched (e.g., services), while other files scattered around seem to embody the application at hand, e.g., QUDT-aware prompts. In general, I would advise you to structure the codebase so that it is crystal-clear which components are provided by the framework and which are instead the levers and knobs exposed to the user for customisation. You have the `data` folder for that purpose, but it seems that some files have escaped that logic.
- Also, you have been using the same GitHub repository since the previous publication, which is fine per se, but it makes it hard to track the different contributions of the two papers on GitHub. I.e., two papers point to the same repository but describe slightly different versions of the same approach, schema-miner vs. schema-miner-pro. I guess it is ok, and I am not suggesting that two distinct repositories are necessary. However, the README could be more explicit about this and mention the two papers, highlighting the differences. Also, be sure that readers can still validate and reproduce both approaches and that the latter does not de facto supersede the former (i.e., somewhat invalidating the source code of the first).
- On the notebook (https://github.com/sciknoworg/schema-miner/blob/main/notebooks/schema_mi...) I found this broken link "using a prompt template as demonstrated here." (https://github.com/sciknoworg/schema-miner/blob/a24226657ede2f740a56f758...)
- There is a bit of ambiguity stemming from the usage of stage vs. step vs. phase. When you introduce schema-miner, you use "stage". When you introduce the agentic workflow, you use "step". However, in some cases, "phase" pops up, as in "The first stage of the ontology grounding workflow in SCHEMA-MINERpro is the Ontology and Schema Input phase". In other cases, stage and step are used interchangeably, as in "Step 2: Property Matching. The second stage of the ontology grounding workflow, Property Matching".
- Also, it is not super clear where the agentic workflow is plugged on top of the previous approach (i.e., where schema-miner ends and where the pro begins). Anywhere online, it seems that the "pro branding" is never mentioned explicitly. As far as I can tell, the agentic workflow is applied entirely downstream to the output of Stage 3. If this is the case, perhaps extending Fig. 1 or integrating it with Fig.2 could help understand the whole workflow at a glance.
- Try to have the listing on page 9 all in the same column.
- In a couple of instances, you use ALD/E, while you mostly use ALD/ALE. Please make this uniform.
- You always use ALD before ALE, so in Figure 3, you could lay them out in the same order.
- "is to allow LLM to extract" missing article or LLMs
- "will help LLM in generating schema" as above
- "for the LLM to improve schema:" missing article
- "determine if it exists in ontology" missing article
- The sentence starting with “A key motivation for adopting…” reads like a central argument but is placed deep within the discussion. It might fit better in the introduction or state-of-the-art section to highlight its significance earlier.
- The acronym QUDT should be spelt out in full at first mention, especially in the abstract. Also, a citation or link would be helpful in its first appearance.

Review #2
By Antonello Meloni submitted on 22/Jul/2025
Suggestion:
Minor Revision
Review Comment:

This paper extends previous work by integrating an agent-based system for semi-automatic ontology grounding, combining rule-based heuristics, semantic similarity, and expert validation. The proposed framework is well-motivated and technically sound, and the application to ALD and ALE processes is clearly presented. However, while the authors claim the system is domain-agnostic, the evaluation is limited to two highly specific manufacturing workflows. To strengthen the contribution, I recommend including evidence or a discussion demonstrating the system’s ability to generalize to scientific texts that do not describe narrowly defined or highly structured processes. For these reasons, I suggest a minor revision.

The paper introduces Schema-Miner Pro, a human-in-the-loop framework that combines large language models and an agent-based system to extract and ground scientific process schemas to external ontologies. The extension over previous work is primarily the introduction of a semi-automated grounding mechanism that integrates rule-based reasoning, semantic similarity, and expert oversight.

1) Originality:
The integration of agentic AI for ontology grounding is a valuable addition to the prior schema extraction pipeline. While the approach builds upon existing components, the coordination of multiple techniques for interpretable, step-wise grounding is innovative and practically relevant.

2) Significance of Results:
The demonstration of the framework on two complex semiconductor processes (ALD and ALE) showcases its practical utility in a real scientific domain. However, the evaluation is limited to narrowly defined and highly structured processes. While the authors claim the system is domain-agnostic, no empirical evidence is provided to support its applicability to less structured scientific texts or domains where process definitions may be ambiguous. Including such evidence or broader experiments would significantly strengthen the paper’s impact and generalizability.

3) Quality of Writing:
The paper is generally well-written and clearly organized. The methodology is carefully explained, and the figures help in understanding the multi-agent workflow.

4) Data and Resources:
A) Organization: The data files are well-organized and include a README that sufficiently documents the structure and usage of the resources.
B) Completeness: The resources provided appear adequate for understanding and replicating the experiments.

C) Repository: The repository is hosted on GitHub, which is acceptable for long-term discoverability.

D) Artifact Completeness: Overall, the data artifacts are reasonably complete.

Review #3
By Angelo Salatino submitted on 27/Jul/2025
Suggestion:
Minor Revision
Review Comment:

In this paper the authors present SCHEMA-MINER(pro), which is an extension of the SCHEMA-MINER paper presented at ESWC 2025. In brief, this framework is an agentic AI approach to model processes from scientific papers and map them to grounded ontologies. The authors provide a qualitative and a quantitative evaluation, as well as a web interface so users can play with it.

----

Originality: HIGH
Significance of the results: HIGH
Quality of Writing: must be improved

Data and web app are appropriately documented.

----

I would like to start by saying that I liked the paper, and I feel the work is at a mature stage and a concrete milestone has been reached. However, the paper falls a bit short in the way it is written. Several passages are a bit confusing and required me to read them a few times before I could move forward. Hence, what follows is feedback on portions of text that I would ask the authors to clarify.

Page 2: “Unlike a purely prompt-based approach in which a large language model is queried to align extracted schema elements to ontological concepts in a single pass, our agentic workflow decomposes the alignment task into structured, tool-augmented steps.” This sentence is confusing. If one reads the paper, they can grasp the meaning. But as this is in the introduction, it is hard to understand if read as first thing. You mean in the literature, approaches build schemas in one pass (please cite), but then you act differently, in a multi-stage fashion.

Page 2: “The agent iteratively performs heuristic string matching and embedding-based semantic search to identify candidate ontology classes or properties for each schema element.” At this stage it is hard to understand what concept and what schema we are talking about. By reading the paper, I understood we are talking about a grounded schema, but this is not evident here.
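To make the request concrete: an illustrative sketch of what "heuristic string matching" against candidate ontology labels might look like. The paper does not specify its heuristic, so difflib's similarity ratio and the candidate labels below are assumptions, not the authors' implementation:

```python
import difflib

# Hypothetical candidate ontology labels and a schema element to ground;
# difflib's SequenceMatcher ratio stands in for the unspecified lexical heuristic.
candidates = ["PlasmaPower", "SubstrateTemperature", "PurgeTime"]
element = "substrate temperature"

def lexical_score(a: str, b: str) -> float:
    # Case-insensitive similarity in [0, 1]
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

best = max(candidates, key=lambda c: lexical_score(element, c))
print(best)  # → SubstrateTemperature
```

A clarification of this kind (which strings are compared, and to which ontology fields) would resolve the ambiguity.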

Page 2, you mention that you have diverse use cases (ALE and ALD) to test the system’s robustness. Are these really diverse? I fear this is an overstatement and needs to be contextualised to the field of application. The authors offer insights toward the end of the paper about the generalisability. However, these are just insights and lack concrete evidence.

Reference 48 uses arXiv. Perhaps this was submitted before the ESWC paper was published. Now, you can use the correct reference.

“It begins by prompting one or more LLMs to generate a draft process schema by extracting relevant entities and properties—such as materials, parameters, or measurements—from a curated set of scientific texts.” It is confusing at this stage. I have realised throughout the paper there is 1 document first and 1-10 later. Perhaps, you can clarify this aspect here.

“Pre-indexed ontology space”: where does this pre-indexed ontology space come from? Do you build it live?

Page 5: “from unstructured process specification document” Are there specific requirements on what this document should be? A research paper, scientific report, documentation?

Page 5: “The initial schema is evaluated by the domain experts, who evaluate its completeness, correctness, and semantic clarity. Their feedback is very important for informing subsequent refinement steps.” How do the domain experts report the feedback? And how does the data structure embed the feedback so that it can be used in the next stage?

Page 5: “A small collection of research papers is curated by the domain experts of around 1-10 papers which are considered to be state-of-the-art and highly specialized publications for the target process.” Are there direct specifics for these papers? What do they need to contain? What if there are more than 10? How should domain experts proceed to rank them?

Page 6: “While stage 2 emphasizes domain grounding and precision via a curated literature set, stage 3 prioritizes scalability and generalizability, capturing a wider spectrum of process variations and terminological differences.” How do you deal with conflicting statements?

Page 7: There is a block on the left of Figure 2, called schema-miner. Does it mean that the previous paper (48) is that block, and then this paper (pro version) extends by adding vertical blocks?

Page 7: “FAISS-based” this acronym is being employed before its definition (later)

Page 7: “error prone” The authors need to justify why manual grounding is error prone. Which types of errors? Syntactic, semantic, modelling errors, or disagreement between experts? Ideally, we would expect a reference here.

Page 8: Section Step 1: Ontology and Schema Input. It is clear that this stage takes three inputs. However, the description of the inputs, one paragraph later is not consistent with the numbered list. I literally thought that the input schema and the machine-readable ontology were both part of the point (1). Whereas they are point 1 and 3 respectively.
Side question: can the user provide more than one grounded ontology? Usually, KGs/ontologies are modelled taking advantage of several existing ontologies.

Page 8: “When grounding an ambiguous schema property, the agent queries the FAISS index to retrieve the most semantically similar chunk.” How do you build the vectors? Which model do you use? Please clarify
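For instance, clarifying something along these lines would help. The numpy sketch below stands in for a FAISS IndexFlatIP lookup over normalised embeddings; the embedding model, vectors, and chunk labels are all invented for illustration and are not taken from the paper:

```python
import numpy as np

def normalize(v):
    # Unit-normalise so inner product equals cosine similarity,
    # as with a FAISS IndexFlatIP over normalised vectors.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

rng = np.random.default_rng(0)
# Five hypothetical pre-indexed ontology chunks with 8-d embeddings
ontology_vectors = normalize(rng.normal(size=(5, 8)))
labels = ["Temperature", "Pressure", "Duration", "FlowRate", "Thickness"]

# A schema-property query vector close to the "Duration" chunk
query = normalize(ontology_vectors[2] + 0.01 * rng.normal(size=8))
scores = ontology_vectors @ query  # inner products against the index
best = labels[int(np.argmax(scores))]
print(best)  # → Duration
```

Stating which embedding model produces the vectors, and whether they are normalised, would answer this point directly.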

Page 14: “Domain experts” As we are merging two domains, material science and the semantic web, the reader would appreciate more detail on the extent of expertise in these two domains.

Page 14: "seconds", "milliseconds": there is a LaTeX typo (in the opening quotes).

Page 16: “Each comparison here considers the output of one LLM as the candidate schema and the other as the reference schema.” This is not clear: if one LLM output is the candidate, there are two other LLM outputs that could serve as the reference. For instance, if GPT-4o is the candidate, is it compared against GPT-4-turbo or LLaMA 3.1, or both? It becomes clear in Table 1, but the reader has to wait until then to resolve this doubt.

Table 1 and Table 2 are not referenced in the paper. Also, you could position Table 2 near Table 1.

Results, quantitative analysis: Have you performed an ablation study to understand why the models differ from each other? Is there a consistent mistake? What is the lesson learnt here? Is there any insight out of the qualitative analysis that you can provide back to the community for future work?

Experiment type-3 and type-4 on page 16: perhaps you may want to be consistent with your nomenclature.

In the qualitative study, I liked the impact of the various factors. With domain experts you assess the correctness. Do you have insights on the comprehensiveness? Is the generated schema covering all aspects of the process?

Thanks a lot for the great work.