\textsc{schema-miner}$^{pro}$: Agentic AI for Ontology Grounding over LLM-Discovered Scientific Schemas in a Human-in-the-Loop Workflow

Tracking #: 3871-5085

This paper is currently under review
Authors: 
Sameer Sadruddin
Jennifer D'Souza
Eleni Poupaki
Alex Watkins
Bora Karasulu
Sören Auer
Adrie Mackus
Erwin Kessels

Responsible editor: 
Guest Editors 2025 LLM GenAI KGs

Submission type: 
Full Paper
Abstract: 
Scientific processes are often described in free text, making it difficult to represent and reason over them computationally. We present \textsc{schema-miner}$^{pro}$, a human-in-the-loop framework that automatically extracts and grounds structured schemas from scientific literature. Our approach combines large language models for schema extraction with an agent-based system that aligns extracted elements to external ontologies through interpretable, multi-step reasoning. The agent leverages lexical heuristics, semantic similarity, and expert feedback to ensure accurate grounding. We demonstrate the framework on two semiconductor manufacturing workflows—Atomic Layer Deposition (ALD) and Atomic Layer Etching (ALE)—mapping process parameters and outputs to the QUDT ontology. By producing ontology-aligned, semantically precise schemas, \textsc{schema-miner}$^{pro}$ lays the groundwork for machine-actionable scientific knowledge and automated reasoning across disciplines.
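As a rough illustration of the grounding step the abstract describes, the sketch below ranks candidate ontology labels for an extracted schema property with two simple lexical heuristics and routes weak matches to an expert. It is a minimal sketch under stated assumptions, not the authors' implementation: the candidate labels are a small hand-written list standing in for QUDT quantity kinds, the function names, equal weighting, and 0.4 threshold are hypothetical, and the embedding-based semantic similarity mentioned in the abstract is omitted.

```python
from difflib import SequenceMatcher

# Hypothetical, hand-written candidate labels; a real system would load
# these from the target ontology rather than hard-code them.
CANDIDATES = ["Temperature", "Pressure", "Time", "Thickness"]


def normalize(name):
    """Lowercase a name and turn underscores into spaces."""
    return name.replace("_", " ").lower().strip()


def token_overlap(a, b):
    """Jaccard overlap between the whitespace-separated tokens of a and b."""
    tokens_a, tokens_b = set(a.split()), set(b.split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def score(prop, label):
    """Average of token overlap and character-level similarity (assumed weights)."""
    a, b = normalize(prop), normalize(label)
    fuzzy = SequenceMatcher(None, a, b).ratio()
    return 0.5 * token_overlap(a, b) + 0.5 * fuzzy


def ground(prop, candidates=CANDIDATES, threshold=0.4):
    """Return the best-scoring label and flag weak matches for expert review."""
    best_score, best_label = max((score(prop, c), c) for c in candidates)
    status = "proposed" if best_score >= threshold else "needs_expert_review"
    return {
        "property": prop,
        "match": best_label,
        "score": round(best_score, 3),
        "status": status,
    }


if __name__ == "__main__":
    for prop in ["substrate_temperature", "xyz"]:
        print(ground(prop))
```

Matches below the threshold are flagged for review rather than discarded, which mirrors the human-in-the-loop role described above: the heuristics only propose a mapping, and the expert confirms or corrects it.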
Tags: 
Under Review