Abstract:
Scientific processes are often described in free text, making it difficult to represent and reason over them computationally. We present \textsc{schema-miner}$^{pro}$, a human-in-the-loop framework that automatically extracts structured schemas from scientific literature and grounds them in external ontologies. Our approach combines large language models for schema extraction with an agent-based system that aligns extracted elements to ontology terms through interpretable, multi-step reasoning. The agent draws on lexical heuristics, semantic similarity, and expert feedback to improve grounding accuracy. We demonstrate the framework on two semiconductor manufacturing workflows, Atomic Layer Deposition (ALD) and Atomic Layer Etching (ALE), mapping process parameters and outputs to the QUDT ontology. By producing ontology-aligned, semantically precise schemas, \textsc{schema-miner}$^{pro}$ lays the groundwork for machine-actionable scientific knowledge and automated reasoning across disciplines.