Review Comment:
In this paper the authors present SCHEMA-MINER(pro), an extension of the SCHEMA-MINER paper presented at ESWC 2025. In brief, this framework is an agentic AI approach to model processes from scientific papers and map them to grounded ontologies. The authors provide a qualitative and a quantitative evaluation, as well as a web interface so users can experiment with the system.
----
Originality: HIGH
Significance of the results: HIGH
Quality of Writing: must be improved
Data and web app are appropriately documented.
----
I would like to start by saying that I liked the paper: I feel the work is at a mature stage and a concrete milestone has been reached. However, the paper falls a bit short in the way it is written. Several passages are confusing and required me to read them a few times before I could move forward. Hence, what follows is feedback on portions of text that I would ask the authors to clarify.
Page 2: “Unlike a purely prompt-based approach in which a large language model is queried to align extracted schema elements to ontological concepts in a single pass, our agentic workflow decomposes the alignment task into structured, tool-augmented steps.” This sentence is confusing. A reader who has finished the paper can grasp the meaning, but as this is in the introduction it is hard to understand on a first read. I take it you mean that approaches in the literature build schemas in one pass (please cite examples), whereas you act differently, in a multi-stage fashion.
Page 2: “The agent iteratively performs heuristic string matching and embedding-based semantic search to identify candidate ontology classes or properties for each schema element.” At this stage it is hard to understand which concepts and which schema we are talking about. By reading the paper, I understood we are talking about a grounded schema, but this is not evident here.
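To make my reading explicit, here is how I understood the two-step candidate retrieval; this is my own toy sketch (stdlib only, with a bag-of-words cosine standing in for the actual embeddings, and invented label names), not the authors' implementation. If this is not what is meant, the introduction should say so.

```python
import re
from difflib import SequenceMatcher
from math import sqrt
from collections import Counter

# Hypothetical ontology class labels, for illustration only.
ONTOLOGY_LABELS = ["PrecursorMaterial", "ProcessParameter",
                   "Measurement", "SubstrateTemperature"]

def string_candidates(element, labels, threshold=0.5):
    """Step 1: heuristic string matching on the raw labels."""
    scored = [(SequenceMatcher(None, element.lower(), lab.lower()).ratio(), lab)
              for lab in labels]
    return [lab for score, lab in sorted(scored, reverse=True) if score >= threshold]

def _bow(text):
    """Split camelCase / whitespace into lowercase word counts."""
    return Counter(w.lower() for w in re.findall(r"[A-Z]?[a-z]+", text))

def _cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def semantic_candidates(element, labels, top_k=2):
    """Step 2: stand-in for embedding-based semantic search."""
    return sorted(labels, key=lambda lab: _cosine(_bow(element), _bow(lab)),
                  reverse=True)[:top_k]
```

Even in this simplified form, the two steps can return different candidates for the same schema element, which is why it matters to state early on which schema the elements come from.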
Page 2: you mention that you have diverse use cases (ALE and ALD) to test the system’s robustness. Are these really diverse? I fear this is an overstatement and needs to be contextualised with respect to the field of application. The authors offer insights toward the end of the paper about generalisability; however, these are just insights and lack concrete evidence.
Reference 48 cites the arXiv preprint. Perhaps the manuscript was submitted before the ESWC paper was published; you can now use the correct reference.
“It begins by prompting one or more LLMs to generate a draft process schema by extracting relevant entities and properties—such as materials, parameters, or measurements—from a curated set of scientific texts.” This is confusing at this stage. Only later in the paper did I realise that there is 1 document first and 1-10 papers later. Perhaps you can clarify this aspect here.
“Pre-indexed ontology space”: where does this pre-indexed ontology space come from? Do you build it live?
Page 5: “from unstructured process specification document”: are there specific requirements on what this document should be? A research paper, a scientific report, documentation?
Page 5: “The initial schema is evaluated by the domain experts, who evaluate its completeness, correctness, and semantic clarity. Their feedback is very important for informing subsequent refinement steps.” How do the domain experts report their feedback? And how does the data structure embed the feedback so that it can be used in the next stage?
Page 5: “A small collection of research papers is curated by the domain experts of around 1-10 papers which are considered to be state-of-the-art and highly specialized publications for the target process.” Are there specific criteria for these papers? What do they need to contain? What if there are more than 10? How should domain experts rank them?
Page 6: “While stage 2 emphasizes domain grounding and precision via a curated literature set, stage 3 prioritizes scalability and generalizability, capturing a wider spectrum of process variations and terminological differences.” How do you deal with conflicting statements?
Page 7: there is a block on the left of Figure 2, called schema-miner. Does this mean that the previous paper [48] corresponds to that block, and that this paper (the pro version) extends it by adding the vertical blocks?
Page 7: “FAISS-based”: this acronym is used before its definition, which appears later.
Page 7: “error prone”: the authors need to justify why manual grounding is error prone. Which types of errors: syntactic, semantic, modelling, disagreement between experts? Ideally, we would expect a reference here.
Page 8, Section “Step 1: Ontology and Schema Input”: it is clear that this stage takes three inputs. However, the description of the inputs one paragraph later is not consistent with the numbered list. I literally thought that the input schema and the machine-readable ontology were both part of point (1), whereas they are points 1 and 3 respectively.
Side question: can the user provide more than one grounded ontology? Usually, KGs/ontologies are modelled taking advantage of several existing ontologies.
Page 8: “When grounding an ambiguous schema property, the agent queries the FAISS index to retrieve the most semantically similar chunk.” How do you build the vectors? Which model do you use? Please clarify.
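To make the question concrete: a FAISS flat index is essentially an exact similarity search, so the result depends entirely on how the vectors are produced (embedding model, chunking strategy, normalisation), none of which is stated. A minimal pure-Python stand-in for what such a retrieval step computes (my own hypothetical sketch, with toy 3-d vectors in place of real embeddings):

```python
from math import sqrt

# Hypothetical pre-indexed ontology chunks with toy 3-d embeddings.
# In the paper these would come from an (unspecified) embedding model.
CHUNKS = {
    "chunk-1: class SubstrateTemperature ...": [0.9, 0.1, 0.0],
    "chunk-2: property hasPrecursor ...":      [0.1, 0.8, 0.2],
    "chunk-3: class Measurement ...":          [0.0, 0.2, 0.9],
}

def _normalize(v):
    n = sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def top_chunk(query_vec, chunks):
    """Exact cosine search over all chunks, i.e. what a flat index computes."""
    q = _normalize(query_vec)
    best = max(chunks.items(),
               key=lambda kv: sum(a * b for a, b in zip(q, _normalize(kv[1]))))
    return best[0]
```

Which chunk comes back as “most semantically similar” is entirely a function of the unspecified embedding step, so naming the model and chunking choices is essential for reproducibility.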
Page 14: “Domain experts”: as we are merging two domains, materials science and the semantic web, the reader would appreciate more detail on the extent of the experts' expertise in each of the two.
Page 14: “seconds”, “milliseconds”: there is a LaTeX typo on the opening quote.
Page 16: “Each comparison here considers the output of one LLM as the candidate schema and the other as the reference schema.” This is not clear: if one LLM's output is the candidate, there are two other LLM outputs to compare against. For instance, if GPT-4o is the candidate, is it compared against GPT-4-turbo, LLaMA 3.1, or both? This only becomes clear in Table 1, but the reader has to wait until then to resolve the doubt.
Table 1 and Table 2 are not referenced in the text. Also, consider positioning Table 2 near Table 1.
Results, quantitative analysis: have you performed an ablation study to understand why the models differ from each other? Is there a consistent mistake? What is the lesson learnt here? Is there any insight from the analysis that you can give back to the community for future work?
Page 16: “experiment type-3”, “type-4”: you may want to be consistent with your nomenclature.
In the qualitative study, I liked the analysis of the impact of the various factors. With domain experts you assess correctness; do you also have insights on comprehensiveness, i.e., does the generated schema cover all aspects of the process?
Thanks a lot for the great work.