Review Comment:
This paper provides a detailed survey of the use of large language models (LLMs) in Ontology Engineering (OE). It contains a well-structured summary of relevant OE tasks, different implementation approaches, and evaluations of LLM-based methods, as well as a detailed description of the literature search process. The authors also provide an open-source repository containing the raw data.
My main concern is that the paper focuses primarily on studies published before 2024, which makes it insufficiently comprehensive given that it was submitted at the end of 2025. The survey should at least cover papers published in early and mid-2025. I strongly recommend that the authors update the survey by re-running the search pipeline described in the paper.
Here are some minor issues:
- The introduction opens with Knowledge Graphs rather than ontologies, which feels misplaced given the paper's focus on Ontology Engineering; consider motivating ontologies more directly.
- Please provide the full name of CQ when it first appears on Page 4.
- At the beginning of Section 4, the reduction from 5,275 papers to 204 is very large. Could the authors add a sentence summarizing the main characteristics of the excluded papers?
- Figure 2 contains several ambiguities. The meanings of Task A, B, C and Steps 4, 5, and 6 are not explained. The terminology is also inconsistent (e.g., “zero” vs. “zero-shot”), and in the final block, some text is overlapped by the figure elements.
- The sentence “Coutinho (2024) developed a system merge text-based languages for ontologies with LLMs to generate new concepts based on contextual information for the unified foundational ontology (UFO)” is difficult to understand and should be rephrased.
- Section 3.1 (Research Questions) appears too early and is too far removed from the detailed discussion in Section 4.2. Consider moving it closer to that section. Moreover, expressing each section title as a research question (e.g., “How do LLM-based approaches support different ontology development activities?”) feels redundant. It would be more straightforward to use descriptive titles such as “LLM-based Ontology Development.” The same applies to other sections structured around research questions.
- Section 4.3.1 seems to overlap with the second paragraph of Section 4.3.3, and the descriptions are not consistent. Section 4.3.3 states that 9 studies do not conduct any explicit evaluation, whereas Section 4.3.1 reports 14. This discrepancy should be clarified.
- For Section 4.3.2, it would be helpful to include an overall summary or description of the datasets, possibly in the form of a table or an additional paragraph.
- In Section 4.3.4, when discussing evaluation results, the authors should consistently provide specific numerical values instead of vague terms such as “high performance” or “better performance.” While this is done well in most cases, some instances are missing, such as: “…Lower temperature settings were found to reduce hallucinations without compromising accuracy (Rebboud et al., 2024a)” and “…Zephyr \beta and UNA achieved high precision.” In addition, “Zephyr \beta” and “UNA” appear to be LLMs; please explicitly state this.