Large Language Models for Ontology Engineering: A Systematic Literature Review

Tracking #: 4001-5215

Authors: 
Jiayi Li
Daniel Garijo
Maria Poveda

Responsible editor: 
Guilin Qi

Submission type: 
Survey Article
Abstract: 
Ontology engineering (OE) is a complex knowledge representation task that relies heavily on domain experts to accurately define the concepts and relationships of a domain of interest, as well as to maintain logical consistency throughout the resulting ontology. Recent advancements in Large Language Models (LLMs) have created new opportunities to automate and enhance various stages of ontology development. This paper presents a systematic literature review of the use of LLMs in OE, focusing on their roles in core development activities, input-output characteristics, evaluation methods, and application domains. We analyze 30 papers to identify common tasks where LLMs have been applied, such as ontology requirements specification, implementation, publication, and maintenance. Our findings indicate that LLMs serve primarily as auxiliary ontology engineers, domain experts, and evaluators, relying on models such as GPT, LLaMA, and T5. Different approaches use zero- and few-shot prompting techniques to process heterogeneous inputs (e.g., OWL ontologies, text, competency questions) and generate task-specific outputs (e.g., examples, axioms, documentation). Our review also reveals a lack of homogenization in task definitions, dataset selection, evaluation metrics, and experimental workflows. Moreover, some papers do not release complete evaluation protocols or code, making their results hard to reproduce and their methods insufficiently transparent. The development of standardized benchmarks and hybrid workflows that integrate LLM automation with human expertise therefore remains an important challenge for future research.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 17/Jan/2026
Suggestion:
Major Revision
Review Comment:

This paper provides a detailed survey of the use of large language models (LLMs) in Ontology Engineering (OE). It contains a well-structured summary of relevant OE tasks, different implementation approaches, and evaluations of LLM-based methods, as well as a detailed description of the literature search process. The authors also provide an open-source repository containing the raw data.

The main concern is that the paper focuses primarily on studies published before 2024, which makes it insufficiently comprehensive given that the paper was submitted at the end of 2025. It should at least include papers published in early or mid-2025. I strongly recommend that the authors update the survey by re-running the search pipeline as described in the paper.

Here are some minor issues:

- The introduction starts with Knowledge Graphs, which feels somewhat strange.

- Please provide the full name of CQ when it first appears on Page 4.

- At the beginning of Section 4, the reduction from 5,275 papers to 204 is very large. Could the authors add a sentence summarizing the main characteristics of the excluded papers?

- Figure 2 contains several ambiguities. The meanings of Task A, B, C and Steps 4, 5, and 6 are not explained. The terminology is also inconsistent (e.g., “zero” vs. “zero-shot”), and in the final block, some text is overlapped by the figure elements.

- The sentence “Coutinho (2024) developed a system merge text-based languages for ontologies with LLMs to generate new concepts based on contextual information for the unified foundational ontology (UFO)” is difficult to understand and should be rephrased.

- Section 3.1 (Research Questions) appears too early and is too far removed from the detailed discussion in Section 4.2. Consider moving it closer to that section. Moreover, expressing each section title as a research question (e.g., “How do LLM-based approaches support different ontology development activities?”) feels redundant. It would be more straightforward to use descriptive titles such as “LLM-based Ontology Development.” The same applies to other sections structured around research questions.

- Section 4.3.1 seems to overlap with the second paragraph of Section 4.3.3, and the descriptions are not consistent. Section 4.3.3 states that 9 studies do not conduct any explicit evaluation, whereas Section 4.3.1 reports 14. This discrepancy should be clarified.

- For Section 4.3.2, it would be helpful to include an overall summary or description of the datasets, possibly in the form of a table or an additional paragraph.

- In Section 4.3.4, when discussing evaluation results, the authors should consistently provide specific numerical values instead of vague terms such as “high performance” or “better performance.” While this is done well in most cases, some instances are missing, such as: “…Lower temperature settings were found to reduce hallucinations without compromising accuracy (Rebboud et al., 2024a)” and “…Zephyr β and UNA achieved high precision.” In addition, “Zephyr β” and “UNA” appear to be LLMs; please explicitly state this.

Review #2
Anonymous submitted on 01/Feb/2026
Suggestion:
Accept
Review Comment:

Based on the comparison between the new version and the previous version, the authors have made substantial revisions that address my concerns.

Regarding W1 from my previous review:

The authors have improved the organization of the manuscript. Although the research tasks in the field of LLM-based OE are indeed scattered, the current structure, supported by the updated figures (the new Figure 2), provides a much clearer logical flow and a better overview of the research landscape.

Regarding W2 from my previous review:

The authors have updated some content in Section 4.1 and Section 4.2.1 in the revised manuscript, resolving the issue of redundancy.

Regarding W3 from my previous review:

The authors have added a detailed technical analysis in Section 4.2.3 regarding "What LLM prompt techniques are employed," covering strategies such as zero-shot, template-based, chain-of-thought, and fine-tuning. This addition is crucial as it moves beyond simply listing models to explaining the implementation mechanisms. By discussing how these prompt engineering techniques are applied to specific OE tasks, the paper now successfully answers the "How" question from a technical perspective, addressing the major shortcoming of the previous version.

Regarding W4 from my previous review:

The presentation has been improved. The revised Figure 2 (now Figure 3 in the revised version) is comprehensive and visually professional.

I have no further concerns about the revised version.