A Knowledge Engineering Primer

Tracking #: 3756-4970

Authors: 
Agnieszka Lawrynowicz
Jose Emilio Labra-Gayo
Mayank Kejriwal

Responsible editor: 
Guest Editors Education 2024

Submission type: 
Full Paper
Abstract: 
The aim of this primer is to introduce the subject of knowledge engineering in a concise but synthetic way to develop the reader's intuition about the area. The main knowledge organization systems are explained with examples. We also describe methodological aspects concerning knowledge engineering.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Sky Bristol submitted on 25/Feb/2025
Suggestion:
Minor Revision
Review Comment:

Overall, the manuscript presents the concepts, procedures, technologies, and theories involved in knowledge engineering in a reasonable and cogent fashion. The narrative places the many terms encountered in the field into a useful context. This could provide someone new to the field with a useful framework for developing a knowledge engineering practice.

The paper could benefit from more context in the introductory section describing knowledge acquisition and knowledge engineering to bolster the underlying focus of the narrative. This could include a couple of examples of what it means to engage in knowledge engineering practice, perhaps on a global/macro scale (e.g., iterative construction of something like Wikidata or OpenStreetMap) and on a local/micro scale (e.g., domain ontology engineering, corporate knowledgebase development, etc.). Essentially, a primer could better target the intended audience range - the potential KE practitioners needing to understand what they are getting into and why.

There are areas in the text that could benefit from more attention to managing uncertainty as a core aspect of knowledge engineering. For instance, in the discussion on entity recognition and linking, the problem is presented as more of a binary (the entity can be linked to a known source or not), whereas in reality the links we assert through human and AI-assisted engineering come with a quantitative and/or qualitative characterization of confidence on a broader scale. Confidence is then increased (or decreased) over time through iterations and the incorporation of more sources or different reasoning approaches.
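As a rough illustration of this point, confidence in an asserted entity link can be treated as a number that is revised whenever new evidence arrives. The sketch below is an assumption about how that might look, not a method taken from the manuscript: the function name `update_confidence`, the use of a Bayesian odds update, and the example likelihood ratios are all hypothetical.

```python
def update_confidence(prior: float, likelihood_ratio: float) -> float:
    """Revise the confidence in a link using the odds form of Bayes' rule.

    prior            -- current confidence that the link is correct, in (0, 1)
    likelihood_ratio -- hypothetical strength of the new evidence:
                        > 1 supports the link, < 1 speaks against it
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    # posterior odds = prior odds * likelihood ratio
    odds = prior / (1.0 - prior) * likelihood_ratio
    return odds / (1.0 + odds)


# Hypothetical iterations over one asserted link: a supporting source,
# then a contradicting one, then a stronger supporting one.
confidence = 0.5
history = []
for ratio in (4.0, 0.25, 9.0):
    confidence = update_confidence(confidence, ratio)
    history.append(confidence)
```

Under these assumed ratios the confidence rises, falls back, and rises again, which is the iterative pattern described above rather than a one-off linked/not-linked decision.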

Overall, it would be good to stress the iterative nature of knowledge engineering practice. There is no one-and-done approach in this field. Knowledge, whether encoded formally or not, is fluid and often context-dependent. The importance of a sustainable knowledge management practice, learning and adapting through testable application, cannot be stressed enough as a core tenet of knowledge engineering.

Review #2
Anonymous submitted on 09/Apr/2025
Suggestion:
Minor Revision
Review Comment:

The paper aims to introduce knowledge engineering in a concise yet comprehensive manner, with the goal of developing the reader’s intuition about the field. The topic is within the journal’s scope and the content is original. Moreover, the paper is clear and generally well-written. However, there are several limitations that would benefit from revision. I have organized these limitations into the following categories: content limitations, issues with references, and other minor issues.

Content Limitations

Overall, the paper provides a general overview of the critical aspects of knowledge engineering. However, it could benefit from greater precision and the inclusion of more concrete examples in several areas:

On page 2, the paper poses the questions: “Why is the issue of representation important at all? What makes one representation better than another in the context of artificial intelligence?” Although the issue of representation is indeed crucial, it is only briefly addressed (see [1] for further discussion). In addition, the term “representation” is used with varying meanings across computer science; therefore, clarifying these distinctions would enhance the reader’s understanding.

The sentence on page 15 regarding knowledge acquisition and inference using the knowledge graph is rather generic. While automatic methods are mentioned, specific examples or detailed descriptions are lacking. In real-world scenarios, tasks such as data rule modeling and ontology creation are still largely performed manually.

The section on FAIR principles should more explicitly connect to knowledge engineering. For instance, the paper could discuss whether there are limitations concerning the reproducibility of KE tasks when employing LLMs.

Section 8, which deals with LLMs, could be enhanced further. The works presented in the ESWC LLM track (and the subsequent papers from ISWC and EKAW) on automatic ontology generation with LLMs or competency question generation are important (see, for example, [2], [3], [4], [5]). Additionally, the discussion should address reproducibility issues, especially since the chapter on FAIR principles has already highlighted related challenges. Moreover, prompt engineering, where techniques such as Chain-of-Thought (CoT) and decomposed prompting are used, deserves further elaboration, even though other types of prompting have not been discussed (see references in this review for more). Lastly, while the issue of cost is relevant, it remains secondary as many researchers are currently exploring open-source LLMs.

References

The paper could be more precise and thorough regarding its references:

On page 5, the statement “Semantic networks became part of artificial intelligence research in the 1960s. However, they had already been used in philosophy, psychology, and linguistics before that” would be strengthened by including appropriate citations to support these claims.

When discussing frames, it would be beneficial to clarify that this concept is interpreted differently across various fields. References should include foundational works (e.g., Fillmore, Minsky, and Barsalou) along with more recent studies from the Semantic Web domain ([7] and [8]; for example, see the Framester tool in [8]).

On page 20, ontology design patterns are mentioned, but neither reference papers nor website links are provided. Including these references would enhance the paper’s completeness.

Citation number 109 is not displayed in the reference list and should be corrected.

Minor Issues

Below are some additional minor issues:

On page 15, the term “NLP community” is used without introducing or defining the acronym NLP.

Footnote 2 is missing a closing period and should be corrected.

In Table 1, it is unclear where the classification from [13] applies and where it does not. This should be clearly indicated. Additionally, the term “frame” within the table should be defined or explained in the context of the paper’s discussion.

On page 19, Section 6.3, the term “knowledge graph” appears with inconsistent capitalization (i.e., the "K" in "Knowledge" is sometimes capitalized and sometimes not). This inconsistency should be resolved throughout the paper.

[1] Mollo, Dimitri Coelho, and Raphaël Millière. "The vector grounding problem." arXiv preprint arXiv:2304.01481 (2023).

[2] Saeedizade, Mohammad Javad, and Eva Blomqvist. "Navigating ontology development with large language models." European Semantic Web Conference. Cham: Springer Nature Switzerland, 2024.

[3] Fathallah, Nadeen, et al. "Neon-GPT: a large language model-powered pipeline for ontology learning." European Semantic Web Conference. Cham: Springer Nature Switzerland, 2024.

[4] Alharbi, Reham, et al. "An experiment in retrofitting competency questions for existing ontologies." Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing. 2024.

[5] Lippolis, Anna Sofia, et al. "Ontology Generation using Large Language Models." arXiv preprint arXiv:2503.05388 (2025).

[6] Shirdel, Moein, et al. "AprèsCoT: Explaining LLM Answers with Knowledge Graphs and Chain of Thought." (2025).

[7] Nuzzolese, Andrea Giovanni, et al. "Aemoo: Linked data exploration based on knowledge patterns." Semantic Web 8.1 (2016): 87-112.

[8] Gangemi, Aldo, et al. "Framester: A wide coverage linguistic linked data hub." Knowledge Engineering and Knowledge Management: 20th International Conference, EKAW 2016, Bologna, Italy, November 19-23, 2016, Proceedings 20. Springer International Publishing, 2016.

Review #3
By Harald Sack submitted on 21/Jul/2025
Suggestion:
Reject
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (4) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

The paper introduces knowledge engineering in general, focusing on ontologies, knowledge graphs, some knowledge engineering methodologies, and finally LLMs and their application in knowledge engineering. Considering the Call for Papers for the SWJ special issue on “the Pedagogy and Praxis of Knowledge Graphs and the Semantic Web”, the paper does not really fall into any of the categories indicated in the special issue. It is neither (1) a formal evaluation of educational material, nor (2) the description of Open Source KG/SW educational material, nor (3) a report on or application of OS KG/SW Educational Material & Tools, nor (4) a survey. It is a lightweight introduction to the subject of knowledge engineering targeted towards engineers or software developers. As it doesn’t fall into one of the categories mentioned above, the paper has to be rejected for formal reasons.

Considering the paper an introduction to knowledge engineering, please take into account the following comments:

p1, line31: the definition of declarative knowledge requires a bibliographical reference
p1, line39: the definition of a knowledge base doesn’t consider axioms nor (logical) constraints.
p1, line47-49: besides expressivity, also consider granularity, level of detail and (computational) complexity.
p2, line1-3: the source you are referring to (Blumauer and Nagy) is itself referring to Lassila and McGuinness: The Role of Frame-Based Representation on the Semantic Web. 2001 [1] for the classification of knowledge organisation systems.
p2, line34-43: Natural language is also a knowledge representation! Please distinguish also “formal” knowledge representation systems in your enumeration.
p2, line49: You introduce the term “embedding” here, which requires an explanation.
p3, line 10-12 and 16: You are using the terms “knowledge”, “information”, and “data” without a proper definition. Data and information should clearly be distinguished from knowledge and vice versa. Probably make use of the DIKW pyramid [2]
p3, line 46-5: bibliographical references for logics, semantics, meaning, interpretation, truth are missing. Alternatively, you might give proper definitions.
p4, line 1-20: It is unclear why you are introducing the formal notation of entailment and interpretation, while other important terms (as e.g. “sound”ness, “complete”ness, “derivation”, etc.) are only explained using natural language. However, as the formalism is not required later in the paper, I suggest skipping it.
p5, line 1: semantic networks require a bibliographical reference.
p5, line 17-21, the graphical notation (diamond) is unclear and not explained.
p5, line 30: “qnames” require further explanation and a bibliographical reference
p5, line 36: Why are you referring to “resources” and not “entities”?
p5, line 37-39: The important use case of using blank nodes for existential assumptions is not mentioned.
p5, line 42: Datatyped literals are not mentioned
p6, line 1-15: I would not refer to the example as “reification” but as representation of n-ary relations. Otherwise this will be confused with RDF reification.
p6, line 37-42: If the example RDF is supposed to be RDF Turtle serialization, the period at the end of each statement is missing.
p6, line 43: The meaning of the filled box is unclear.
p6, line 47: A bibliographical reference for “frames” is missing.
p7, line 9-22: Please provide a formal explanation of the frame-based inference mechanism
p8, line 16: Please provide a bibliographical reference for Protégé.
p8, line 30: Bibliographical reference for Aristotle’s “Metaphysics” is missing.
p8, line 40: “Clarity” should refer to “Explicit”.
p9, line 1-5: Bibliographical references for SNOMED, GO, and CHEBI are missing.
p9, line 32-51: RDF classes and properties haven’t been defined (City rdf:type rdfs:Class.)
p10, line 18-19: OWL ontologies can refer to different instances of description logics, depending on the OWL version/flavor used (OWL2, OWL2 DL, OWL2 EL, OWL2 QL, OWL2 RL, OWL2 Full)
p10, line 22: Besides classes and relationships, the T-Box can also contain axioms.
p11, line 38-51: symmetric properties, antisymmetric properties, reflexive properties are missing
p12, line 18: There is no “owl:subClassOf”!
p14, line 51: The formal definition of a knowledge graph doesn’t include the possibility of attributive triples.
p15, line 28-31: the term “crowdsourcing” doesn’t need this explicit explanation.
p16, line 1-15: Your definition of “property graph” is wrong. A property graph is a data model of various graph-oriented databases, where pairs of entities are associated by directed relationships, and entities and relationships can have properties [3].
p16, line 12: give examples for open knowledge graphs.
p16, line 13-17: Why are knowledge graphs well suited for the applications mentioned? Please provide a rationale or justification.
p16, line 44-49: Knowledge graphs do not necessarily contain only nodes (entities) referring to named entities. Named entity recognition also provides classes for common entities (e.g., under the class “misc”), which might be referred to in a knowledge graph.
p17, line 17/23: Probably you mean “DBpedia” instead of “Wikipedia” here.
p17, line 25: “Chicago” might refer to many more entities, as e.g. Chicago the band.
p17, line 28-46: a table would be helpful for better overview.
p17, line 44-45: Entity classification is a special case of link prediction. However, it can also be treated as (traditional) (multi-class) classification problem.
p18, line 22: Knowledge graph embeddings are typically created using unsupervised learning techniques. The primary goal of these embeddings is to represent entities and relationships in a continuous vector space while preserving the structural and semantic information of the knowledge graph. This is achieved by optimizing objective functions that capture the relationships between entities, such as translational models…
p18, line 22: “composition, inverse, and antisymmetry” require explanations and dedicated examples.
p19, line 1-14: Figure 8 is not explained in sufficient detail in the text.
p20, line 1-2: It is not explained how shape-based constraints can be tested automatically.
p20, line 21-26: The example for “opaque URIs” as “geo:Locality1” is not well chosen as the issue of multilinguality, as mentioned in line 22, does not hold for this English language-based example. Better use Q-Identifiers from Wikidata as example.
p20, line 30-51: Too little information on ontology engineering methodologies. Not a single methodology is referenced or mentioned.
p20, line 45-51: Provide a figure illustrating ODPs and Ontologies derived from ODPs for better understanding.
p21, line 1-20: It remains unclear to what extent and how exactly FAIR principles can be applied to knowledge representation artefacts.
p21, line 27-28: “Foundation Models” and LLMs are not synonymous. Bibliographical references are missing [4].
p21, line 24-44: Discussion of LLMs as knowledge representations is missing.
p22, line 1-21: How exactly can knowledge graphs be constructed with the help of LLMs?
p22, line 30: The term “SAT-style analogies” needs to be explained (and referenced)
p22, line 47-51: It remains unclear why especially LLMs should be well suited for the representation of common sense knowledge. Common sense knowledge often has never been recorded as a text being available for the training of LLMs. This needs to be explained.
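
The property graph definition given in the comment on p16, line 1-15 can be sketched as a small data structure. This is only an illustrative reading of that definition, not code from the manuscript or from any graph database: the class names `Node`, `Relationship`, and `PropertyGraph`, their fields, and the example data are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """An entity; it can carry labels and arbitrary key/value properties."""
    node_id: str
    labels: List[str] = field(default_factory=list)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed edge between two nodes, with its own properties."""
    source: str  # node_id of the start node
    target: str  # node_id of the end node
    rel_type: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    relationships: List[Relationship] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_relationship(self, rel: Relationship) -> None:
        # Both endpoints must already exist as nodes in the graph.
        if rel.source not in self.nodes or rel.target not in self.nodes:
            raise KeyError("both endpoints must be added before the relationship")
        self.relationships.append(rel)

    def outgoing(self, node_id: str) -> List[Relationship]:
        """Return the relationships that start at the given node."""
        return [r for r in self.relationships if r.source == node_id]


# Hypothetical example data.
graph = PropertyGraph()
graph.add_node(Node("chicago", ["City"], {"name": "Chicago"}))
graph.add_node(Node("illinois", ["State"], {"name": "Illinois"}))
graph.add_relationship(
    Relationship("chicago", "illinois", "LOCATED_IN", {"asserted_by": "example"})
)
```

The point of the sketch is that properties attach to the relationship itself as well as to the nodes, which is what distinguishes this data model from a plain set of subject-predicate-object triples.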

References:
[1] Lassila, O. and McGuinness, D.L. 2001. The Role of Frame-Based Representation on the Semantic Web. Technical Report #KSL-01-02. Stanford University.
[2] Ackoff, R.L. 1989. From Data to Wisdom. Journal of Applied Systems Analysis. 16, (1989), 3–9.
[3] R. Angles, "A Comparison of Current Graph Database Models," 2012 IEEE 28th International Conference on Data Engineering Workshops, Arlington, VA, USA, 2012, pp. 171-177.
[4] Rishi Bommasani, Drew A Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021.