Review Comment:
The paper provides a systematic literature review of learning approaches for updating knowledge graphs (KGs) in the manufacturing domain. While the topic is generally interesting and relevant for the community, the paper fails to present its content clearly and comprehensibly. I therefore suggest improving the entire article by (1) rethinking its overall alignment and (2) reworking the presentation of the results.
The strong points of the paper are:
S1: The paper addresses an interesting and relevant topic.
S2: The paper is based on a sound research method.
The following opportunities for improvement could be identified:
O1: The terms “dynamic” and “evolvable” KG, and the distinction between them, are not clear.
O2: The knowledge graph construction process (KGCP) should be reconsidered, and the results of the paper should be aligned with its individual steps to make the presentation clear.
O3: The Introduction should be rewritten to clearly outline the content of the paper.
O4: The classification of learning methods (Tables 7, 8, 9) should be reconsidered.
O5: Data quality assurance in KGs is addressed insufficiently.
O1-O5 are detailed in the following paragraphs.
O1: The definitions of, and distinction between, “dynamic” and “evolvable” KGs appear to have been introduced by the authors themselves and should not be assumed to be known. I therefore suggest either (i) defining and differentiating both terms more clearly at the very beginning of the paper, also citing work other than the authors' own, or (ii) removing the distinction from the paper. I personally think the paper would not lose anything if the distinction were removed. The terms are especially confusing in the title and abstract and do not convey what the reader can expect from the paper. A title such as “A Systematic Literature Review on Knowledge Graph Construction in Manufacturing” would seem more appropriate to me.
O2: Knowledge Graph Construction Process.
First, addressing only the “construction” of KGs seems somewhat limited, since the authors also discuss evolvability beyond an initially constructed KG, that is, the KG lifecycle. Second, the individual steps should be numbered, and a distinction should be made between the ontology and the instances. This structure should then be used to organize Section 3; currently, it is not clear why the structure of Section 3 does not align with the KGCP. Instead of building purely on the authors' own work, I suggest taking a broader look at the KG literature and rethinking the KGCP, for example:
- Zhong, L., Wu, J., Li, Q., Peng, H., & Wu, X. (2023). A comprehensive survey on automatic knowledge graph construction. ACM Computing Surveys, 56(4), 1-62.
- Paulheim, H. (2017). Knowledge graph refinement: A survey of approaches and evaluation methods. Semantic Web, 8(3), 489-508.
O3: Introduction.
The current Introduction does not outline what the reader can expect from the paper. I suggest moving the outline from page 3 so that it precedes subsections 1.1 and 1.2. It is also not clear why the authors suddenly introduce the KGCP without motivating its use for the survey. In addition, the concepts of dynamic and evolvable KGs (see O1) are used without being defined. I suggest substantially reworking the Introduction by (1) clearly framing the context, (2) providing an overview of existing research, and (3) stating the research gap, which in turn motivates the authors' contribution.
O4: Classification of learning methods.
Tables 7, 8, and 9 present one of the core contributions of this work, namely a classification of the identified learning methods. Unfortunately, the organization into these classes is neither explained nor intuitive. For example, in Table 7, it is not clear why reinforcement learning appears first as a subclass of Representation and then again as a separate class below Representation. In Table 8, it is not clear whether the approaches address only changes in the ontology or also changes within the instances. The categorization in Table 9 is likewise unclear. It seems that the authors have conflated “what is updated” (i.e., data/knowledge) with “who causes the update” (i.e., humans vs. machines). While “human input”, “cooperation”, and “machine-enriched updates” make sense as categories, mixing in “data/knowledge updates” needs more explanation. To me, this is a separate aspect and should not be confused with the cause or trigger of the change. Furthermore, column headers are missing: the relationship between the first and second columns in Tables 7 and 8 is not clear. It would also be helpful to show how many learning approaches each paper considers, i.e., whether one specific approach is used very often or whether many papers combine a wide variety of learning approaches. This aspect is not addressed at all.
O5: Data quality in knowledge graphs.
While the authors state that “the quality of knowledge within KGs is an overlooked aspect” (cf. 4.3 Challenges), they do not sufficiently consider related work on this aspect. One reason might be the unclear placement of quality assurance within the KGCP, which leads to a scattered discussion of the topic throughout the paper: some parts are discussed in Section 3.3.1 (Cleaning) as a type of data preprocessing, some in Section 3.5, and others in Section 3.7.1 (Changes through inferencing algorithms: Node changes). A consolidated view of quality assurance in KGs would be useful here. I suggest that the authors consider the following papers (amongst others) and rethink how KG quality is integrated into the KGCP:
- Rabbani, K., Lissandrini, M., & Hose, K. (2023). Extraction of validating shapes from very large knowledge graphs. Proceedings of the VLDB Endowment, 16(5), 1023-1032.
- Issa, S., Adekunle, O., Hamdi, F., Cherfi, S. S. S., Dumontier, M., & Zaveri, A. (2021). Knowledge graph completeness: A systematic literature review. IEEE Access, 9, 31322-31339.
Further comments:
- Introduce all abbreviations before the first use (e.g., KG in the abstract, DT, OPC UA).
- Briefly explain the concepts of Industry 4.0 and 5.0 in the Introduction (one sentence each suffices); not all readers can be assumed to be familiar with them.
- Introduction: I suggest using the original citation for the linked data paper mentioned in 1.1.
- Introduction: triples (noun-verb-noun): here, the term subject-predicate-object is more common.
- The research questions should be refined so that all three consistently use either the singular or the plural. For example, for RQ2 it is unclear whether task automation is investigated within a single solution or across all observed solutions.
- Page 4: “The context is (IKG) …” -> remove parentheses.
- Standardize automated vs. automatized.
- EC6: clarify how poor quality is determined.
- Table 4: the number of papers excluded per EC should be added. In addition, it should be stated whether the exclusion criteria were applied in the listed order or whether more than one criterion could be assigned to a single result.
- TRL levels: it would be interesting to know how many results were excluded under EC8 for having a TRL below 3. Lower TRLs would be expected for basic research papers, but in my view these works could still offer interesting solutions worth investigating in this paper, especially with respect to new trends in KG learning that are not yet implemented in productive KGs. If possible, a revised version should include those works as well.
- Section 3 should be renamed to “Results”.
- The Google Knowledge Graph should be cited or footnoted.
- Figs. 4, 5, and 6 should be ordered by decreasing number of publications. In addition, each figure should state whether a publication is assigned to a single category or can be assigned to multiple categories.
- 3.1. Ontologies: here, I miss a discussion that in real-world manufacturing settings, automatically built ontologies typically do not meet the quality standards and expectations of the domain experts out of the box.
- Section 3.4: “For the ontology, Protégé is certainty the most used tool …” -> since the section is about knowledge storage, it is not clear whether Protégé is used for storing or for building the ontology. The latter is more likely what is meant.
- Page 14: “data or knowledge updates” -> clarify whether data or knowledge is meant.
- Page 14: the source for evolvable KGs should be properly cited.
- Page 14: “… the human gives their …” -> unclear whether singular or plural.
- Cite SKOS and explain the abbreviation.
- Page 17: “… to allow some conclusions.” -> which conclusions?
- Discussion: a bulleted list of the most important findings would make the contribution more accessible.
- Conclusion: “In conclusion, evolvability and related learning approaches …” -> so far, the concept of evolvability has been defined through learning approaches, so this sentence is rather confusing.
- The references should be reviewed, since many journal names are missing and first names are written inconsistently (sometimes in full, sometimes abbreviated). Examples: [33, 67, 73]