Review Comment:
SUMMARY:
This paper proposes a survey of, as well as an interactive catalog of, LLM-based interaction patterns, with the objective of better specifying competency questions as used in ontology engineering. To this end, interaction patterns in LLM-based applications in the arts and creative domain are explored. Patterns derived from this domain are then to be integrated, in the form of an interaction model, into OntoChat, an existing tool for competency question elicitation and ontology engineering.
OVERALL EVALUATION: MAJOR
The idea of taking interaction patterns from LLM-based applications that focus on creative and artistic tasks and activities and transferring them to a tool for competency question elicitation in ontology engineering is definitely interesting. However, it only became entirely clear to me on Page 11. The first sections, and especially the very first ones, therefore need a thorough revision to substantially improve the clarity of the proposed approach and objectives.
The validity of the idea is to be assessed based on an interaction model that is integrated into OntoChat. However, the interaction model and its integration are described like a technical manual, without any justification or explanation of the decisions made. The most considerable weakness of the submission, however, is the lack of an evaluation of the strongly adjusted and visually/cognitively enriched user interface/application. At least a use case in which ontology engineers not involved in the development of the interaction model use the newly created UI would be necessary in my view. That the paper merely presents a conceptual model, with no actual interaction model being integrated into OntoChat, does not become clear before Page 19. If a conceptual model were sufficient, which I believe it is not for a journal publication, this should be stated explicitly from Page 1.
The idea of improving human-LLM interaction in order to ensure that creative thinking processes are less impeded is an innovative direction of research. However, the lack of clarity, and of concretely tested and evaluated interaction patterns/designs, reduces the validity of the submission.
DETAILED COMMENTS:
While it may be very clear to the authors what they mean by convergent/divergent thinking, to ensure intelligibility for a varied group of readers I would strongly recommend properly defining both terms early on, before Page 3, beyond merely aligning them with lateral/vertical thinking.
The illustration of the literature review process in Figure 1 is nicely done and provides a very good overview of the selected process. The selection of search databases is not entirely clear to me, as the ones presented are very selective. Furthermore, the explanation of the keyword selection needs clarification: the text suggests the term LLMs was excluded, but the example query indicates a different direction. At this point I was also lost as to what the objective of this survey is and why creat* and art* would be included if the objective is interaction and user interface design. Should the search not also focus, at least in part, on interaction design patterns and/or user interfaces?
On which basis were the retrieved papers ranked? The method indicates that the top 12 papers served as a basis for inclusion/exclusion; however, how was this ranking from top to bottom determined? Simply by date or citation count? The relevance would presumably depend strongly on the keywords, so it would be good to provide the full keyword list used. I would also like to point out that the number of top-ranked papers stated in Figure 13 differs from the top 12 stated in the text.
For the screening, the calibration of the eligibility criteria is well explained and reasonable. However, the actual screening process is not further specified. Was it done manually? By how many people? Based only on title, abstract, and keywords?
Regarding the reported kappa value for codebook generation and application: was this score calculated before or after the detailed in-person coordination discussions? I would presume it was calculated after that process, as it is quite high for the task.
In the findings section, Table 2 indicates IRR, but it is not mentioned in the text. The text should clarify that this is presumably the kappa value per code. I also find it strange that the codes and the codebook are presented while the actual results of the survey are reduced to the categorization of references in Table 2; should a survey not also include a description of the result set? In Section 4.1, the text presents these codes and categories as if they were well established and matter-of-fact, yet no references are included. Where are these descriptions taken from, and how do the interaction techniques relate to the cited references? Even though assumptions can be made, this should still be stated explicitly.
The objective of this paper is described very clearly on Page 15, where the limitations of the OntoChat system are presented. A similar argumentation very early on, i.e., in the abstract or introduction, would tremendously help clarify the idea.
Like Section 4.1, the introduction of the interaction model reads like a technical manual rather than a research contribution. To change this, decisions should be explained and justified. The major issue, in my view, is the complete lack of an evaluation of the newly proposed, creatively/artistically inspired interaction model as integrated into OntoChat. Without an evaluation in a practical setting, or at least a use case, the validity of the whole idea cannot be determined.
In the limitations, an evaluation of the quality of the CQs is mentioned, yet no explicit evaluation is presented elsewhere. This is very confusing, as are the contradictory statements across the paper about actually integrating an interaction model versus finally only proposing a conceptual model.
MINOR COMMENTS:
1.23 absence of supports => the noun "support" has no plural in this sense
2.4 em-dashes => I would strongly recommend replacing em-dashes with standard academic punctuation, which in this case would be commas; in 2.14 the em-dash is even grammatically problematic, as is the sentence. Since em-dashes are used annoyingly often throughout and are very typical of LLM-generated texts, as are brackets around examples, I would strongly recommend removing them.
2.8 CQs candidates => CQ candidates
2.22 with LLM => Did you mean "what LLM-based UI designs are needed"?
2.23 What supporting can => What support can
2.31 we also includes => we also include
2.32 cognitive process => I would propose either cognitive processes or the cognitive process
3.7 It is quite uncommon to have a heading immediately followed by a (sub-)heading
3.48 This is the first time an acronym is introduced correctly, i.e., with the letters that contribute to the acronym capitalized in the long form
4.43 "lung" is at best a term, not a terminology, and a terminology cannot be a point in space
5.10 The acronym HCI should be introduced the first time it is used
14.17 This is a very strange way of introducing acronyms; please make the introduction of acronyms consistent, ideally Long Form (short form), with the letters contributing to the acronym capitalized in the long form
15.35 Referring to Figure 3, => As depicted in Figure 3,
17.37 Figure 3d => Figure 3 has no panel (d); based on the context, did you perhaps mean Figure 4d?