Review Comment:
The paper proposes an Ontology Usability Scale, the counterpart for ontologies of the System Usability Scale (SUS) for software systems, i.e. the SUS "reinvented" for ontologies. It is based on a set of questions originating in existing ontology evaluation and reuse approaches, from which a selection is made based on a small survey. The proposed evaluation questions are then applied in what the authors call a case study, and a quite extensive discussion is included on the motivation for the selected questions and on potential alternatives or modifications to them.
The paper is well-written and easy to read, and the effort is certainly motivated and worthwhile, i.e. a significant contribution. Something corresponding to the SUS for ontologies would provide considerable practical benefits and could be used in many scenarios, such as "rating criteria" in online ontology repositories or when testing newly built ontologies. The work is, as far as I can judge, original.
However, several aspects of the paper lead me to conclude that these are still rather early results that would benefit from more work before being published as a journal article. Nevertheless, considering that some time has passed since the paper was submitted, and that this is not a question of additional development work, I would expect that the authors already have more results to include regarding the specific issues described below, and could therefore rather easily submit a revised version with these additions.
First, I have one general concern with the paper, which is the lack of definition of the terms "user" and "use". What does it mean to use an ontology? When I use ontologies, it is usually for expressing some dataset with the classes and properties of that ontology, and publishing it or loading it into some storage and management system. I also use ontologies to reason over data, e.g. the data that I just loaded into my storage facility. Then I use the ontology, usually wrapped by some API, as a component when I build the rest of the software system that will operate on the ontology and data, and use it to query my data. Finally, the end user of that system will (indirectly) use the ontology, as well as all the other software components of the system, when using the system. Which of these, or other, usage scenarios are you targeting? One? All? Can usability really be defined the same way if the usage is only to express data using the ontology as a vocabulary, if the usage is to build a software system that uses the ontology as its knowledge representation for reasoning, or when considering end users' indirect usage? I am not so sure. Similarly, who is the user that assesses the usability? Is it the data engineer who wants to transform data into this new vocabulary? The software developer who produces the system? The end user who uses the system? One of them? Any or all of them? Does it make a difference? I would think so. Basically, the first thing the authors need to do is 1) define the concepts "user" and "use" as used in their paper, since this affects what can be read into the term "usability" (i.e. usability for what?), and 2) describe any assumptions/limitations regarding which users and usages are considered in their work.
Although the process of developing the proposed evaluation question set is described in quite some detail, it still feels somewhat ad hoc. The authors describe what is done but do not clearly motivate each step. For example, the questions are first changed to positive forms and then changed back later. Although some discussion on this is included, it is not really clear to the reader why this is necessary if the negative form is going to be used in the end anyway, and the reader is not assured that this translation back and forth did not affect the results (e.g. a "bad" translation could have made fewer people select that question in the survey). The survey is also quite small, and the subjects do not seem to have been randomly selected (within a population of ontology users); rather, the paper hints at them being known associates of the authors, although from different institutions. This may be fine, but there should be a discussion of this, of the potential bias introduced, and of why the authors think that exactly these people were a representative set of subjects for the survey.
Another serious issue is the so-called case study and evaluation (section 4). The section is very short and describes how the OUS has been used to evaluate a set of ontologies in a set of ontology projects. However, the only thing presented in the section is a table of numbers. On its own, this does not say much about the proposed OUS: it basically only says that it could be used, but whether the scores reflect some notion of actual usability, or even whether the subjects were satisfied with the OUS, is not discussed at all. For this to be called a case study, or even an evaluation, there has to be some result confirming that the OUS is in some sense "correct", i.e. able to reflect some notion of usability, or at least subjectively considered useful by someone. Ideally, the authors would also extend the "case study" to include the second part that is mentioned, i.e. the "after" evaluation, following the revision of the ontologies. The most important thing, however, is to be able to draw some conclusions from the study regarding the quality of the proposed OUS: can we trust the results of this set of questions? Will the results be useful for selecting ontologies?
Minor issues:
- Page 3, first paragraph: I am not sure I agree that just because an ontology has been used more and/or has been around longer, it should obviously be preferred. This would mean that one could never publish an improved version of anything!
- Table 1: I am not sure about the terminology here. I would not call the syntax part "content", for instance; for me, content would be the actual concepts and relations, i.e. more related to the conceptualization than to the structure. In bullet 4, "complex" is transformed into "brief", and although I am not a native English speaker and could be wrong, these do not seem to be opposites to me. Similarly, I would not consider "inconsistent" to be the opposite of "well integrated"; rather, I would interpret "well integrated" as being about the connectedness or the coherence of design style of the ontology.
- Page 6: Is "highly disagree" a good term? In most Likert scales I have seen, the term "strongly" is used rather than "highly"; is there a reason for using "highly" instead?
- Table 4: Is there a reason for the particular order of the questions? It is briefly discussed, but I am still not sure exactly why this order was selected.
- Table 5: What do the rows signify? A single ontology? A version of a single ontology? It is not clear, since several rows contain the same acronym under "ontology".
- The second paragraph of section 5 belongs in the introduction or background section rather than in the discussion.
- Table 9 and the text directly above it: It is not clear why these questions are specific to ontologies that are to be used across different domains, i.e. why they differ from the kinds of questions in the other tables.