Ontology Usability Scale: Context-aware Metrics for the Effectiveness, Efficiency and Satisfaction of Ontology Uses

Tracking #: 1183-2395

Linyun Fu
Xiaogang Ma
Patrick West

Responsible editor: 
Aldo Gangemi

Submission type: 
Full Paper
Both ontology builders and users need a way to evaluate ontologies in terms of usability, but existing ontology evaluation approaches do not fit this purpose. We propose the Ontology Usability Scale (OUS), a ten-item Likert scale derived from statements prepared according to a semiotic framework and an online poll in the Semantic Web community, to provide a practical means of evaluating ontology usability. Case studies were conducted to record current usability evaluation results for ontologies expected to be revised in the future, and discussions of the poll results are presented to support proper use and customization of the OUS.
Solicited Reviews:
Review #1
By Paul Warren submitted on 13/Nov/2015
Major Revision
Review Comment:

This paper is attempting to do something valuable, namely construct an ontology usability scale genuinely oriented around users. However, I believe that it is not there yet and that a significant amount of work remains to be done.

To me, the approach you use lacks originality. The methodology used appears to have been to ask users to rate 26 usability-based questions. One problem with this is that respondents might rate highly a number of essentially dependent characteristics. I think what one really wants to achieve is an understanding of just what independent factors contribute to usability of an ontology.

Related to this, I am not in general happy about systems which take a sum of scores over a number of dimensions and then produce one number. Better to reduce the dimensionality to a minimum set of independent dimensions.

I would like to make an analogy with the 'Big 5' personality factors which psychologists use. The point here is that psychologists have taken a large number of personality questions and used factor analysis to reduce them to 5 independent ones.

In this context, I would like to see one or more ontologies scored by a number of people against all the possible characteristics, i.e. the 26 questions you have here plus others suggested by respondents. With that data you could do a factor analysis to identify the independent dimensions. Then you could divide the questions between the dimensions and generate a questionnaire which arrives at a score for each dimension.
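The dimensionality-reduction step the reviewer suggests can be sketched as follows. This is a minimal illustration using synthetic ratings (not the paper's survey data) and a simple eigenvalue-based reduction of the question correlation matrix; a full analysis would use proper factor extraction and rotation, but the principle of collapsing many correlated questions onto a few latent dimensions is the same.

```python
import numpy as np

# Hypothetical data: 200 respondents rate 6 usability questions on a
# 1-5 Likert scale.  By construction, questions 0-2 track one latent
# factor and questions 3-5 track another.
rng = np.random.default_rng(0)
base = rng.integers(1, 6, size=(200, 2)).astype(float)  # two latent factors
ratings = np.hstack([base[:, [0]]] * 3 + [base[:, [1]]] * 3)
ratings += rng.normal(0.0, 0.3, size=ratings.shape)     # rating noise

# Eigendecompose the correlation matrix of the questions and keep the
# components with eigenvalue > 1 (the common "Kaiser" retention rule).
corr = np.corrcoef(ratings, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
n_factors = int(np.sum(eigvals > 1.0))

print(n_factors)  # prints 2: six questions collapse onto two dimensions
```

With the retained dimensions identified, each original question can be assigned to the dimension it loads on most strongly, yielding a questionnaire that reports one score per independent dimension rather than a single summed number.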

A few other points:
* page 2, second col. "we provide ... around 10 items". This reads strangely (was it 10, or fewer or more?). I think if you say "we aimed to provide" it makes more sense.
* page 3, top of column (i.e. end of section 2). It would be valuable to describe the differences.
* at top of page 3 there is a quote - this doesn't seem to have a reference.
* you could have sent out the questionnaire to a lot more groups, e.g. ontolog-forum, various W3C groups etc.

Generally the English needs some improvement. For example, there are a few very long sentences, bottom of page 1 to top of page 2. The paper needs to be read through again very carefully. For example page 10 talks about 'feedbacks'. I don't think feedback is normally used in the plural.

Review #2
By Eva Blomqvist submitted on 13/Jan/2016
Major Revision
Review Comment:

The paper describes a proposal of an Ontology Usability Scale, which corresponds to the System Usability Scale (SUS) for software systems, but "reinvented" for ontologies. It is based on a set of questions originating in existing ontology evaluation and reuse approaches, and a selection among them is made based on a small survey. The proposed evaluation questions are then applied in what is by the authors called a case study, and a quite extensive discussion is included on the motivation and potential alternatives/modifications of the selected questions to include.

The paper is well-written and easy to read, and the effort is certainly motivated and worthwhile, i.e. a significant contribution. To have something corresponding to the SUS for ontologies would provide considerable practical benefits, and it could be used in many scenarios, such as "rating criteria" in online ontology repositories or when testing newly built ontologies. The work is, as far as I can judge, original.

However, several aspects of the paper lead me to conclude that these are rather early results, which would benefit from a bit more work before being published as a journal article. Nevertheless, considering that some time has passed since the paper was submitted, and that this is not a question of additional development work, I would expect that the authors already have more results to include regarding the more specific issues described below, and could therefore rather easily submit a revised version with these additions.

First, I have one general concern with the paper, which is the lack of definition of the terms "user" and "use". What does it mean to use an ontology? When I use ontologies it is usually for expressing some dataset with the classes and properties of that ontology, and publishing it or loading it into some storage and management system. I also use ontologies to reason over data, i.e. maybe those data that I just loaded into my storage facility. Then I use the ontology, usually wrapped by some API, as a component, when I build the rest of the software system that will operate on the ontology and data, and use it to query my data. Finally, the end user of that system will (indirectly) use the ontology, as well as all the other software components of the system, when using the system. Which of these, or other, usage scenarios are you targeting? One? All? Can usability really be defined the same way if the usage is to only express data using the ontology as a vocabulary, or if the usage is to build a software system that uses the ontology as its knowledge representation for reasoning, or when considering end users' indirect usage? I am not so sure. Similarly, who is the user that assesses the usability? Is it the data engineer who wants to transform data into this new vocabulary? The software developer who produces the system? The end user who uses the system? One of them? Any or all of them? Does it make a difference? I would think so. Basically, the first thing the authors need to do is 1) define the concepts "user" and "use" as used in their paper, since this affects what can be read into the term "usability" (i.e. usability for what?), and 2) describe any assumptions/limitations that are made regarding which users and usages are considered in their work.

Although the process of developing the proposed evaluation question set is described in quite some detail, it still feels somewhat ad hoc. The authors do not motivate each step clearly, although what is done is described. For example, the questions are first changed to positive forms, then changed back later. Although some discussion of this is included, it is not really clear to the reader why this is necessary if the negative form is going to be used in the end anyway, and the reader is not assured that this back-and-forth translation did not impact the results (e.g. a "bad" translation could have made fewer people select that question in the survey). The survey is also quite small, and the subjects do not seem to have been randomly selected (from a population of ontology users); rather, the paper hints at them being known associates of the authors, although from different institutions. This may be fine, but there should be a discussion of this, the potential bias introduced, and why the authors think that exactly these people were a representative set of subjects for the survey.

Another serious issue is the so-called case study and evaluation (section 4). The section is very short, and describes how the OUS has been used to evaluate a set of ontologies in some set of ontology projects. However, the only thing presented in the section is a table of numbers. On its own this does not say much about the proposed OUS. It basically only says that it could be used; whether the scores reflect some notion of actual usability, or even whether the subjects were satisfied with the OUS, is not discussed at all. For this to be called a case study, or even an evaluation, there have to be some results confirming that the OUS is in some sense "correct", i.e. able to reflect some notion of usability, or at least subjectively considered useful by someone. Ideally, the authors would also extend the "case study" to actually include the second part that is mentioned, i.e. the "after" evaluation, after the revision of the ontologies. However, the most important thing is still to be able to draw some conclusions from the study regarding the quality of the proposed OUS - can we trust the results of this set of questions? Will the results be useful for selecting ontologies?

Minor issues include:

- Page 3, first paragraph: I am not sure that I agree that just because an ontology has been used more and/or has been around longer, it should obviously be preferred. This would mean that one could never publish an improved version of anything!

- Table 1: I am not sure about the terminology here. I would not call the syntax part "content", for instance. For me, content would be the actual concepts and relations, i.e. more related to the conceptualization than the structure. In bullet 4 "complex" is transformed to "brief", and although I am not a native English speaker and could be wrong, these do not seem to be opposites to me. Similarly I would not consider inconsistent to be the opposite of well integrated, rather I would interpret "well integrated" as being about the connectedness or the coherence of design style of the ontology.

- Page 6: Is "highly disagree" a good term? In most Likert scales I've seen that the term "strongly" is used and not "highly", is there a reason for using highly instead?

- Table 4: Is there a reason for the particular order of the questions? It is briefly discussed but I am still not sure exactly why this order was selected.

- Table 5: What do the rows signify? A single ontology? A version of a single ontology? It is not clear since several rows contain the same acronym under "ontology".

- The second paragraph of section 5 belongs in the introduction or background sections rather than in the discussion.

- Table 9 and text directly above: It is not clear why these questions are specific for ontologies that are to be used across different domains, i.e. why they are different from the kinds of questions in the other tables.

Review #3
Anonymous submitted on 09/Feb/2016
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

This paper deals with the evaluation of ontology uses from a user-centered point of view, with regard to the goal and context of the use of the ontology. More specifically, it deals with specific metrics to evaluate the use of an ontology rather than the ontology as such, which makes it original.

The issue is well explained and clear. The aim of the paper is well positioned with regard to related work. This outlines the key characteristics of the desired evaluation: a multi-criteria approach with a single numerical score computed from degrees of agreement on a few criteria, which can be used by any user, not necessarily ontology experts.

The proposal is based on existing work: (1) the semiotic framework of Gangemi et al., using a semiotic meta-ontology to select criteria but with a different understanding of syntax, semantics and pragmatics, which characterize groups of criteria; (2) the System Usability Scale, whose questionnaire statements are revised. To obtain a reasonable number of statements in the questionnaire, a poll was submitted to the Semantic Web community. The answers were votes indicating how representative the criteria are. The 10-item Likert scale for ontology usability evaluation is based on the results of this poll: the chosen items are the ten criteria with the highest scores.
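For reference, the single-score computation that the OUS inherits from the SUS can be sketched as follows. This is a hedged illustration of the standard SUS convention (positive items contribute rating minus 1, negative items 5 minus rating, and the sum is scaled by 2.5 onto 0-100); the paper's actual item ordering and polarity are not reproduced here.

```python
def ous_score(responses, positive_items):
    """Compute a SUS-style 0-100 score from ten 1-5 Likert responses.

    `responses` holds ten ratings (1 = strongly disagree, 5 = strongly
    agree); `positive_items` gives the 0-based indices of the
    positively-worded statements.  Positive items contribute
    (rating - 1), negative items (5 - rating), and the sum of
    contributions is scaled by 2.5 onto a 0-100 range.
    """
    assert len(responses) == 10
    total = sum(
        (r - 1) if i in positive_items else (5 - r)
        for i, r in enumerate(responses)
    )
    return total * 2.5

# The classic SUS alternates positive and negative statements, so we
# assume here (hypothetically) that the even-indexed items are positive.
positives = set(range(0, 10, 2))
best = [5 if i in positives else 1 for i in range(10)]
print(ous_score(best, positives))  # prints 100.0 for the ideal respondent
```

This illustrates the reviewer's point about alternating polarity: the scoring only works as intended if the questionnaire records which statements are positively and which negatively worded.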

An evaluation using the proposed approach was conducted. The aim was to evaluate the use of an ontology chosen by participants.

This step of the evaluation phase is not very clear.

Was the choice made from a set of proposed ontologies (i.e. were participants not entirely free to choose an ontology)? If so, this should be stated.

How many participants were there?

How do we know the way participants use the evaluated ontologies (goal and context)? Is that knowledge represented? How? The authors say that they are able to make comparisons between ontologies with similar intended uses. These uses must be clearly known. The results of each evaluation also need to be analyzed.

Furthermore, the purpose of an ontology and the goal of its use are not quite the same thing.

Effectiveness, efficiency and satisfaction are evaluated in a global way. Each statement in the questionnaire relates to one semiotic aspect (syntax, semantics or pragmatics), which affects usability in terms of effectiveness, efficiency and satisfaction jointly. However, effectiveness, efficiency and satisfaction are not evaluated separately. This does not provide very precise metrics.
The global character of the evaluation is further reinforced by the notion of a single score. The single score gives an overall view of the ontology's use, but it is a user-specific and subjective score. What is its value, and how can we interpret it if we do not know the corresponding goal and context?

According to the authors, the results can be used to improve the ontology. However, if changes are made to adapt the ontology to a particular use, the ontology might then be less suited to other uses. Changes are potentially dangerous. In fact, ontologies should be general enough to remain usable (with adaptations such as specializations) in various applications. For me, the issue is to find the right compromise between generality and applicability. This point must be discussed.

Ontology evaluation from a user-centered point of view is an important issue. A proposal is made for a questionnaire able to perform the intended evaluation, and an evaluation has been conducted. The analysis of the results outlines some interesting aspects, but the authors should have gone further and made concrete proposals based on this feedback. Such concrete proposals are missing.
More case studies will be explored in future work, and different statements will be proposed according to the type of ontology (upper ontology or domain ontology). Thus, the feedback presented in this paper appears to be the result of preliminary work that remains to be further developed.