The Understandability of OWL Statements in Controlled English
This is a revised manuscript, now accepted for publication, following an "accept with minor revisions" decision for the original manuscript. The reviews of the original submission are below.
Review 1 by Glen Hart
I had concerns that the paper would be too similar to the author's previous paper presented at a workshop [13]. The author explicitly acknowledges this and states why this paper is different. I was pleased to see that the paper meets the author's claim and is sufficiently different from the workshop paper. Indeed, I would consider this paper to be a significant improvement on that paper.
The ontograph method described by the paper is quite novel and deserves further examination.
The paper makes the point that a simplified version of Manchester Syntax is used because it is not possible to easily compare full Manchester Syntax statements with ACE. This is true, and from this perspective the decision seems wise. However, in the real world all statements will be presented in context, and both languages are capable of expressing complex statements. I am left wondering whether more detailed statements that would provide a more realistic comparison would not be too complex to be easily expressed as ontographs. I think it would be unreasonable to test this in the work presented, but a short view from the author on this would be welcome.
If I have a criticism of the paper it is that the justification for using students as participants verges on special pleading and omits one of their greatest assets: availability! The paper states that most students are not familiar with formal logic unless this lies within their field of study. What is not stated is whether students with knowledge of formal logic were included or excluded. It would also be nice if the subjects studied by the students were identified in the paper; very different results might be obtained from history students compared to those studying computer science. The implication is that the disciplines were diverse, but how diverse?
Generally a well thought through and executed piece of research.
Review 2 by Beryl Plimmer
This paper presents a study comparing user learning time and understanding of Controlled English versus OWL statements. It does so in a controlled experiment. The logic is presented using a simple diagrammatic notation. The participants are presented with a number of statements, in either CE or OWL, which they must check against the diagram and mark as true, false, or don't know.
In general this is a very nicely thought out piece of work and worthy of publication. However, I have a number of comments, questions, and suggestions.
1. In the introduction it says that this work is part of the author's thesis. It is unusual to say this; such a paper should stand alone. I would simply remove this statement.
2. It is not clear to me how ontographs support the claim on p. 3: "In contrast to other approaches, good performance of the participants of an experiment implies understanding, at least in a certain narrow sense of the word." The section goes on to try to justify this, but I am not convinced. The last paragraph of this section is very dense English/logic which, rather than supporting your claims, further confused me. You need to either reference some supporting material for this section or reduce your claims.
3. Section 5.2 has a link to the actual diagrams used and their related statements, but none of these are in the paper. Please include at least a couple of examples; it would help the reader immensely to have some concrete examples in this section. Figures 2 and 3 are useful in this respect and could be referenced from 5.2, but both are of ACE.
4. While you have shown to my satisfaction that ACE is easier for novices to read and understand, I would like to have seen a broadening of the implications in the conclusions. I suspect that if people had more training/practice then the performance would level out; what is your opinion? You are testing applying a statement; would the same hold true for constructing statements? OWL has advantages for computation of the logic; what is lost using ACE? I.e., what are the pros and cons?
Review 3 by Paul Smart
The paper describes an experiment designed to evaluate humans' understanding of a controlled natural language relative to a simplified form of Manchester OWL Syntax (MOS). Given the current interest in controlled natural languages as an interface language for the Semantic Web, this is an important area of study, and it is also one that is relevant to the Semantic Web journal.
The paper is well-written, and the description of both the experimental procedure and study results is clear.
I have only a few comments relating to methodology and analysis.
I am assuming that the statements in Figures 2 and 3 are in ACE. It would be helpful to indicate this in the figure captions.
In terms of the learning time analyses (Section 6.2), it would be interesting to see whether there are any differences between the learning and testing phases. The data from the two phases are combined in the paper. The section would benefit by the inclusion of a couple of sentences indicating whether there were any differences between the experimental groups in terms of time values for learning and testing phases.
Given the number of statistical analyses that are performed in the paper, the author needs to indicate whether any steps need to be taken to address inflated type 1 error concerns. It is important to indicate (perhaps in the caption of Table 3) whether the results would be different if an appropriate adjustment had been made to the confidence interval.
Response from Author
I am very pleased to read your mostly positive reviews. Many thanks for your helpful comments!
I would like to comment on a few points:
Glen Hart: The paper makes the point that a simplified version of Manchester Syntax is used because it is not possible to easily compare full Manchester Syntax statements with ACE. This is true, and from this perspective the decision seems wise. However, in the real world all statements will be presented in context, and both languages are capable of expressing complex statements. I am left wondering whether more detailed statements that would provide a more realistic comparison would not be too complex to be easily expressed as ontographs.
Response: The simplifications applied to the Manchester OWL Syntax are syntactic simplifications, not semantic ones. While not all OWL features were tested (there are too many), the tested statements were quite complex compared to what can normally be found in existing OWL ontologies. Of course, understanding a large number of statements or a complete ontology in a complex context is much harder and involves additional mental processes beyond just classifying one statement. However, in my opinion this has nothing to do with the kind of simplifications I applied to the Manchester OWL Syntax.
Glen Hart: If I have a criticism of the paper it is that the justification for using students as participants verges on special pleading and omits one of their greatest assets: availability!
Response: I actually did mention availability as one of the reasons for using students as participants: "Apart from the fact that students are usually flexible and close to the research facilities of the university, there are more reasons why students are a good choice in this particular case."
Beryl Plimmer: It is not clear to me how ontographs support the claim on p. 3: "In contrast to other approaches, good performance of the participants of an experiment implies understanding, at least in a certain narrow sense of the word." The section goes on to try to justify this, but I am not convinced. The last paragraph of this section is very dense English/logic which, rather than supporting your claims, further confused me. You need to either reference some supporting material for this section or reduce your claims.
Response: I rewrote the passage, see below. Is it clearer or still confusing?
In contrast to other approaches, good performance of the participants of an experiment implies understanding, at least in a certain model-theoretic sense of the word. Ontographs can be considered a language to describe first-order models: the individuals shown in the mini world represent the domain elements; their icons represent the interpretation of the unary predicates; and the arrows represent the interpretation of the binary predicates. The statements shown to the participants of an experiment correspond to simple first-order theories (or theories in any other kind of logic based on model theory). From this point of view, the task of the participants is to decide whether or not certain theories have the shown ontograph as a model. Now, all semantic properties of first-order theories (consistency, entailment, equivalence, etc.) are defined solely on their models, and these definitions are very simple when the mapping between theories and models is taken for granted. For example, two statements are equivalent if and only if they have exactly the same models. It is thus sensible to say that an agent (computer program or human) understands a certain logic language if the agent is able to correctly map theories to models. (FOOTNOTE: Based on the same idea, Bos [3] suggests using "Textual Model Checking" as a way to evaluate NLP systems.) So, participants performing well in ontograph experiments (systematically, not by mere luck) prove that they understand the core aspect of the language, i.e., the mapping to models.
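To make this model-checking view concrete, the following is a minimal sketch in Python. It is purely illustrative and not the author's implementation; the individual names, predicate names, and the holds_every helper are all hypothetical. It encodes a tiny ontograph as a finite first-order model and checks an "every ... owns a ..." statement against it:

    # Illustrative sketch: an ontograph as a finite first-order model.
    individuals = {"ind1", "ind2", "ind3"}   # domain elements

    unary = {                                # icons = unary predicates
        "person": {"ind1", "ind2"},
        "dog": {"ind3"},
    }

    binary = {                               # arrows = binary predicates
        "owns": {("ind1", "ind3")},
    }

    def holds_every(cls, prop, filler):
        """True iff 'every <cls> <prop> a <filler>' holds in the model."""
        return all(
            any((x, y) in binary[prop] and y in unary[filler]
                for y in individuals)
            for x in unary[cls]
        )

    # "Every person owns a dog." is false in this model: ind2 owns nothing.
    print(holds_every("person", "owns", "dog"))  # -> False

A participant who systematically produces the same true/false judgments as such a checker can, in the model-theoretic sense above, be said to understand the statements.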
[3] Johan Bos. Let’s not Argue about Semantics. In Proceedings of LREC’08, pages 2835–2840. ELRA, 2008.
Paul Smart: Given the number of statistical analyses that are performed in the paper, the author needs to indicate whether any steps need to be taken to address inflated type 1 error concerns. It is important to indicate (perhaps in the caption of Table 3) whether the results would be different if an appropriate adjustment had been made to the confidence interval.
Response: Thanks for pointing this out. I was aware of the problem, though not of its name, "type 1 error inflation". I could not find any reliable document that could tell me whether or not I have to account for type 1 error inflation in my particular situation. I think, however, that it is not a problem in my case, because the performed tests were pre-planned and straightforward. There was no "fishing expedition" where only the successful tests out of many made their way into the paper. Please correct me if I am wrong and type 1 error inflation needs to be accounted for in my case.
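For illustration only, the simplest such adjustment would be a Bonferroni correction: each of m pre-planned tests is judged at a significance level of alpha/m, or equivalently each confidence interval is widened to the 1 - alpha/m level. The Python sketch below shows the idea with hypothetical p-values that are not taken from the paper:

    # Hypothetical p-values for m pre-planned tests (not from the paper).
    p_values = [0.001, 0.004, 0.020, 0.300]
    alpha = 0.05          # desired family-wise error rate
    m = len(p_values)

    # Bonferroni: judge each test at alpha/m; equivalently, widen each
    # confidence interval to the 1 - alpha/m level.
    for i, p in enumerate(p_values, start=1):
        print(f"test {i}: p = {p:.3f}, "
              f"significant after Bonferroni: {p < alpha / m}")

Whether such a correction is strictly required for pre-planned tests is a judgment call; reporting how the results would change under it would address the reviewer's concern either way.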