Comparison of Reasoners for large Ontologies in the OWL 2 EL Profile

Paper Title: 
Comparison of Reasoners for large Ontologies in the OWL 2 EL Profile
Authors: 
Kathrin Dentler, Ronald Cornet, Annette ten Teije and Nicolette de Keizer
Abstract: 
This paper provides a survey to and a comparison of state-of-the-art Semantic Web reasoners that succeed in classifying large ontologies expressed in the tractable OWL 2 EL profile. Reasoners are characterized along several dimensions: The first dimension comprises underlying reasoning characteristics, such as the employed reasoning method, its soundness and completeness, the expressivity and worst-case computational complexity of its supported language and whether the reasoner supports incremental classification, rules, justifications for inconsistent concepts and ABox reasoning tasks. The second dimension is practical usability: whether the reasoner implements the OWL API and can be used via OWLlink, whether it is available as Protégé plugin, on which platforms it runs, whether its source is open or closed and which license it comes with. The last dimension contains performance indicators that can be evaluated empirically, such as classification, concept satisfiability, subsumption checking and consistency checking performance as well as required heap space and practical correctness, which is determined by comparing the computed class hierarchies with each other. For the very large ontology SNOMED CT, which is released both in stated and inferred form, we test whether the computed class hierarchies are correct by comparing them to the inferred form of the official distribution. The reasoners are categorized along the defined characteristics and benchmarked against well-known biomedical ontologies. The main conclusion from this study is that reasoners vary significantly with regard to all included characteristics, and therefore a critical assessment and evaluation of requirements is needed before selecting a reasoner for a real-life application.
Full PDF Version: 
Submission type: 
Survey Article
Responsible editor: 
Bernardo Cuenca Grau
Decision/Status: 
Accept

Comments

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Comments

Thank you for your constructive and helpful comments. We revised the paper, addressing all concerns. Please find the details below.

Review 1
========

There seems to be a problem in the definition of TBox consistency on page 7. The informal definition says that every concept should admit an individual whereas the formal one says that the ontology should admit a model. The latter is the correct definition. I can see that the former definition is taken from page 15 of the DL Handbook, but I still don't think it is correct.

Comment:
We changed the paragraph and use only the latter definition.

Also, check the fonts used on page 7: interpretations I and TBoxes T should be uniformly in the calligraphic font.

Comment:
We agree and corrected the fonts.

Finally, in the description of SNOMED CT on page 10, I would suggest to use the formal vocabulary defined earlier (e.g. general concept inclusions or full concept-definitions) instead of the meaningless phrase "active concepts with unique meanings and formal logic-based definitions".

Comment:
We changed the description into: "SNOMED CT consists of around 300,000 primitively and fully defined concepts."

Review 2
========

* Along the paper, the term 'class' occurs in place of atomic concept or concept name. For example 'class hierarchies' (in the abstract), 'class satisfiability' (page 3, col. 1, line 16), but also 'inconsistent concepts' (page 8, col. 2, line 42). In my opinion, there should be a very short explanation relating both terms: class and concept.

Comment:
We agree. Using the term 'class' in place of atomic concept or concept name might be difficult for the reader. We solved the issue by replacing all occurrences of the term 'class' with the term 'concept' The only occurrences of the term 'class' left are 'anonymous class definition', and sub- and superclasses, because in the experiments, we really query for sub- and superclasses. The paper contains the explaining sentence: "Concepts and roles correspond to OWL classes and properties."

* Page 14, last paragraph: 'CB is the fastest reasoner but ...', the 'but' should be 'and', since supporting a more expressive language is not a disadvantage.

Comment:
We agree that the 'but' is misleading. We did not mean it as a disadvantage, but in the context of the tradeoff between expressivity and classification performance. We changed it into: "CB is the fastest reasoner even though it supports a more expressive language than OWL 2 EL."

* Page 7, col. 2, line 34: the name 'Racer' should be 'RacerPro'

Comment:
We agree and changed it.

* Page 8, col. 1: 'Fact++' should be 'FaCT++'.

Comment:
Indeed, we corrected that.

* Page 12, col 2, last paragraph: '64 bit' and '32 bit' should be '64-bit' and '32-bit' respectively.

Comment:
We agree and changed it.

* Page 3, col. 1, paragraph 3: all the ontology acronyms are explained, with the exception of GALEN. I suggest to explain GALEN as well.

Comment:
We changed "Galen" into: "the Generalized Architecture for Languages, Encyclopaedias and Nomenclatures in medicine (GALEN)"

Additionally, we added a new entry to the related work (the article was released in December 2010):

"Recently, Mishra et al. [35] presented an extensive survey of nineteen reasoners that have been released between 1975 and 2009. The authors compare these reasoners with respect to their inference support, completeness and algorithm, implementation language and supported Semantic Web languages."

We did not make any changes regarding the other parts of the paper.

Response to the reviews

We like to thank both reviewers for their constructive and helpful comments. We thoroughly and substantially revised the paper, addressing all concerns. Please find the details below.

Firstly, I have noticed that there are sentences (in Section 3.1 even entire paragraphs) taken from published papers of other authors with only minor modifications and no reference to the original source in those places. This lies almost on the border of plagiarism. I realize that the paper is a survey and therefore this is perhaps not too serious, but I still don't think it is acceptable in a respectable publication. I have brought this matter to the attention of the editors.

Response:
We thank the reviewer for pointing us to our inaccuracy. We agree that the way sentences were formulated in the previous manuscript did not pay respect to the sources they came from. We now cited all quotations from published papers and re-formulated sentences from the web where appropriate. An example for an added citation is: Such procedures aim at large expressivity and, according to [24], classify an ontology by iterating over all necessary pairs of classes and trying to build a model of the ontology that violates the subsumption relation between them.

Secondly, it seems that some text was taken directly from the websites of the individual reasoners and the text sometimes doesn't even fit well with the rest of the paper. The descriptions of HermiT and Pellet in Section 4 sound very much like advertisements.

Response:
In order to provide the information that was given by the developers of the reasoners, we used text from their websites with little editing. We agree that outside of the context of a webpage these texts may look like advertisement and have thus re-formulated the descriptions, keeping their meaning but changing their tone. For example, we changed the description of HermiT into: HermiT (University of Oxford) [42] can determine whether or not a given ontology is consistent and identify subsumption relationships between classes, among other features. HermiT is based on a "hypertableau" calculus.

Section 5.1 contains the statement "TrOWL REL implements EL++ without datatypes" even though datatypes were not mentioned when EL++ was introduced on page 5. Similarly, the same paragraph says that FaCT++ doesn't support Top/Bottom Object and Data properties, keys and datatypes (from the FaCT++ website), which I find irrelevant for the comparison since none of these features are needed in the evaluation and are never explained in the rest of the paper.

Response:
We included datatypes when EL++ is introduced in Section 3.1. and removed the statement that "FaCT++ doesn't support Top/Bottom Object and Data properties, keys and datatypes (from the FaCT++ website)" from Section 5.1. as we agree that it is irrelevant for the comparison.

COMPUTATIONAL COMPLEXITY:
The paper seems to misinterpret computational complexity and can mislead readers who are not well acquainted with it. It suggests that the worst-case computational complexity of the language supported by a reasoner is a feature that has an immediate effect on the performance. Section 3.1 says roughly "worst-case behaviour occurs rarely in practice but, nevertheless, it can occur". It is perfectly possible that a reasoner for SHIQ or SROIQ terminates in polynomial-time on all EL ontologies. CB implements an extension of the standard EL algorithm, so on EL ontologies it behaves (in theory) exactly like CEL. Tableau reasoners use specific optimizations that can ensure polynomiality when the input is in EL. The conclusions of Section 5.4, i.e. that there is almost no trade-off between the expressivity of the supported language and the classification performance on EL ontologies, are therefore much less surprising than indicated.

Response:
We thank the reviewer for this comment. We adapted the paragraph on computational complexity in Section 3.1 to make it more clear that worst-case complexity does not have an immediate effect on the performance, especially with EL ontologies as input. Also, we changed the wording on our conclusions of Section 5.4. by deleting the word "Interestingly, so that they do not suggest any longer that the results are surprising.

SOUNDNESS AND COMPLETENESS IN PRACTICE:
In my opinion, most of the material on page 14 (and in Tables 10 and 11) is irrelevant for judging correctness of reasoners. It can all be summarized by saying that there is no standard specification for outputting the class hierarchy, some reasoners output several tautologic axioms, and therefore one cannot compare their outputs line by line. I don't think it is necessary (nor interesting for the readers) to repeatedly discuss the differences between the reasoners and the exact numbers of particular axioms for every test ontology.

Response:
We fully agree on this comment. We compared the outputs of the reasoners line by line, but for the reader a short summary is much more interesting. Thus, we significantly shortened this section and removed Table 10 and 11.

Also, the statement "all included reasoners are sound and complete for OWL2 EL in theory and (their next versions) also in practice" in the conclusion of the paper is too bold and not even correct. Firstly, there are features in OWL2 EL (e.g. datatypes) that are not in EL+ nor in Horn SHIF, so according to Table 2 at least CEL, SR and CB are not complete for OWL2 EL even in theory. Secondly, one can never do enough testing to prove that a program is correct in practice. All three test ontologies are in EL+ which covers only a subset of OWL2 EL, so some features of OWL2 EL were not tested at all. Furthermore, it is a bit naive to believe that a new release of an incorrect reasoner will fix 100% of its bugs.

Response:
We thank the reviewer for this comment. It is true that the statement in the conclusion is too bold and not even correct, so we removed it. We agree that one can never do enough testing, and added to our conclusions that "Ongoing testing is necessary to evaluate the correctness of reasoners in practice.".

ERRORS, TYPOS
- Section 3.1, Soundness and Completeness in Theory: There are two different definitions of correct in the same paragraph (correct = sound, correct = sound and complete).

Response:
--------
We corrected this and now use only the later definition correct = sound and complete.

- Table 1: You should spell out that the table shows complexities of concept satisfiability checking and not for example of concept subsumption checking.

Response:
We added this comment to the accompanying text and to the table caption.

- page 5: "nominals with one individual" is tautological. A nominal is always a single individual.

Response:
Right, we removed "with one individual".

- Table 2: Pellet's explanation should have "Fracture of lower limb" instead of "Fracture of bone".

Response:
Thank you, we changed it.

- Section 4: Does CB support Horn SHIQ or Horn SHIF? The text says the former but Table 2 shows the latter. Anyway, none of these is an extension of the full EL++.

Response:
We clarified this issue by adding: CB's reasoning procedure described in [24] supports Horn SHIQ, but its implementation supports only Horn SHIF, which is a subset of Horn SHIQ that does not include cardinality restrictions. We also changed the wording in 5.1. into "Most reasoners rely on tableau-based methods or on (extensions of) completion rules for EL."

- everywhere: FaCT++ is sometimes spelled as Fact++, sometimes as FacT++, sometimes just as FaCT

Response:
We spell it now everywhere as FaCT++, only two times as FaCT: once in the section on FaCT++ (FaCT++ "is the new generation of the OWL DL reasoner FaCT") and once in the literature review (CEL "has been compared to FaCT and Racer in 2005"). The reasoner is also referred to in the cited paper as FaCT without ++, and in 2005 the reasoner was probably not renamed yet.

Solicited review by Julian Mendez:

We thank the reviewer for his positive comments on our manuscript.

There are some minors corrections and observations listed below:

Typos:
> * Page 3, Column 2, Line 37, it says:
> "CB classified SNOMED CT in less then a minute, ..."
> and should say
> "CB classified SNOMED CT in less than a minute, ..."

Response:
Thank you, we corrected this typing error.

Suggestions:
* I suggest to use a different typeset for formulas (e.g. mathsf ), especially when they are part of a paragraph. Examples are in:
> Page 5, Fig 2
> Page 5, Column 2, Paragraph 1
> Page 6, Column 1, Paragraph 1
> Page 15, Fig 3

Response:
We agree and use a different typeset (textsl) now.

* I consider that:
Page 6, Column 2, Line 10, where it says:
"Protégé-4" should say "Protégé 4".

Response:
We changed that, too.

Clarifications:
I did not understand the following sentences:
* Page 4, Column 2, Line 13, it says:
"It is important to know whether a reasoner is sound and complete, but not always important that it is sound and complete."

Response:
We want to express that the user can gain speed by giving up soundness or completeness. In a scenario where speed is the main criterion, a reasoner might not have to be sound or complete, as long as it is fast. We believe that it is always important for a user to know how the reasoner behaves. We re-formulated the section so that it is more clear now: "Soundness or completeness can be sacrificed for a significant speed-up of reasoning [41]. Thus, to employ a reasoner in a real-world application, it is not always important that its underlying reasoning method is sound and complete, but it is important to know whether it is sound and complete.

* Page 7, Column 2, Line 40, it says:
"Horn SHIQ ontologies are ontologies expressed in the DL SHIQ, which do not contain 'non-deterministic' constructors, ..."

* Page 8, Column 1, Line 29 it says:
"The main aspect of this algorithm is that is less non-deterministic that other ..."

Response:
As we agree that it is unclear what "non-deterministic" means, and as it is irrelevant in our context, we removed both sentences.

I did not understand the purpose of Table 11.

Response:
As mentioned in a response to the first reviewer, we removed the table. We agree that it was irrelevant in the scope of this paper.