Discovery of Emerging Design Patterns in Ontologies Using Tree Mining

Tracking #: 1459-2671

Agnieszka Lawrynowicz
Jedrzej Potoniec
Michal Robaczyk
Tania Tudorache

Responsible editor: 
Rinke Hoekstra

Submission type: 
Full Paper
The research goal of this work is to investigate modeling patterns that recur in ontologies. Such patterns may originate from certain design solutions, and they may possibly indicate emerging ontology design patterns. We describe our tree-mining method for identifying the emerging design patterns. The method works in two steps: (1) we transform the ontology axioms into a tree shape in order to find axiom patterns; and then, (2) we use association analysis to mine co-occurring axiom patterns in order to extract emerging design patterns. We conduct an experimental study on a set of 331 ontologies from the BioPortal repository. We show that recurring axiom patterns appear across all individual ontologies, as well as across the whole set. In individual ontologies, we find frequent and non-trivial patterns with and without variables. Some of the former patterns have more than 300,000 occurrences. The longest pattern without a variable discovered from the whole ontology set has size 12, and it appears in 14 ontologies. To the best of our knowledge, this is the first method for automatic discovery of emerging design patterns in ontologies. Finally, we demonstrate that we are able to automatically detect patterns that we have manually confirmed to be fragments of ontology design patterns described in the literature. Since our method is not specific to particular ontologies, we conclude that we should be able to discover new, emerging design patterns for arbitrary ontology sets.

Minor Revision

Solicited Reviews:
Review #1
By Vojtěch Svátek submitted on 15/Oct/2016
Review Comment:

I acknowledge a very careful revision by the authors. Nearly all of my comments from the previous review have been adequately addressed. I particularly appreciate the comparison between the authors’ approach and the RIO tool.

Overall, I see the paper in good shape for publishing. To summarize along the main review axes:
- Originality: I find the problem addressed (mining sensible patterns from ontologies, well balancing the degree of domain entity generalization) not yet sufficiently tackled, and the approach taken is novel in many respects.
- Significance of the results: The output of the method could be practically useful for ontology designers, tool developers and users. As the authors note, future work should involve expert users in evaluation of the mined fragments.
- Quality of writing: acceptable.

Minor remarks to consider for the final version:

In the previous review I noted that the tree2axiom ‘decoding’ algorithm might be intuitive but completely omitting it is not a good thing… You replied that “the algorithm is (and was) described two paragraphs below”. Actually, what I had in mind was the (presumably, quite simple) algorithm for translating the string representation to the standard Manchester syntax of axioms (sorry, “tree2axiom” was probably not a good nick).

I am still a bit unhappy about the arrangement of *all* tables into the appendix, including modestly sized ones that could easily be positioned in the text and thus improve readability. Consider moving some small ones (taking up less than half a page) to the text. The big ones are rightfully in the Appendix. However, I believe the tables in the paper and its appendix should not share the same consecutive numbering. Rather, I would prefer having thematic appendices (Appendix 1, Appendix 2, etc.) containing the big tables (possibly clusters of closely related ones), and referring to them as these appendices.

As regards the rewritten sentence “To deal with the first issue, we keep, and extend only the induced subtrees.” – OK, but then there should be a comma both before and after “and extend”, or none at all. Otherwise the sentence still looks ill-formed.

Table 8 is actually just a listing of a pattern, not a table.

Review #2
By Eva Blomqvist submitted on 15/Oct/2016
Review Comment:

First of all, thanks to the authors for considerably improving the readability of the paper. In principle all my concerns have been addressed; only a few details remain (or stood out now that the rest of the paper is clearer and easier to read). Therefore I suggest that the paper be accepted, since the authors just need to fix a few small details that would not require a new round of reviews.

I have the following minor comments:

- Since you have a quite detailed comparison with RIO at the end, you could refer to it already from section 2.

- In the grammar notation in equation 2 the arrow points to the left; shouldn't it point to the right, i.e. be read as "Ca can be replaced by any of the things to the right"? Maybe this is just an insignificant detail, but to me the current direction reads as if you can replace a complex thing with a C in one step, whereas what you are after is actually the "recursive" construction of a complex structure. I may be overly picky here (or even wrong).

- I still have a bit of a problem with some of the definitions, or rather with how they are presented. In definitions 3.1 and 3.2 Q is an axiom pattern (singular), while in 3.3 Q is a SET of axiom patterns. If one is a single pattern and the other is a set of patterns, why reuse the same letter for both? Also, in 3.1 and 3.2 the superscript alpha represents the "of an axiom alpha" part of the definition, i.e. alpha is a possible instantiation of the pattern. In 3.3, by contrast, the Q has a superscript CF that I assume stands for class frame, but that is not mentioned inside the definitions (merely earlier on that page), so I assume you mean that the class frame pattern is a pattern "of a class frame CF"? Previously in the text you have a subscript A on the CF, which denotes the class, but here in 3.3 it appears on the Q instead. Next, Q appears again in definition 3.5, as an ontology pattern - why? You already have OP to denote that. Nothing here is major; it is just that these definitions are still not entirely clear and straightforward to read, although they are much better than in the previous version.

- I do understand that the authors want to focus on the things that their algorithm indeed does, and not what it cannot do. Nevertheless I still find the discussion in sections 5.4, question 4 under 6.1, and 6.6 a bit weak in the sense of not mentioning anything about what was not detected and why (focusing on the documented patterns). A few documented patterns were indeed detected, and that is good news that should of course constitute the major part of the description. However, it can say a lot about the nature of the detection to also take just one or a few negative examples, i.e. of a documented pattern that you can identify manually in the ontology with the help of the paper that describes it, but that was not detected by the algorithm. It should probably be quite obvious from a manual inspection why such a pattern was not detected, and could give the reader some valuable insights into the limitations of this approach.

- Thanks for adding the running example of Table 1, it helps a lot. However, it is still not entirely clear throughout the paper that this is what is used. You only refer to Table 1 in a couple of places, and often not to the exact axiom in one of the ontologies that you reuse for the example. I suggest going through all the examples and making sure you refer back to Table 1, and also mention the # of the axiom (as you do in some cases) wherever applicable. This will make the use of the example more consistent.

- Figures 5 and 7 are still quite "anonymous" and do not use the running example. Is this for readability reasons? What would happen if you used "real" node names here as well? It is not crucial, but it would be easier to follow than keeping track of the a, b, and c labels.

- It is great that you now discuss the various pattern types towards the end of the paper. However, this raises additional questions: You mention that you can detect some logical and alignment patterns as well, so do you have some examples where this happened, from your experimental results? Or at least where it could happen?

- Figure 14 is still a bit hard to read. Would it fit better as a table instead (at least part b) since it has so many subparts within?

Overall a nice paper that I am happy to have reviewed!

Review #3
By Heiko Paulheim submitted on 24/Oct/2016
Review Comment:

I appreciate that the authors have taken the numerous remarks raised in my review for the previous version (and the other two) very seriously. The revision is really thorough and substantial.

In the current version, the paper presents a solid piece of research, and I recommend its acceptance.