Automating modularisation with algorithms for abstraction and expressiveness

Tracking #: 1614-2826

Authors: 
Zubeida Khan
Maria Keet

Responsible editor: 
Bernardo Cuenca Grau

Submission type: 
Full Paper
Abstract: 
Large and complex ontologies lead to usage difficulty for both humans and software tools, hampering the ontology developers’ tasks. Modularity has been proposed as a possible solution to this problem and a number of techniques and tools for ontology modularisation have been developed in recent years. These algorithms and tools allow the developer to create only a subset of the types of modules they wish to create, employing principally partitioning and locality-based techniques. Different types of abstraction and expressiveness modules, on the other hand, still heavily rely on manual methods for modularisation. We propose here to fill this gap in modularisation techniques. We present five new algorithms to generate abstraction and expressiveness modules. They have been implemented in the NOMSA tool for modularising ontologies and were evaluated by both comparing it to other modularisation tools using a set of existing modules and assessing the quality of the generated modules. The results show that the algorithms’ performance is as good as others, whilst also eliminating manual intervention. The module’s quality ranges between average to good. Further, the algorithms are wrapped in an easily usable GUI to facilitate their use by ontology developers.

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Nico Matentzoglu submitted on 02/May/2017
Suggestion:
Reject
Review Comment:

The article is concerned primarily with implementing and evaluating 5 modularisation techniques, which are embedded into a tool called NOMSA.

The main contributions of the paper are:

* The design and implementation of 5 algorithms for module extraction, all of which essentially consist of rules by which to mark a subset of the ontology to be deleted.
* The evaluation of those algorithms against a set of established module evaluation metrics.

The main positive points of the paper are:

* Relevant research area
* Few typos, reasonable paper structure, comprehensive bibliography

The main weaknesses of the paper are:

* Unclear nature of module evaluation metrics and unclear relationship to quality: "the smaller the better" only makes sense if you can provide logical guarantees. Otherwise, simply deleting all axioms would be the best.
* Unclear logical properties of modules. Do they have any logical properties ("guarantees")? How can I be sure I am not losing something important when using your algorithms? They don't seem to have coverage, self-containment, etc.
* What can we do with your modules? Can we do with your modules what you claim you can?
* Incomparable modularity techniques compared
* A number of holes in the definitions
* A number of holes in the experimental design
* Unclear extent of the contribution: The reader is not convinced that modules are "good quality".

Regrettably, while the subject may be of interest to the wider Semantic Web community, I cannot at this point recommend the paper for publication.

Before I state my review, I have to admit that it is difficult for me to accept the authors' definition of a module: Across all approaches, the only (logically relevant) *property* of a module is that of being a strict subset of the original ontology. That is a very weak property, compared to the logical guarantees that are for example given by most logic-based approaches (at the very least coverage, but also self-containment, depletingness and potentially subsumer-completeness). My feeling is that the better term to use here would be that of an "approximation", but I am not too sure of that either (are they from above, or below? Unclear.). However, I can see that it is not my job to give my opinion about alternative definitions of modules: The following review therefore assumes, despite my personal doubts, that, given the range of prior publications by the authors, the presented notions of modules (abstraction and expressiveness, both of which I have not heard of before) are well informed and have a valid use case.

Major points of potential improvement

Abstract:
- "Filling the gap" appears to be an extremely strong claim, which I would somehow clarify. It is obvious that your modules wont solve arbitrary use cases (consider where we do want to preserve logical properties), and if you believe your 5 algorithms serve significant proportion of the use cases, you need to establish that scientifically: Either through user surveys or by deriving the "gap" from existing accounts, such as a systematic review of modular techniques that includes a gap analysis (which, as far as I know, does not exist).
- You use here the notion of abstraction and expressiveness modules. My knowledge about modularity is probably above average (not expert), since my PhD was about modularity, but I have never heard of these two. Given that you are trying to present your work to a general audience (SW), consider explaining what they are, or avoiding the terms until you specify their exact meaning.
- "..quality ranges between average to good" --> What does that mean? What constitutes "good quality". I know you explain it later in the paper, but a good abstract should be, given the target audience, self contained.

Introduction
- ".. for a majority of types of modules, there is a heavy reliance on manual methods" -> perhaps a bit of an overstatement, if no citation is available. I am not sure whether a claim such as this can be really substantiated. Given its part in the motivation of your work, I would stick with the weaker claim that "a considerable number of ontologies are published in a modular fashion, many of which created by manual means" or something along these lines.
- Regarding abstraction and expressiveness modules: It would be really helpful if you included a reference to the originators of these module types. I still do not really understand who defined these terms (was it you?)
- "They compare favourably to related work" -> In what way? Smaller, faster, better cohesion? Make explicit. However, as I will point out later again, you need to consider the different objectives of other modules, and consider whether they are really comparable.

Related work
- "either by abstraction, removal or decomposition" -> definition not self contained: The reader does not know what this phrase (exactly) means.
- "T is classified by a set of annotation features P" -> Reader of general audience (SW), indeed myself, will not be able to understand what this means
- "local correctness" -> not a term used in locality-based modules. The properties are, for example, coverage and depletingness.
- "Protege .. has a feature for generating modules..". I am not certain what you are talking about here. I have been using Protege for many years, but I am currently not aware of a built in module extractor. Perhaps I just forgot about it (or it was part of a much older version), but are you sure it was not a plugin of some sort? No matter the answer, I think some more details about the Protege implementation should help, in particular if we want to distinguish it from the OWLAPI syntactic locality module extractor. We simply need to know what kind of modular technique is hidden behind the Protege module extractor if you want to use it in your analysis. (also: almost the exact same sentence is repeated at the end of the next paragraph, which is redundant.)
- "query-based modules" -> What are those? A Google search gives 0 results..

Modularisation method
- "based on those identified in [23]" -> Without going to check [23] I think it would be helpful for the reader (reviewer) to learn at this point how much of what is presented here is novel, and how much was described in previous articles. For examples, were the AxAbs modules defined described in [23], or did you develop them for the work at hand?

Axiom Abstraction
- The definition is slightly broken: S should be defined formally as the set of all subclass axioms in O that do not correspond to atomic subsumptions (at least if the algorithm pseudocode is any indication of what is going on). The definition and the pseudocode are not aligned.
- Pseudocode: Mention of declaration axioms is redundant: 1) they are logically ineffectual and can be safely deleted without any consequence (perhaps short of reducing the signature of O); 2) they can never contain complex nested concepts; they will always be atomic. It is unclear, however, why the algorithm does not mention other axiom types that allow complex class expressions, such as class assertions or, more importantly, equivalence axioms.
- The cExpressionSet in the pseudocode is redundant, as "getNestedClassExpressions()" already returns a set. Isn't what you are trying to say here: if an axiom contains a complex nested expression, drop it? I think, in your simple scenario, if(ax.getNestedClassExpressions().size()>2) would have already done the job (unless you want to preserve A subclassof B and C, which would be in conflict with your definition); see the sketch after this list.
- "that contains only the classes Professor and Course" -> This is indeed one of the biggest holes that your definitions have. It is not really useful to talk of an ontology as "containing a class". Ontologies comprise of axioms, which in turn can mention classes. You would be better off to define the notion of ontology signature in line with most of the literature, and say that A is still in the signature of O.
- It appears that this abstraction module resembles the class hierarchy (or at least a subset thereof). Why not simply extract the set of atomic subsumptions / equivalences instead of deleting axioms that contain properties?
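
To illustrate the simpler check suggested above, a minimal sketch assuming OWL API 4.x (ont is a placeholder for the input OWLOntology): an atomic subsumption A subclassof B yields exactly two nested class expressions, so any subclass axiom yielding more than two mentions a complex expression.

    import org.semanticweb.owlapi.model.AxiomType;
    import org.semanticweb.owlapi.model.OWLSubClassOfAxiom;

    // Mark non-atomic subclass axioms for removal from the module.
    for (OWLSubClassOfAxiom ax : ont.getAxioms(AxiomType.SUBCLASS_OF)) {
        if (ax.getNestedClassExpressions().size() > 2) {
            // ax contains a complex class expression: drop it
        }
    }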

Vocabulary Abstraction
- Again, an ontology is NOT a set of classes. It does not make sense to say O' \cap C = O, unless you define C as the set of axioms containing a particular set of classes. This problem follows through the rest of the work: You say "remove class c", but you never really say what that means. Most likely, you mean: remove all axioms mentioning C, but it should be stated explicitly somewhere (see the sketch after this list). Other solutions are easily conceivable: A SubclassOf C and D -> A SubclassOf C (in the case D is deleted)
- This seems like a quite "brutal" type of module, where it is hard for the reader to think of a scenario where it applies. Perhaps mention a convincing use case for why anyone might want to delete all axioms containing certain classes. The example you mention is not convincing, as my ontology, in all likelihood, WILL include some object properties I don't want to delete.
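
To make the point concrete, here is one plausible reading of "remove class c", as a minimal sketch assuming OWL API 4.x (man, ont and the IRI are placeholders of mine, not the authors' definitions): delete every axiom that mentions c.

    import java.util.Set;
    import org.semanticweb.owlapi.model.IRI;
    import org.semanticweb.owlapi.model.OWLAxiom;
    import org.semanticweb.owlapi.model.OWLClass;

    // One reading of "remove class c": delete all axioms mentioning c,
    // including its declaration axiom.
    OWLClass c = man.getOWLDataFactory()
        .getOWLClass(IRI.create("http://example.org/onto#C"));
    Set<OWLAxiom> toRemove = ont.getReferencingAxioms(c);
    man.removeAxioms(ont, toRemove);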

High level abstraction
- The description of this module type requires a bit of discussion regarding inferred vs. asserted depth. In particular, your pseudocode uses class.subclasses(), which is undefined for just the asserted knowledge (if you used a reasoner for this, of course the module extraction would be at least as hard as reasoning); see the sketch after this list. Given an ontology with a single axiom ax="A subclassof C and D" -> how deep would your class graph be, 1 or 2? Or A subclass B, B subclass A (1 or 2)? Or, even worse, given an inconsistent ontology?
- How can one repeat lines 3-13 for instances? That makes little sense (instance.subinstance()?).
- Review the capitalisation of the pseudocode (levelNumber vs LevelNumber).
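
To illustrate why asserted depth is problematic, a minimal sketch assuming OWL API 4.x and asserted atomic subsumptions only (method and variable names are mine, not the authors'): the seen set guards against cycles such as A subclassof B, B subclassof A, and an axiom like "A subclassof C and D" never contributes to the count at all, because its superclass is anonymous.

    import java.util.Set;
    import org.semanticweb.owlapi.model.OWLClass;
    import org.semanticweb.owlapi.model.OWLOntology;
    import org.semanticweb.owlapi.model.OWLSubClassOfAxiom;

    // Depth of the asserted subclass graph below c; cycles yield 0 rather
    // than looping forever, and complex axioms are simply invisible here.
    static int assertedDepth(OWLClass c, OWLOntology ont, Set<OWLClass> seen) {
        if (!seen.add(c)) {
            return 0; // cycle, e.g. A subclassof B, B subclassof A
        }
        int max = 0;
        for (OWLSubClassOfAxiom ax : ont.getSubClassAxiomsForSuperClass(c)) {
            if (!ax.getSubClass().isAnonymous()) {
                max = Math.max(max,
                    1 + assertedDepth(ax.getSubClass().asOWLClass(), ont, seen));
            }
        }
        return max;
    }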

Weighted abstraction
- Note: What you call \mathcal{E} in your definition, many people in the modularity community simply call the ontology signature. Maybe a nice word to use here.
- Definition 6: The need to order appears to be redundant.
- Your concern for "Declaration axioms" should be reviewed. They really mean nothing at all; I would not even consider them proper axioms. I would perhaps consider ignoring them altogether.

Feature expressiveness
- "results in a simplified model of the ontology" -> What do you mean by "simplified model"? Please clarify. For example removing disjointness axioms results in an ontology for which a simple model may be much harder to identify!
- Your seven rules are not motivated well enough. It is not enough to state "motivated by the modelling perspective on language features" -> If you don't plan to generate empirical evidence for the viability of these rules, I would suggest selling them as some kind of examples, informed by your personal modelling experience. However, I am not sure how your readers would react to that. You claim: this module extraction approach produces a module that is useful for ontology comprehension. Somehow, it would be very desirable to substantiate that claim with empirical evidence before presenting a tool implementing it.
- "We decided to assign lower points" -> decided by what principles?
- Definition 8: "a set of rules describing various OWL language features" -> phrase too vague. In order to understand the definition, substantiate.
- For all Rules: You need to motivate why a particular rule is beneficial. For example R1: Why would we care least about Qualified cardinality? Also, there is currently no relationship between your rules and OWL profiles. You should take a look at "approximations", such as the EL or QL approximation algorithms implemented by TrOWL. They are a bit more in line with what you are doing than logical modules.
- R1: Axiom patterns appear to be incomplete: what about A subclassof (B and min 2 R C)? (See the sketch after this list.) Note also that you exclude axioms such as A sub R min 1 B, which are equivalent to A sub R some B.
- R7: C equiv D and E. So this axiom is not removed by R6? If so, say explicitly that equivalence (R6) is atomic equivalence.
- "Note that the algorithms are linear ... and quadratic" -> It might appear obvious to you, but whenever something like this is claimed, it should be proofed, at least to an extent that it convinces the average OWL expert.

Illustration of the algorithms
- "All of the classes, properties,... are preserved by the module" -> By chance! You could have easily constructed an example hat lead to an empty module. The information loss of your approaches is not really controllable. You should say somewhere why this is okay.

Implementation and evaluation
- "compared to other modularisation tools" -> Consider a better description of the other tools, in particular in terms of their intended use cases (they are not all there for ontology comprehension, like yours). Maybe provide a more detailed table.
- The metrics you choose here have a non obvious relationship to quality. We need to know what any of these metrics have to do with "Quality" (i.e. why is it better to be "smaller"? The smallest module is obviously the empty one, and that one is not the best possible one.)
- Similarly: Why is more or less cohesion good or bad? Why more or less intra-module distance?
- Atomic size, appropriateness, intra-module distance, relative intra-module distance, cohesion and inheritance richness are not sufficiently explained. For the sake of self-containment, you should describe them in sufficient depth, at least enough to understand what they can tell us (consider the target audience). We need to know how larger or smaller values constitute more or less quality in order to understand your evaluation.
- Your dataset: insufficient description. You say you "derived" them from [12], but this is a very old paper, and there are many, many much more up-to-date corpora to consider here (a recent BioPortal snapshot, most importantly).
- Say something about size and expressivity distribution of your dataset.
- Since most of your extraction methods are parametrised (weights, axiom types etc), you should make explicit here which parameters you chose for the evaluation. In particular: VocAbs, but all the others as well.
- It is not obvious how you can compare locality-based modules and your modules. They serve different purposes.
- "For the level of interaction, NOMSA is automatic" -> What does that mean? Locality-based modules can also be extracted automatically.
- "NOMSA includes the most algorithms" -> Trivial: Any tool could easily implement NOMSA algorithms AND all the other ones, without much effort. That is not much of a benefit.
- "semantic-based abstraction" -> As far as I can see, you only ever do a syntactic-based abstraction. However, if I am wrong, you should make it clear how your algorithms are semantic-based abstractions.
- "language simplification" -> In what way?
- "manually save the modules" -> Trivial advantage. Any tool could do that easily! I would not emphasise such details, as they are of no interest to the scientific community.
- Since you are using TOMM, remind the reader what exactly it is and how it was validated.
- "all 114 ontologies were successfully modularised" -> Again, it is hard to see how this could not be the case given the low computational complexity of your algorithms. Was there ever any danger of not modularising?
- "All five algorithms result in reduction of size" -> True, but I can manually construct enormous ontologies where most of your methods would not result in a reduction of size, most trivially a flat list of 1000 classes.
- "meaning that the entities in the module are to that degree closer" -> Why is that a good thing? (Could be, just remind us)
- "we can compare the results.. to the benchmark dependencies between modularity metrics of the framework for ontology modularity" -> This sentence is not understandable. Compared to what? Clarify.
- You should include a paragraph on these "expected values". I don't understand how they came about.
- This "Appropriateness" metric is unclear. What appropriateness function?
- "between 167-333 axioms" - ? really confused here.
- You say you "compare" your modules to other approaches. I dont find this comparision anywhere, except for extraction time. Time comparisons are only meaningful if the approaches are doing the same thing, which they dont.

Discussion
- Since the algorithms are mostly linear / quadratic, perhaps express the performance in terms of the input size, such as "on average 2 seconds per 1000 axioms".
- "sizes.. reduced..other metrics... notabl[y] different..." -> How is that good or better?

Conclusion
- "against the benchmark dependencies between modularity metrics" -> this phrase is really hard to parse mentally.
- Your results for AxAbs, VocAbs and HLAbs -> Does this mean they are "bad" modules?

Minor corrections

Abstract
- "lead to usage difficulty" -> In what way? Ontology comprehension? Reasoning performance?
- "hampering the ontology developers task" -> Which tasks? Hampering?
- "performance as good as others" --> specify what this means (as good in what way?)
- Perhaps drop the last sentence of the abstract. Perhaps I am wrong, but a GUI for module extraction, all by itself, is merely a "nice to have", and perhaps even necessary for evaluation, but will almost certainly not count as a contribution (unless it enables further scientific studies).

1. Introduction
- 1 015 206 -> awkward spacing, consider using "," instead of spaces
- "such that modules can be recombined" -> Clarify, sentence did not use the word module before.
- "hide knowledge that is not required for the use case" -> cite
- "Modularity has been successfully applied to improve usability and assist with complexity" -> too vague, given the importance of the subject. Clarify.
- "Some of the many examples include" -> I lost the context: examples for what? Successful application of modularity?
- "manually saved...done automatically" -> Try not to undersell your approach by pointing out holly irrelevant implementational details. It would be trivial to write a tool that extracted and saved a module of any type.
- Perhaps the motivational scenario could be moved a little earlier. However, even more convincing would be a real scenario, i.e. a biologist who told you: "I need to know what this is about. Can you make it a bit smaller and show only the relevant parts?", and an illustration of how you solved their problem.

Related work
- "ORM module" -> MOst readers wont know what that is
- "Anchor points" -> What is that?
- "solves the ontology comprehension problem" -> Perhaps a bit strong. It wont solve the problem, for many reasons, most importantly that the module as a simplified subset of the ontology simply is not the original ontology anymore. It is, literally, something simpler. It will "aid in understanding"

Rest:
- You say a couple of times "notable" when you mean "notably"

Review #2
By Ernesto Jimenez Ruiz submitted on 12/Aug/2017
Suggestion:
Reject
Review Comment:

The paper deals with a very important problem in the Semantic Web community: the fragmentation and modularization of ontologies. There are several contributions in the literature, but more efforts in this line are always welcome. There are, however, several issues that prevent me from providing a positive recommendation for the paper. I hope my recommendation does not discourage the authors but motivates them to improve the conducted work and pursue this research line.

- SUITABILITY AND NOVELTY. In its current state, I believe the paper would have been more suitable as a Tool/System paper, since the main contribution is a series of algorithms (and their evaluation) and a prototype system. There is no clear research novelty with respect to the state of the art. As a system paper, I might have opted for a Major Revision.

- AUTOMATION. One of the main motivations for the developed algorithms is the lack of (full) automation in state-of-the-art methods and systems. I understand that locality-based modules require an input signature, but most of the partitioning algorithms can work in an automatic fashion. Expressiveness modules are also typically automatically computed in the literature if the target language is an OWL 2 profile (see [1,2,3]). Furthermore, the algorithms presented in this paper also require input parameters other than an ontology (thresholds, entities, axioms, weights, etc.), which seems to contradict what is stated in the abstract and the introduction ("eliminating manual intervention").

- LOGICAL GUARANTEES. The computed modules may be useful for a given application at hand but as defined and implemented they do not provide enough logical guarantees.

* Definition of O. Depending on the abstraction type, O is treated as a set of axioms or as a set of entities.

* Axioms abstraction module. The computed module is the result of removing axioms from a given set. The problem in the given algorithm is that the computation is rather syntactical. Removing an axiom "alpha" from O does not guarantee that O does not entail alpha any more. Consider the case where alpha = "A sub B" is removed but the ontology still contains beta = "A sub B and C". Furthermore, entailment preservation is not considered. Consider O={"A sub B", "B sub C"}. The abstraction removes "A sub B"; does the abstraction aim at preserving "A sub C"? This should be considered in the definition.
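
A minimal sketch of the check I have in mind, assuming OWL API 4.x plus any OWLReasonerFactory implementation such as HermiT (module, removedAxiom and reasonerFactory are placeholders): syntactic removal of alpha does not guarantee the module no longer entails alpha.

    import org.semanticweb.owlapi.reasoner.OWLReasoner;

    // After deleting alpha ("A sub B"), the module may still entail it,
    // e.g. via a remaining "A sub B and C".
    OWLReasoner reasoner = reasonerFactory.createReasoner(module);
    boolean stillEntailed = reasoner.isEntailed(removedAxiom);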

* Vocabulary abstraction. I understand that entities and related axioms are removed. As far as I know, there are several approaches in the literature about forgetting entities in DL (see [4-5]). I believe this type of module can be seen as a type of forgetting of entities. As above, if O={"A sub B", "B sub C"} and B is removed, does the abstraction aim at preserving "A sub C"?

* High-level abstraction. The method does not consider the cases where a class appears at two different levels (multiple inheritance). Has the ontology been classified before module extraction? What happens with axioms including classes A and B where depth(A) > n and depth(B) < n?

* Weighted abstraction. It also raises the entailment preservation issues noted for the axiom and vocabulary abstractions.

* Feature expressiveness. It would be more interesting to approximate to an OWL 2 profile or to a sub-language useful for a given application. The rules also fail to consider entailment; they only apply syntactical simplifications of the ontology. Consider the case where O includes the axiom C sub A and =1R.B; module M1, after applying rule R1, would still entail "C sub =1R.B".

- EVALUATION. The evaluation lacks a section explaining how every tool has been set up. Each evaluated method/algorithm requires a set of input parameters (including the NOMSA ones). As it is now, it is not clear whether the systems have been evaluated under the same conditions. It would be more interesting to provide numbers with respect to the ability of the modules to solve a given task; otherwise it is hard to compare the different approaches, as they have different purposes and inputs of different nature.

Other comments:
- In the introduction, Swoop, Protege and the OWL module extractor are listed as automated tools. Protege and Swoop are better known as ontology editors with some functionalities/plugins for modularization. More details about these modules/plugins should be given to understand their modularization capabilities.

- The second part of the definition of module in Section 2 is cyclic (and not very clear), since it relies on a set of modules. Does it mean that if we combine a set of modules we also obtain a module?

- The classification of abstraction and expressiveness modules with respect to the state of the art is not clear. Are, for example, locality-based modules abstraction modules? These modules apply both vocabulary and axiom abstraction to preserve only the entailments for the selected entities. As Section 2 reads now, it seems locality-based modules and query-based modules are of a different nature.

- Definitions 5 and 6 are very similar. They could be merged into a single definition with two sub-definitions.

Minor comments:
- Abstract: ..create...create
- Introduction: 1 015 206 -> 1,015,206
- Page 5: set of class, object properties, data... in O -> set of entities in O

References:

[1] Effective computation of maximal sound approximations of Description Logic ontologies. ISWC 2014

[2] Soundness Preserving Approximation for TBox Reasoning. AAAI 2010.

[3] Is Your Ontology as Hard as You Think? Rewriting Ontologies into Simpler DLs. DL 2014

[4] Foundations for Uniform Interpolation and Forgetting in Expressive Description Logics. IJCAI 2011

[5] Concept and Role Forgetting in ALC Ontologies. ISWC 2009

