ABox abduction algorithm for expressive description logics

Tracking #: 1979-3192

Júlia Pukancová
Martin Homola

Responsible editor: 
Guilin Qi

Submission type: 
Full Paper
We develop an ABox abduction algorithm for description logics based on Reiter's minimal hitting set algorithm. It handles abduction problems with multiple observations and it supports the class of explanations allowing atomic and negated atomic concept and role assertions. As shorter explanations are preferred, the algorithm computes shorter explanations first and allows to limit their length. The algorithm is sound and complete for this class of explanations and for any given maximal length of explanations. To improve optimality, we include and even slightly extend the pruning techniques proposed by Reiter. The DL expressivity is limited only by the DL reasoner that our algorithm calls as a black box. We provide an implementation on top of Pellet, which is a full OWL~2 reasoner, so the expressivity is up to SROIQ. We evaluate the implementation on three different ontologies.
Full PDF Version: 

Major Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
Anonymous submitted on 15/Sep/2018
Major Revision
Review Comment:

The article proposes methods for computing abductive explanations for single observations and for multiple observations, respectively, in ABox abduction with arbitrary description logics. The basic idea for computing an abductive explanation for a given observation is that, an abduction explanation can be computed from a hitting set of the collection of all ABox encoding models of the union of the ontology (TBox+ABox) and the given observation by negating every assertion in the hitting set. Based on this idea, the article proposes a hitting set tree algorithm for computing abductive explanations with bounded cardinality for single observations. To handle multiple observations, the article also proposes two methods, where one (the reduction-based method) is based on a reduction to a problem about a single observation, the other (the combination-based method) is based on combining abductive explanations for respective single observations. Some experiments on three small ontologies were conducted to show the running statistics for computing single observations and to show the comparison results about the two methods for multiple observations.

In general, the originality is partially good, but the significance of the results and the quality of writing need to be improved much. More details about these statements are shown below.

(1) The basis idea for handling single observations based on hitting set trees is new compared with related work, though it has been unofficially published in the DL 2016 workshop.
(2) The algorithm for computing abductive explanations with bounded cardinality for single observations is new, but more justifications for its technical advantages need to be provided. See more details in the following review comments on significance.
(3) The combination-based method for handling multiple observations is rather straightforward and has only been mentioned without details nor empirical evaluation in previous work on ABox abduction. The article provides empirical results on this method.
(4) The reduction-based method for handling multiple observations is new compared with related work, but more justifications for its technical advantages need to be provided. See more details in the following review comments on significance.

Significance of the results:
(1) The basis idea for handling single observations based on hitting set trees works for arbitrary description logics that have corresponding reasoners. However, the feasibility of a method implementing this idea is only partially demonstrated by three small ontologies (one having <300 axioms and the other two having <50 axioms). The feasibility needs to be further clarified by larger ontologies, such as those ones with up to millions of axioms created by the LUBM generator, which have been used in experiments conducted by related work [7,8].
(2) The computation of abductive explanations with bounded cardinality is a special problem setting different from traditional ones. It is unclear what this setting results in with respective to the computational complexity. In particular, it is unclear whether this setting results in intractability for tractable description logics. According to Figure 1, the execution time for computing all abductive explanations with cardinality<=5 is already extremely long (>1,000,000s) for LUBM with only 46 axioms. This result actually means that the proposed algorithm (Algorithm 1) is not practical or even infeasible for large ontologies. According to Table 3, which shows that the algorithm only computes 20 abductive explanations in more than 1,000,000 seconds, I conjecture that the proposed algorithm has much room to improve its efficiency and scalability. As a word, the proposed algorithm or even the proposed problem setting may not be sufficiently reasonable, especially in term of scaling to large ontologies.
(3) An empirical comparison study on the two methods for handling multiple observations is provided in the article. The comparison results show that the new reduction-based method is not necessary more efficient than the straightforward combination-based method. This raises a question about the advantages of the proposed reduction-based method. In fact, the reduction-based method increases the expressivity of the observations by introducing nominals. It is unclear whether this reduction increases the computational complexity of the abduction problem for description logics without nominals. More theoretical statements on comparing the two methods are needed to clarify the pros and cons of the two methods.

Quality of writing:
The writing is generally good but still has a number of issues. The main issues are listed below. There are also relatively minor issues on wording, which need to be checked by authors.
(1) Different methods on ABox abduction actually have different problem settings. These differences should be clarified during the discussion of related work.
(2) A_nR_n used in Definition 5 is an abnormal representation for function symbols. A function symbol should be short and has only subscripts at tail.
(3) The function TA(K) declared in the article is nondeterminate. A normal function should be determinate; i.e., it should return a unique result for any input. In this sense, it is inappropriate to declare TA as a function. Instead, we can say that the symbol TA stands for the ABox encoding of something.
(4) In Definition 9, {\neg M| M\in\mathcal{M}} represents only one set but not a collection of sets.
(5) In the implementation section (Section 6), the term “loops” is used without definition. Moreover, it is unclear why authors say that forbidding loops significantly reduce the search space.
(6) Table 2 and Table 3 report the statistics other than the parameters.
(7) In many places the execution time is said to be high or low. This wording is wrong as the execution time is often said to be long or short.

Review #2
Anonymous submitted on 01/Oct/2018
Review Comment:

This paper presents an ABox abduction algorithm for all the description logics which are supported by DL reasoners. In order to get such an algorithm, the authors put some limitations on the abduction problem. First, both observations and explanations of the abduction problem are limited to be ABox assertions. Secondly, the minimality relationship between explanations are measured by the subset relationship instead of the logical implication. Based on these limitations, it is not difficult for the authors to design a sound and complete abduction algorithm by making use of Reiter’s minimal hitting set algorithm.

I think there are two contributions in this paper. Firstly, it presents a relatively complete solution for the ABox abduction problem of description logics in a special case, and evaluates the actual performance of this solution on three different ontologies. Secondly, the maximal length of explanations is used as a parameter in the algorithms, so that the actual performance of algorithms can be improved significantly by setting such a limitation.

This paper is well written. All the theorems and corollaries presented in it is rigorous and reliable.

Review #3
Anonymous submitted on 24/Oct/2018
Review Comment:

This paper presents an ABox abduction method based on computing minimal hitting sets. It computes syntactically minimal, consistent and relevant explanations containing atomic and negated atomic concept and role assertions in DLs up to SROIQ. It also allows the specification of a length parameter defining the maximum length of computed explanations.

The work extends this underlying algorithm to handle multiple observations. First, by considering abduction for multiple observations via two methods: "splitting", which treats each single observation as a separate subproblem before combining the overall explanation, and alternatively by "reduction", which combines the set of observations into a single statement using nominals.

The single observation case, and the two options for the multiple observation case, are then evaluated over three small ontologies.

The paper is mostly well written, and appears to be technically sound.

In terms of originality, this paper presents an interesting approach to ABox abduction, which is a problem that attracts significant interest in various practical applications. The characteristics of the hypotheses are not new or uncommon, but the presented algorithm does have the advantage of being able to perform abduction up to very expressive DLs (including SROIQ) as noted in the paper. The method itself is well described, as are the optimization strategies used.

Where the paper falls short is in the quality of the evaluation and the significance of the results. The experiments were performed over three very small ontologies, and the observations used were restricted to a single observation for each ontology. It is not clear why these ontologies were chosen, particularly since there are real world ontologies with ABoxes. Similarly for the observations, it is not clear how or why these were chosen: why sets of three observations for the second experiment, for example?

The paper states that the evaluation shows the approach to be "feasible". However, even for these restricted scenarios, the method ran out of memory often (e.g. 50% of the depth 3 for the multiple observation experiments). Thus, it is questionable how feasible the method is in practice. This also makes it difficult to appreciate the impact of the optimization methods, particularly in the absence of a comparison between the results with and without these optimizations.