Probabilistic Description Logics under the Distribution Semantics

Tracking #: 601-1809

Authors: 
Fabrizio Riguzzi
Elena Bellodi
Evelina Lamma
Riccardo Zese

Responsible editor: 
Guest Editors RR2013 Special Issue

Submission type: 
Full Paper
Abstract: 
Representing uncertain information is crucial for modeling real world domains. In this paper we present a technique for the integration of probabilistic information in Description Logics (DLs) that is based on the distribution semantics for probabilistic logic programs. In the resulting approach, that we called DISPONTE, the axioms of a probabilistic knowledge base (KB) can be annotated with a real number between 0 and 1. A probabilistic knowledge base then defines a probability distribution over regular KBs called worlds and the probability of a given query can be obtained from the joint distribution of the worlds and the query by marginalization. We present the algorithm BUNDLE for computing the probability of queries from DISPONTE knowledge bases. The algorithm exploits an underlying DL reasoner, such as Pellet, that is able to return explanations for queries. The explanations are encoded in a Binary Decision Diagram from which the probability of the query is computed. The experimentation of BUNDLE on probabilistic knowledge bases shows that it can handle knowledge bases of realistic size.
Full PDF Version: 
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
Anonymous submitted on 03/Feb/2014
Suggestion:
Major Revision
Review Comment:

### Overview:

This paper deals with the DISPONTE framework for probabilistic description logics -- the framework itself was introduced by the authors in previous work -- that adopts the distribution semantics, where axioms are annotated with probability values and an independence assumption is made. The main contributions of the paper are the presentation of the BUNDLE algorithm for query answering over DISPONTE KBs and a brief experimental evaluation of said algorithm, though versions of these were already presented in the RR 2013 version of this article.

### Main Comments:

I have several concerns with the current version of this manuscript, which are mainly related with (1) how the authors portray the relationship between their work and the related formalisms they refer to, and (2) the amount of new material present in this extended version wrt the conference (RR 2013) paper.

(1) In Section 7 (Related Work), the authors present a discussion of an extensive body of related work. However, in my opinion they fail to adequately portray the relationship between their work and some of the references. For instance, on Page 18, the second to last paragraph ("DISPONTE differs from Heinsohn (1994) ...") states that DISPONTE "minimally extends the language and provides a unified framework for representing different types of probabilistic knowledge: from assertional to terminological knowledge". It's not at all clear why minimally extending the language is an important feature; regarding the other characteristic, the authors themselves state that there are other works that allow both assertional and terminological knowledge to be probabilistic. The comparison with these works is therefore quite weak.

Continuing in this vein, though perhaps more serious in nature, the comparison with Nilsson's probabilistic logic is flawed. The authors refer to the fact that in the Nilsson approach, there is no independence assumption being made and that therefore probabilistic conclusions must be weaker. Indeed, independence assumptions are not made in this logic -- however, ground instances can be added to the KB in order to express that probabilistic independence holds for these cases. Consider, for instance:

p(a): [0.4,0.4]

p(b): [0.25.0.25]

p(a) ^ p(b): [0.1,0.1]

Atoms p(a) and p(b) are therefore constrained to be independent.

Though this is perhaps not an elegant way to express things, it seems as appropriate as the authors' claim that their framework can handle situations in which independence does not hold. With respect to this last point, the authors state that their approach can do so "possibly introducing new atoms if needed" -- this statement deserves a more detailed explanation (perhaps an example), including an estimate of the growth of the KB due to these extra atoms. Though the reference to the fact that BNs can be encoded with PLPs under the distribution semantics is appropriate (since BNs are known to be able to encode any probability distribution), I don't see the relevance of the other references, since it is not clear if they are making independence assumptions or not.

Finally (for this point), the comment about the intervals vs. point probabilities is also inadequate as it stands, since interval probabilities generalize point probabilities, so the formalism is not forced to use the full generality of the approach.

As a final point in the matter of related work, the last paragraph of the section deals with the approaches by d'Amato et al. (2008) and Gottlob et al. (2011); the authors claim that DISPONTE provides a "tighter" integration of probability in ontologies as we do not rely on an additional graphical model". According to the authors' own account, these papers appear to generalize DISPONTE by allowing annotations to refer to a probabilistic model that is more general than the one they use (one Boolean random variable per axiom, under the independence assumption). The fact that DISPONTE's stricter assumptions allows axioms to directly be annotated with probabilities is really of no consequence to the tightness of the integration. In relation to this, the comment on tightness brought to mind some related work work on tightly integrated probabilistic ontologies by some of the same authors that should probably be included when talking about tightness of integration.

(2) Regarding the amount of new material presented in this extended version, it is not clear to me that it is significantly enough, since the main framework, inference mechanisms, BUNDLE algorithm, and experiments are aleady included in the conference paper (the experiments are different, however). In an eventual revision, please include a point-by-point description of what's new in the journal version to avoid confusion in this regard.

### Other comments:

The experiments section could be expanded to make it more comprehensive. For instance, a study on data complexity (where the ABox is varied and the TBox is fixed) seems to be a most important evaluation that is not present. In real world settings, it is often that case that the largest portions of ontological KBs are the ABoxes, so this would be more relevant than the experiments performed for running time.

In Section 4, why to the authors include proofs of theorems that they attribute to other authors?

### Minor issues and typos:

-- Please include QED marks (white boxes) to mark the end of proofs.

-- Page 17: "hundred of thousand" --> "hundreds of thousands"

-- Page 18: what does "upper ontology" mean?

-- Page 19: "considers a probabilistic" --> "consider a probabilistic"

-- Page 22: "In Table 2 we reported" --> "In Table 2 we report"

Review #2
By Claudia d'Amato submitted on 04/Feb/2014
Suggestion:
Minor Revision
Review Comment:

The paper introduces BOUNDLE, a reasoning algorithm, grounded on the exploitation of a DL reasoner, for computing the probability for a given query with respect to a DISPONTE knowledge base that is a Description Logic knowledge base under the distribution semantics where DL axioms are annotated with probability values thus defining a probability distribution over regular knowledge bases.

The problem the authors focus on and the argumentations given for it are presented in a very clear and effective way. The examples presented in section 3 are very clear and helpful for facilitating the paper understanding.

Overall, the paper focuses on an interesting problem and the proposed solution is sound, well presented and argued.

Section 4 is rather long. It should be organized with subsections. Additionally, it should clearly highlight the novel contributions of the authors that is not clearly addressed currently. If possible, the authors are also invited to report the comparison with PRONTO (as for they did for the conference paper), especially for the new ontologies considered in the current version of the paper. Additionally, since the authors focus on SHOIND(D), they are invited to report experiments on ontologies more expressive than those used in the current version of the paper.

There are also some aspects that could be improved/clarified. They are reported in the following.

- the definition of composite choice should be clarified. The role of m is not fully clear. Similarly, at beginning of section 4, the definition of a set of composite choices covering with respect to a query Q is not fully clear.
- at beginning of page 13, the authors should clearly state they are considering the case of concept C unsatisfiability as the example under discussion
- the rationale for the black box pruning algorithm should be given, as well as the reason for which the algorithm searches for an axiom that subtracted from S makes the concept C unsatisfiable
- on page 14 it is not clear how the new explanation, label of w, is computed. It seems than the only change results in K and not in S. In general, the hitting set algorithm requires additional details. It does not result fully clear.
- figure 9, line 10: CS is unknown. It should be MinAs.
- considering that the function bundle represents one of the major contribution of the paper, it should be described with more details. Maybe, also some examples could be added here.
- beginning of sect. 7, the case of type 2 statement adopted by DISPONTE is not clear.
- is there any reason why, passing from 14 to 16 as a number of explanations, the execution time increases sensibly?

MINOR:
- page 5, example 1: "the value if its the selector" --> "the value if its selector"
- in example 1, it could be preferable to list the four world immediately before the query and then specify in which world the query is true (false respectively).
- page 12, def. 1: "MinA" --> "MinAs"
- beginning of section 5.1: "(a,b) belongs." --> "(a,b) belongs to."
- page 19: "in the set of to all models" --> "in the set of all models"
- page. 20: remove "in order to compute the correct probability" at the end of the sentence, it is a repetition.

Review #3
Anonymous submitted on 04/Feb/2014
Suggestion:
Major Revision
Review Comment:

The paper presents is about a probabilistic extension of Description Logics and builds upon previous work. The contribution consist a query answering procedure and an evaluation on one probabilistic KB.

The paper is reasonably well written and self-contained and the results are apparently good compared to other prob. DL systems. I'm basically fine with the paper, but besides, few minor issues, I think there are still some points that need a more in-depth investigation, which do not require too much effort. Specifically,

1. I would like to have a better experimental set-up:

- For the considered ontologies, I would like to see also test cases for which the subsumption tests do not hold. At least to have figures telling about the execution time.

- (cf. Table 2) Well, e.g., your NCI ontology extract is a very small fraction of the original one. I would like to see for all ontologies you considered, a table indicating, e.g., within the 5min time limit how big your extract can become to get a response time within 5min...(possibly also increasing the number o of prob. inclusions above 1000). That is, a 3-dimensiona graph, with input axis x the size of ontology extract, input axis y the size of prob. inclusions, and result axis z the running time. this provides a better understanding of the effectiveness.

- It also not clear how the running time is affected for ontologies with a large number of individuals. e.g., FMA ontology

http://sig.biostr.washington.edu/projects/fma/release/v3.2.1/alt_formats...

What about the time to compute the probability of an assertion? This should be tested.

2. It is not clear to me how you may adapt your method to Conjunctive Query Answering. Please, discuss it this point

Minor points:

- roles in number restrictions have to be 'simple' roles, ie. are neither transitive nor have a transitive subrole

- eq 5.: upper case P(w)

- The BUNDLE algorithm accepts as query a concept C asking about explanations for the unseat test. For completeness provide the transformation from the entailment test of instance checking to concept satisfiability.

- footnote 4: link not working

- your BUNDLE repository was not accessible, in the sense I was not allowed to download the software and data. This should be fixed.

- Please, check again biblio: e.g., dl-lite in lower case...