Searching for explanations of black-box classifiers in the space of queries

Tracking #: 3049-4263

Authors: 
Jason Liartis
Edmund Dervakos
Orfeas Menis-Mastromichalakis
Alexandros Chortaras
Giorgos Stamou

Responsible editor: 
Guest Editors Ontologies in XAI

Submission type: 
Full Paper
Abstract: 
Deep learning models have achieved impressive performance in various tasks, but they are usually opaque with regard to their complex inner workings, obscuring the reasons for which they make decisions. This opacity raises ethical and legal concerns regarding the real-life use of such models, especially in critical domains such as medicine, and has led to the emergence of the eXplainable Artificial Intelligence (XAI) field of research, which aims to make the operation of opaque AI systems more comprehensible to humans. The problem of explaining a black-box classifier is often approached by feeding it data and observing its behaviour. In this work, we feed the classifier with data that are part of a knowledge graph, and describe its behaviour with rules expressed in the terminology of the knowledge graph, which is understandable by humans. We first investigate the problem theoretically, to provide guarantees for the extracted rules, in particular their consistency and understandability. Then, we investigate the relation between "explanation rules for a specific class" and "semantic queries collecting from the knowledge graph the instances classified by the black-box classifier into this specific class", thus approaching the problem of extracting explanation rules as a semantic query reverse engineering problem. We develop algorithms for solving this inverse problem as a heuristic search in the space of semantic queries, evaluate the proposed algorithms on three simulated use cases and discuss the results.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Rafael Penaloza submitted on 17/Mar/2022
Suggestion:
Major Revision
Review Comment:

This submission proposes a new approach for explaining the results of a black-box classifier, by providing a user with a query, or a rule, which describes the basic characteristics of the resulting class. The approach is based on known techniques from the DL community: computing the most specific query (that is, the query that most tightly describes a given individual from a dataset) and the least common subsumer (the tightest query that generalises two others). In addition to the technical details of the approach, the authors provide an empirical analysis based on three experiments, which provides evidence for the usefulness of the approach to explain the results of the classifier.
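
For a concrete, purely illustrative instance of these two notions (the names are placeholders echoing the cough scenario mentioned later in this review, not vocabulary taken from the paper): if a recording $a$ is annotated with $Cough(a)$, $hasQuality(a,b)$ and $Dry(b)$, its most specific query is $Q_a(x) \leftarrow Cough(x) \wedge hasQuality(x,y) \wedge Dry(y)$, and the least common subsumer of $Q_a$ and a second recording's query $Q_{a'}(x) \leftarrow Cough(x) \wedge hasQuality(x,y) \wedge Loose(y)$ keeps only what the two share: $Q(x) \leftarrow Cough(x) \wedge hasQuality(x,y)$.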

In general, I believe that the submission covers a very important and relevant topic in the modern era of AI, where most current methods are based on deep learning, but explainability of answers is not only desirable, but is also beginning to be required by law in some regions. The kinds of explanations provided by the queries generated through this approach require some technical understanding, but are relatively easy for a trained user to grasp, as long as the number of conjuncts is not exceedingly large. Under this view, I believe that this work is worth exploring.

Still, I believe that there are some important points that need to be improved or clarified before
it can be accepted for publication.

My main concern is with the motivation of the paper, and how it applies to the developed formalism. The authors argue that semantic labelings should be used instead of instance features to allow for understandable explanations, given that the names have a meaning to users reading the explanation. In the original example, the authors propose that the input to a diagnosis is a recording of a cough, which for training has been semantically labeled by an expert (as "dry cough", "loose cough", etc.). These labels characterise the recordings, and are used for the explanation. I see two obvious problems with this (which are actually connected):
1. the interest in explaining black-box classifiers is not in understanding the training or test instances, but rather in understanding a newly given instance. For such instances, the semantic annotations are not available---otherwise, we would not need the black-box classifier to begin with
2. there is no guarantee of any kind that the black-box classifier is using the "semantics" given by the annotations. Indeed, it could be that for the classifier other factors such as the length of the recording, the frequency of the cough, or its strength are more important. We would then be explaining through irrelevant labels (and hence misleading the users).
The second point is particularly important in the context of explaining black-box classifiers, since that is precisely the issue with these kinds of classifiers: the reasons why they select one class over another are not accessible. This well-known issue was already immortalised in science fiction of the late 90s.

Another issue is with the classifier, and the rules extracted. As the authors observe, it is difficult to find correct rules for a classification, and hence one would like to find some with exceptions. However, it is usually these exceptions that are interesting to explain (the difficult cases to understand, not the simple instances). For these, there is no notion of explanation, except for being exceptions to a rule. Very important details, such as bias, could be hidden in those exceptions that cannot be understood. It would be good if the authors could discuss this issue.

Another less problematic, but still relevant, issue regards presentation. I have recently observed a trend where, at first sight, papers seem more general than they actually are, and important restrictions are hidden in some sentence in the middle of the paper. Although this should formally not be an issue (a paper should be read in detail), in reality it can cause confusion for superficial readings. In this case, for example, the authors first define the general notion of a conjunctive query, then (in line 47 of page 3) say that "query" stands for "conjunctive query", and a few lines later (line 2 of page 4) say that "query" stands for "instance query". Someone who misses that last passage can easily think that the approach can handle conjunctive queries in general. Another case is at the end of page 4: after defining rules in general, it is assumed that all rules are "connected". This assumption is mentioned there, and nowhere else, but is used throughout the paper.

There are a few minor comments in addition to these, but at this point I believe that the larger
comments above justify a major revision of the submission.

Review #2
Anonymous submitted on 02/May/2022
Suggestion:
Major Revision
Review Comment:

It is a very, very interesting paper. Yet, I have the impression that it promises a bit more than what it delivers, mainly in two respects:
- No external knowledge graphs are used; a knowledge base is built from the exemplary datasets, using 'features' extracted from them; it looks (at least based on the examples) as if the knowledge bases are built from, and fully tied to, the datasets, right?
- The examples are a bit disappointing; they should provide some 'exact' results so a reader can form her own opinion regarding the method's benefits (a bit more below).
There is also the impression of over-complication in some places. And the paper is long and not the easiest to follow. Understandably, you need to provide a number of concepts and notations – but it is a bit overwhelming – the idea behind your approach is somewhat difficult to 'extract' (sorry, it was my impression). So, any simplification of the paper would make it easier for the reader to understand and appreciate it.
Still, I like the paper and many aspects of it very much and would like to see a simplified and more 'goal-oriented' version of it.

Some comments:
- you use the term ‘semantic query’ – any particular reason for that?
- what do you mean by “certain answers” – are these definite ones? Do you even need to mention that? Could some answer be uncertain?
- You use the term Exemplar as a concept and exemplar as a datapoint (interpretation); if I may suggest something (since the only difference is a different font and a capital first letter), changing exemplar-as-datapoint into 'exemplary datapoint' would, I think, improve the comprehension/readability of the paper
- Page 4, line 14: le should be ExE ->2 …
- Page 4, line 24: you use the term ‘substitution’ – a few words of explanation, please
- Page 4, line 49: you have 'D(x)' – what does D stand for? I assume it is different from C(x); does it mean the class/category of a classification problem? (sorry, I could not find any explanation in the text)
- Page 6, Fig 1: it is a nice figure but could include a bit more detail explaining the approach, or you could include a figure representing the 'flow' of the process (just a suggestion)
- Page 7, line 9: is Exemplar a subset of EN? (the first part of the formula); in general, it would be nice to see a bit more explanation of this important formula
- Page 7, Example 1 (and following up): I really like the idea of ‘running example’ that illustrates your proposed approach, yet, not everything is very clear – for example, s1, …, s8 as a part of IN – are these symptoms? Also, it would be nice to mention how the explanation rules (top of page 8) were created (refer to Section 4, its particular parts)
- Page 8, lines 18-20: it seems to me that this is the essence of your approach – to make the paper more understandable for a reader, it would be nice to define what a query reverse engineering problem is in this particular context
- Page 10, line 29 (and page 11, line 48): TBox eliminated? Could you please provide a short explanation?
- Page 12, line 46 (Section 4.2): is it easier to determine 'dissimilarity' than 'similarity'? Any comment here?

Section 5: Experiments
As much as I enjoy the three different datasets used as case studies, I am disappointed that you have not provided examples of generated rules/explanations. It would be very interesting to see how we (readers) can 'evaluate' these explanations. Among the three datasets, the first two seem (at least to me) quite suitable for showing how classifiers work. For example, I could suggest including some samples of classified (correctly or maybe not) data points and explaining why this happened. The third dataset – MNIST – is very popular, as you indicated, but somewhat less attractive for the topic of your paper.

Review #3
Anonymous submitted on 13/May/2022
Suggestion:
Reject
Review Comment:

###Summary of the paper###

The problem of explaining the behaviour of machine learning models has recently drawn considerable attention in AI research. This interest stems from the fact that very efficient classes of these models, such as deep learning classifiers, do not provide explanations of how/why they take decisions leading to the classification of data samples. For this reason, a great variety of approaches have been proposed to provide understandable and meaningful explanations for the behaviour of such classification models.

This paper proposes to explain the behaviour of a black-box classifier by using semantic annotations on the sample data, whose meaning is supported by an underlying ontology formulated using a Description Logic (DL). More precisely, the authors propose the following framework to explain the behaviour of a classifier:

- Given are: A classifier F, a set of data samples annotated with semantic descriptions from a vocabulary V and a DL ontology defined over V. For each data sample of interest, the assertional component of this ontology (the ABox) identifies it with a constant, states its classification according to F and contains formulas expressing the corresponding semantic annotations.

- Goal: To produce a set of rules expressed over V that explain the behaviour of F in an understandable and meaningful way.

Given a class C, an explanation rule R_C for C produced by the framework has the form:

R_C = Body(x,x_1,...,x_n) ----> C(x),

where Body is a conjunction of unary and binary predicates from V with parameters in {x,x_1,...,x_n}.

The idea is that R_C expresses sufficient conditions for an item to be classified in the class C. For this rule to be correct, it must be a logical consequence of the set of axioms in the underlying ontology.
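
A purely illustrative instance (the names $Mushroom$, $hasOdor$, $Foul$ and $Poisonous$ are placeholders loosely echoing the Mushroom use case discussed below, not vocabulary taken from the paper): $Mushroom(x) \wedge hasOdor(x,x_1) \wedge Foul(x_1) \longrightarrow Poisonous(x)$, read as: every exemplar whose semantic description entails the body is classified into Poisonous by F.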

Besides the proposed framework, the other main contributions of the paper are:

- A suite of algorithms to compute (approximate) explanation rules. These algorithms are based on the fact that finding a correct rule R_C can be reduced to finding a conjunctive query Q_C whose certain answers w.r.t. the ontology are all positive instances of C (summarised in symbols after this list).

- Experiments are conducted to evaluate the quality of the queries produced by the algorithms, in terms of how accurately they represent the behaviour of the classifier.
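
In symbols, writing $cert(Q,\mathcal{K})$ for the set of certain answers of a query $Q$ over the knowledge base $\mathcal{K}$, and letting the ABox assertions $C(a)$ record the decisions of the classifier F, the reduction asks for a query $Q_C(x) \leftarrow Body(x,x_1,\ldots,x_n)$ such that $cert(Q_C,\mathcal{K}) \subseteq \{a \mid C(a) \in \mathcal{A}\}$ for a correct rule, with equality for the "exact" variant quoted below (this is my reading; the precise formulation is the paper's).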

###General Evaluation###

The topic considered in the paper is interesting and of relevance for this journal. However, there are several aspects that (in my opinion) make this paper far from ready for publication. Basically, it is written in a very informal way and there is a lack of precision in many definitions. In addition, one of its most important claims does not seem to be justified at all, or its justification is deferred to either a workshop paper or an additional paper that does not appear to have been peer-reviewed. I have also found several inconsistencies in the experiments and their results.

I describe all my concerns below. I believe that to make this paper ready for acceptance would require more than a Major Revision. Therefore, I propose the paper to be rejected.

###Main Problems###

a) The goal of the paper or the problem it aims to solve is:

...given a set of exemplars and their semantic descriptions find *intuitive* and *understandable* semantic queries that have as certain answers exactly this set out of all the exemplars on the explanation data sets... (see for instance: page 5, line 47).

I am not convinced that this is well justified in the paper:

- Given that the underlying reverse engineering problem has already been investigated in the literature, the properties of being *intuitive* and *understandable* seem to be a distinctive and relevant aspect of the paper's contribution. However, I could not find a formal or convincing argument that supports this in the paper. All this seems to be discussed in the paper at a very informal level. In particular, [53] and [46] are cited in these discussions. On the one hand, this submission appears to be the extended version of [53], which is a workshop paper. For this reason, I believe that any discussion in this regard appearing in [53] should actually be provided here. On the other hand, it is not clear what kind of manuscript [46] is. Has it been peer-reviewed? The bibliography entry gives no clue about it, and what I could find in DBLP looks like a shorter version of this submission.

- The experiments are not designed to analyse this aspect of the provided solution, and there is no formalisation of what *intuitive* and *understandable* mean. For example, one possibility could be to define reasonable properties, either of a syntactic or a semantic nature, that *intuitive* and *understandable* queries are required to satisfy. Once this is done, it provides formal criteria that can be used to evaluate how good the queries produced by the provided algorithms are.

b) Experiment 1 yields 100% fidelity for the two datasets of greatest size. It seems unlikely that this can be the case, unless the Mushroom dataset is not really a good choice for benchmarking:

- Note that only 123 mutually different items are required for Merging to yield the "empty query" (or a query with only TOP(x)) in your algorithms. This means that there would be no query capturing the exact behaviour of the classifier. Notice that 123 is really small w.r.t. 1000 and 4000, which are the sizes of the biggest datasets.

- The average length of a query is around 14, which is only a bit above half the number of categorical features considered in the experiment. Further, an MSQ representing an item is a conjunction of concept names and it can never have two concept names representing different values of the same categorical feature. Hence, it looks like having 100% fidelity implies that half the categorical features in the Mushroom dataset are irrelevant (already for the classifier) for separating positive and negative instances (the merging mechanics behind both points are illustrated in the small sketch after this list).
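
The following is a small, hypothetical sketch (not the authors' implementation), assuming, as noted above, that each item's MSQ is just a set of "feature=value" concept names, so that merging a group of MSQs reduces to intersecting these sets; the attribute names loosely echo the Mushroom dataset, but the values are purely illustrative:

# Hypothetical sketch (not the authors' code): MSQs as sets of
# "feature=value" concept names; merging = set intersection.

def merge(msqs):
    result = set(msqs[0])
    for q in msqs[1:]:
        result &= q                      # QLCS of concept-only queries
    return result or {"TOP"}             # nothing shared: only TOP(x) remains

item1 = {"cap-color=brown", "odor=none",   "gill-size=broad"}
item2 = {"cap-color=brown", "odor=almond", "gill-size=broad"}
item3 = {"cap-color=white", "odor=none",   "gill-size=narrow"}

print(merge([item1, item2]))         # {'cap-color=brown', 'gill-size=broad'}
print(merge([item1, item2, item3]))  # {'TOP'}: no informative query survives

With only three sufficiently diverse items the merged query already degenerates to TOP(x), which is the kind of collapse the estimate of 123 items above alludes to.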

c) The knowledge base used in Experiment #2 is extremely simple. I do not really see how this illustrates one of the main claims of the paper, namely that really complex ontological knowledge is taken into account to provide explanations. In this experiment, this knowledge can be easily integrated into the data. No real ontology-based logical entailment is needed.

###Additional Comments###

1. The presentation of technical notions defining the syntax and semantics of Description Logics is very poor. This should be formally introduced, since the use of background logical knowledge is supposed to make the results in this paper distinctive from other existing approaches. Some aspects in this regard are:

- There is no example illustrating the syntax and semantics of a DL. In particular, nothing is said about how the interpretation of concept and role names extends to complex formulas. There is no formal definition of what a TBox and an ABox are, nor a formal definition of the satisfaction of axioms in a TBox/ABox.

- The second sentence in line 43 (page 3) is not accurate. The fact that most DL dialects are fragments of FOL also has to do with the available set of constructors and their semantics.

2. Considered queries and some related notions.

- The notion of substitution should be properly defined.

- While it follows from the definition of the QLCS that it is unique (modulo syntactic equivalence), this is only the case if it actually exists. Whether it always exists or not is, however, not really shown in the paper. Note that assuming that all queries contain TOP(x) as an atom only ensures that a common subsumer exists, but this need not be the least one (see the toy example after this list). A formal proof of this should be provided.

- The set of nodes V of an ABox graph should not be IN, but the individual names occurring in the ABox. The set IN is the set of all available constants. This does not mean that all of them must be used in every ABox. This should also be made clear in other parts of the paper.
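
As a toy illustration of the gap in the second point above (the concept names are arbitrary): for $Q_1(x) \leftarrow TOP(x) \wedge A(x) \wedge B(x)$ and $Q_2(x) \leftarrow TOP(x) \wedge A(x) \wedge C(x)$, both $TOP(x)$ and $TOP(x) \wedge A(x)$ are common subsumers; the TOP(x) assumption only guarantees the former, whereas a least common subsumer would have to be at least as specific as the latter, and its existence is precisely what requires a proof.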

3. On page 7, is the expression $Exemplar \sqsubseteq \{a \mid a \in EN\}$ correctly expressed in some DL syntax?

4. Theorem 1 seems to be correct. However, the proof is missing and deferred to [46] (see previous comments about [46]).

5. Nothing is said about the cases where it is not possible to exactly capture the behaviour of the classifier. Can you provide examples and/or theoretical characterisations of this? This can help to understand the limitations of the framework, beyond the insights that could have been provided in the experiments. Perhaps the existing work on the query reverse engineering problem can shed some light on this?

6. More details should be provided about why the algorithm computes a set of queries instead of just one. More precisely, what is the intention of giving a set of queries to the user? How are they meant to be used? In particular, the intermediate queries obtained in the computation.

7. The GreedyCommonConjunct (Algorithm 4) and the threshold approximation are introduced as alternatives to using just the QLCS, to avoid certain problems. However, there is no comparison between all these variants in the paper. Natural questions that arise are:

- Is the "exact query" property lost by using GreedyCommonConjunct instead of QLCS?

- What would be the difference between queries produced by using GreedyCommonConjunct and those generated by using QLCS, in terms of the quality measures defined in the paper?

- How much using the threshold approximation affects the quality of the queries?

###Typos###
- page 2, line 48: what does KG stand for? Knowledge graph?
- page 3, line 20: [[46]]
- page 3, line 47: has a missing comma. This also occurs in other places.
- page 4, line 17: ...and a model I of K...
- page 4, line 25: I think it should be $\equiv_S$.
- page 6, line 37: ... in terms of D, C and V...
- page 6, line 41: ... are in the knowledge base...
- page 7, line 11: ...of the DL knowledge base....
- page 9, line 22: ...knowledge base...
- page 9: there is a big space at the end of the page.
- page 10, l. 12: ...From section Section...
- p.11, l.2: ...a classifier F to be explained...
- p.11, l.35: ... to be in S...
- p.14, l.30: It should be ((u1,v1),(u2,v2)) because V=V1 x V2.
- p.18, l.8: Should not theta be initialized as {x -> x}?
- p.20, l.4: ...explanations using Alg. 1
- p.21, l.45: TBox.