Review Comment:
The paper presents a method for explaining missing subsumption relationships,
with a proposed application for semantic matching.
The problem the paper is concerned with is the following:
we are given an ontology (in the context of the
paper: a TBox) and some desired subsumption
relationship between concepts which
is not derivable through logical reasoning, and we want to compute a
set of concept inclusions that, when added to the TBox, would make
the subsumption relationship entailed.
The authors solve this problem via a new notion of abduction,
which is based on previous notions, but extended to deal with
more complex expressions in the solutions. The authors
motivate their work with semantic matching, and also present a
practical algorithm for computing their
abduction results, which has been implemented and
evaluated on randomly generated data.
I think this is a relevant problem which might indeed find use cases
in matching, and I also like the new idea for abduction. However,
the paper is not good enough for a publication
at an international journal. There are issues in the presentation that
are below the standards of the journal. The contribution of the paper
is limited, and the setup of the experiments can be
improved. Finally, the whole story of the paper as supporting semantic
matching is not really convincing - from the viewpoint of the paper,
semantic matching is essentially made equivalent to subsumption.
** The main contribution
The specific type of abduction the authors investigate is based on
the notion of "connection-minimal abduction" introduced in [15].
The idea can be roughly described as follows: we want to find out how
to establish a subsumption relationship between two concepts C1 and
C2. For this, we construct description trees for these concepts that
capture respectively a concept that subsumes C1, and one that is
subsumed by C2. Such a description tree shows the (tree-)structure of the
concepts, where nodes represent elements with concept names as labels,
and edges represent the role-relations. If C1 is not subsumed by C2,
the two trees must necessarily be different, but we might be able to
find a partial match that links the nodes of one tree to the other.
Based on this match, we can determine what is missing to make one tree
subsume the other, and construct the axioms that need to be added to
the ontology in order to make the desired subsumption hold. In [15],
the description trees have to be essentially identical except for the
node labeling, which leads to solutions that always relate
intersections of concept names, but never involve role
restrictions. The present paper extends the approach by also allowing for
situations where role-successors would need to be added, which means
that also solutions are produced in which role restrictions are used,
essentially allowing the full expressivity of EL in the solutions.
This makes this approach strictly more powerful than the original
approach in [15]. The authors not only introduce this new
type of abduction, they also describe how to compute abductive
solutions in this setting, and evaluate an implementation.
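To make the tree-based intuition above concrete, here is a minimal sketch (my own illustration, not the authors' implementation) of EL description trees and the classical homomorphism test that characterises subsumption w.r.t. the empty TBox; the concept names and encoding are invented for illustration:

```python
# EL description trees as nested pairs: (labels, children), where labels is a
# frozenset of concept names and children is a list of (role, subtree) pairs.
# Classical characterisation (empty TBox): C1 is subsumed by C2 iff there is
# a homomorphism from the tree of C2 into the tree of C1.

def homomorphism(d_tree, c_tree):
    d_labels, d_children = d_tree
    c_labels, c_children = c_tree
    if not d_labels <= c_labels:       # every label of D's node must occur at C's node
        return False
    # every r-successor of D's node must map into some r-successor of C's node
    return all(
        any(r == s and homomorphism(d_sub, c_sub) for s, c_sub in c_children)
        for r, d_sub in d_children
    )

# C1 = Cat ⊓ ∃owns.Mouse,  C2 = Animal ⊓ ∃owns.Mouse  (invented example)
c1 = (frozenset({"Cat"}), [("owns", (frozenset({"Mouse"}), []))])
c2 = (frozenset({"Animal"}), [("owns", (frozenset({"Mouse"}), []))])
homomorphism(c2, c1)  # False: "Animal" is missing at the root, so C1 is not below C2
```

In this toy example the partial match pairs the two roots and the two "owns"-successors; the failed label inclusion at the root is exactly what an abductive solution such as Cat ⊑ Animal would repair, and the new contribution additionally covers the case where a whole role-successor is missing from `c1`.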
** Semantic Matching
The title indicates that this is also a paper about semantic matching. Yet
the paper is quite late in explaining what the authors
actually understand by semantic matching. The introduction
provides pointers to the literature on semantic matching (Page 2,
Line 18), but without explaining what the problem is and why it is
relevant. In fact, this list of literature references comes quite
unexpectedly and is a bit confusing at this point. On Page 6, one then
finally gets a formal definition, but it is a bit strange:
"semantic matching determines whether two concepts are semantically
similar (or compatible) w.r.t. some background knowledge, if their
intersection is satisfiable."
Indeed this is how the first type of matches are defined
(Definition 5 on the same page): two concepts match if their
intersection is satisfiable. This is not particularly strong,
since then almost everything in an ontology will be
"similar" in that sense. Luckily, Definition 6 then provides different
types of semantic matches (those of "higher degree"), but these types
just correspond to the different subsumption relationships that can
hold between two concepts.
(A side note on Definition 6: since these three
types of (positive and negative) matches have different names in your paper
("exact", "plugin", "subsume"), it would be good to put these names
next to the corresponding cases, so that one directly understands what
they mean.)
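For what it is worth, the claimed correspondence can be spelled out in a few lines. The following sketch encodes my reading of Definitions 5 and 6; the assignment of the names to the cases is my guess, which is precisely why the names should appear next to the cases in the paper:

```python
# Sketch of my reading of Definitions 5/6 (assumed, not taken from the paper).
# The two booleans stand for the entailed subsumptions C1 ⊑ C2 and C2 ⊑ C1
# w.r.t. the background TBox, for concepts whose intersection is satisfiable.

def match_degree(c1_below_c2: bool, c2_below_c1: bool) -> str:
    if c1_below_c2 and c2_below_c1:
        return "exact"         # C1 ≡ C2
    if c1_below_c2:
        return "plugin"        # C1 strictly below C2 (assumed name/case pairing)
    if c2_below_c1:
        return "subsume"       # C2 strictly below C1 (assumed name/case pairing)
    return "intersection"      # only the weak Definition-5 match remains
```

As the sketch makes plain, once the weak intersection case is set aside, each "higher degree" match type is nothing but one of the possible subsumption relations between C1 and C2.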
It seems to me that the crux of semantic matching is that one wants
to find candidates for semantic matches: given a concept C1,
give me concepts that match C1 according to some of the notions. The
setting in the paper is quite different: here we are given C1 and C2
that do not match, and we want to understand how to make them match.
But since the only relevant type of matching just corresponds to
classical subsumption, I do not see a reason for talking about
semantic matching at all.
** Experimental Setup
Experiments are called "simulations" by the authors. As
experimental data, the authors use empty TBoxes and
randomly created concept expressions. There are three issues with
this setup:
1) it does not give any idea on how the method would perform on real
data,
2) it limits reproducibility, since the authors remain vague in the
paper on how they exactly create the concepts,
3) since the authors create the data themselves, some readers might
worry that the data has been tailored to make the experiment
succeed.
There is a range of realistic ontologies and ontology-benchmarks that
are freely available on the web, already packaged ready to use on
Zenodo: BioPortal, MOWLCorp or the ORE competition benchmarks, to give
some popular examples. Moreover, I had a look, and the paper [15],
which forms the basis of the present method, actually provides its
evaluation data on Zenodo as well - it would therefore have been easy to
just reuse it, to at least get an idea of how the two methods
compare. Interesting questions would for instance have been: how often
does the present approach find a solution where the original approach
in [15] doesn't? Do solutions look more reasonable, or are they more
concise? How do the approaches compare in terms of efficiency?
Finally, the benchmark used in [15] is based on realistic ontologies,
and not on randomly generated concepts.
As a result, the evaluation does not really provide useful insights.
** Presentation
The presentation is sometimes sloppy. To give some examples:
- symbolic AI is sometimes written capitalized (e.g. Page 1, Line 46) and sometimes in
lower case (e.g. Page 1, Line 41),
- "TBox" is sometimes written "Tbox" (e.g. Page 1, Line 19)
- The graphs in Fig. 1+2 are pixel graphics with low resolution and
look fuzzy, when they should clearly be vector graphics (in LaTeX
one can for instance include pictures in eps format)
- The caption of Algorithm 1 (page 20), has a weird layout and is hard
to read - apparently the authors put text into a math environment
Definitions are often incomplete or do not work well: when
trees are introduced (Page 5, Lines 24-28), they are just introduced
as graphs - the basic properties of being a tree (e.g. no cycles, one
root, no node with two incoming edges) are not mentioned. On Page 6,
Definitions 2 and 3 then define edge-labeled and node-labeled trees
separately (as 4-tuples, each with its own labeling function over a
disjoint domain), while later, trees that are both edge- and node-labeled
are used, now as 5-tuples that have two labeling functions (Page 7, Definition 8).
This is not how it works: you should have one definition where you
directly define labeled trees with the complete labeling function,
exactly in the way you use it later.
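Concretely, a single definition along the following lines (my wording, a sketch; the exact label sets and whether nodes carry single labels or sets of labels would of course have to match the paper's later usage) would avoid the mismatch:

```latex
\begin{definition}
  A \emph{labeled tree} is a tuple $T = (V, E, v_0, \lambda_V, \lambda_E)$,
  where $(V, E)$ is a finite tree with root $v_0 \in V$,
  $\lambda_V \colon V \to L_V$ assigns to each node a label from the set
  $L_V$ of node labels, and $\lambda_E \colon E \to L_E$ assigns to each
  edge a label from the set $L_E$ of edge labels.
\end{definition}
```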
In general, the authors do not seem to know well how definitions are written.
Take for example Definition 3 (Page 6):
"A rooted tree T = is called node labeled if
there exists a labeling lambda_V:V\mapsto L_V"
- "there exists" is not needed here - lambda_V is part of the tree!
- "\mapsto" should be "\rightarrow"
"[..] which maps all nodes to labels in a set of node labels L_V, such
that for any v_i\in V and l_v\in L_V, if v_i is mapped to l_V, then
lambda_V(v_i)=l_V"
- what follows after "such that" is just the usual mathematical
notions for functions - there is no need to mention that in a
paper, especially not in a definition.
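For illustration, the whole definition could be stated in one sentence (my wording, keeping the 4-tuple form of the paper):

```latex
\begin{definition}
  A \emph{node-labeled rooted tree} is a tuple $T = (V, E, v_0, \lambda_V)$,
  where $(V, E, v_0)$ is a rooted tree and $\lambda_V \colon V \to L_V$
  assigns to every node a label from the set $L_V$ of node labels.
\end{definition}
```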
The description of the algorithms (there are seven algorithm
environments in total in the paper, the appendix just consists of six
of them) is too low-level and detailed - some of it is pretty
straightforward or standard and does not need to be described at this
level of detail, while other parts could be more clearly and concisely
described using mathematical notation (see for example Algorithms 2 and 3).
To conclude, even if the paper were better in terms of motivation,
contribution and evaluation, the presentation would make it hard to
assess the quality of the results, and is in any case not up to the
standards of an international journal paper.
* Long-term Stable Link to Resources
The authors provide a link to a github repository, which contains a
zip-file with the code needed to run the implementation and
reproduce the experiments. Using a zip-file is not really optimal
(it would be better and more convenient to commit the source code directly).
It is good to see that maven is used for dependency management, which
facilitates compiling the projects on different machines.
The readme reads a bit strange:
"To run the experiments open the workbench folder in a new Maven Project.
Next, open the Experiments.java class. Build the project. Provided that
all the necessary dependencies are installed, the algorithm will ask for [..]"
Usually one would state which dependencies need to be installed on a machine
and how to compile and run everything from the command line. The present description
seems to assume that some IDE is used, which is unusual and less convenient for
reproducing experiments. The class name would be "Experiments", not "Experiments.java",
and necessary dependencies should be installed automatically by Maven (that is
the whole point of using Maven). I did not try to compile and run the code myself,
as it was already clear at this point that the paper cannot be accepted.
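For comparison, with a conventional Maven setup one would expect the README to boil down to something like the following (hypothetical commands; they assume a standard project layout, the exec plugin being available, and the correct fully qualified main class name, none of which I verified):

```shell
# Hypothetical command-line instructions (paths and class name are guesses):
cd workbench
mvn compile                                   # Maven resolves all dependencies itself
mvn exec:java -Dexec.mainClass="Experiments"  # use the fully qualified name if packaged
```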