Review Comment:
The paper describes a tool and methodology for combining linked data
vocabularies into a single logical model. The abstract makes the claims
that this combining is needed when different linked data vocabularies share
terms, that this sharing is commonplace, and that combining vocabularies
requires combining their authoritative logical models. The overall claim is
that linked data should have a firm logical foundation.
The Basic Assumptions of the Paper
The linked data movement is vitally about sharing data. Part of this
sharing involves the reuse of vocabulary from existing data when new data is
created. Whether this sharing and reuse also involves sharing of
machine-processable (partial) definitions of vocabulary is a separate
matter, and one that the linked data movement may not completely buy into.
If one does want to reason automatically about combinations of linked data,
then a set of (partial) definitions, i.e., an ontology, for the combined
vocabulary is going to be needed. As the paper is about reasoning, this need
can be considered adequately supported. This does not, however, require that
the ontologies associated with the different sources of the combined data be
used. But one could argue that the linked data movement should also reuse
these ontologies.
So, so far so good. These assumptions of the paper are supportable,
although the paper would be better if it were more forthright about them.
The Basic Problem Examined in the Paper
Combining ontologies that have been developed independently is
known to be problematic. Many problems can crop up, including those related
to expressiveness, those related to modelling errors that are benign in a
limited setting, and those related to different ways of modelling. Examining
how these problems play out in
actual linked data is a useful exercise, particularly if tools that can help
find problems are developed.
A lot of care does have to be taken when doing this examination, as the
results can be very different depending on just what inputs are considered
or which formalisms are considered.
Here we see a small problem in the paper. When the paper is referring to
OWL it should be more careful as to what it is referring to. Although most
implementations of OWL implement something (e.g., OWL 2 DL) that does not
handle all of RDF(S), it is possible to combine aspects of OWL with all of
RDFS to produce something reasonable, even though OWL Full is not quite
fully upward compatible from RDFS.
As well, an effort to create a true combined ontology might be
different from the usual analysis done in the Semantic Web. For example,
ontology hijacking might be necessary if an expressively impoverished
existing ontology (perhaps expressed in RDFS) is to be used in a
setting where more powerful definitions are needed. So practices, like
ontology hijacking, that have been considered bad might in fact be good in
this setting.
All that said, the effort to examine how current ontologies related to
linked data can be combined into consistent logical models (or not) is a
worthy one, provided that it is done decently, and a tool that can find
problems in such combinations could be very useful indeed.
Challenges
The paper defines a linked data schema in a particular way, limiting its
scope to RDF, RDFS, and OWL, which is reasonable, and requiring certain
characteristics of a linked data schema, which are also reasonable.
I have to add the qualifier that formal definitions of suitable ontologies
are often trivial to achieve. In this case in particular, an ontology
that simply makes vacuous statements about each vocabulary element, e.g.,
that a class is a subclass of the universal class, will be suitable. The
paper should thus mention that it depends on existing schemas having some
unspecified notion of utility for its ontology to be useful.
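To make the point about vacuous ontologies concrete, here is a minimal sketch. It assumes triples are represented as plain (subject, predicate, object) tuples with prefixed names; the function names and the ex: vocabulary are hypothetical and are not taken from the paper or from Dacura.

```python
# Hypothetical illustration: a schema is a list of (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
OWL_CLASS = "owl:Class"
OWL_THING = "owl:Thing"


def is_vacuous(triple):
    """True if the triple only declares a class or places it under owl:Thing."""
    _, predicate, obj = triple
    return (predicate == RDF_TYPE and obj == OWL_CLASS) or (
        predicate == SUBCLASS_OF and obj == OWL_THING
    )


def informative_triples(triples):
    """Keep only the triples that say something beyond the vacuous cases."""
    return [t for t in triples if not is_vacuous(t)]


# A schema that formally "defines" ex:Person but says nothing useful about it.
trivial_schema = [
    ("ex:Person", RDF_TYPE, OWL_CLASS),
    ("ex:Person", SUBCLASS_OF, OWL_THING),
]

# The same schema with one informative axiom added.
useful_schema = trivial_schema + [("ex:Student", SUBCLASS_OF, "ex:Person")]
```

A schema for which informative_triples returns an empty list would satisfy a purely formal requirement while contributing nothing to the combined model.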
The paper mentions several problems that can (and do) occur when combining
different ontologies, including different ontology languages.
Here the paper should be very careful about how it refers to the various
versions of OWL, in particular using OWL 2 DL when appropriate.
The paper talks about ontology hijacking early on but in Section 2.3
presents a somewhat more qualified description of the problem. However,
even this description is somewhat too strong, as it implies that any ontology
provides universally applicable definitions of its vocabulary. This seems to
prevent other ontologies from making any additions to the definitions of
external vocabularies. It is not that such additions are never problematic;
it is just that such additions can be needed sometimes but must be done with
extreme caution. A simple change in the section would make it minimally
satisfiable: "ontology hijacking can be a problem".
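As an illustration of why a blanket prohibition is too strong, the sketch below flags statements whose subject lies outside an ontology's own namespace. This is only a rough proxy for ontology hijacking and is my own assumption, not the paper's definition; the example.org namespace and the function name are hypothetical. The flagged statements are candidates for careful review, not automatic errors.

```python
def hijacking_candidates(own_namespace, triples):
    """Return triples whose subject is not in the ontology's own namespace."""
    return [t for t in triples if not t[0].startswith(own_namespace)]


OWN = "http://example.org/mine#"      # hypothetical local namespace
FOAF = "http://xmlns.com/foaf/0.1/"   # an external vocabulary
SUBCLASS_OF = "rdfs:subClassOf"

triples = [
    # Ordinary reuse: a local class is placed under an external one.
    (OWN + "Author", SUBCLASS_OF, FOAF + "Person"),
    # An addition to the definition of an external term.
    (FOAF + "Person", SUBCLASS_OF, OWN + "Agent"),
]

flagged = hijacking_candidates(OWN, triples)
```

Only the second triple is flagged, and whether it is harmful depends on the setting in which the combined ontology is used.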
Related Work
The section on related work is very good.
Dacura
The description of Dacura is quite reasonable. Certain compromises have
been made in what Dacura can do, but compromises in the reach of tools are
often necessary. The goals of Dacura - detecting (some) flaws in combined
ontologies - are described well. The limitation on ontology cycles is a bit
puzzling, but as stated in the paper such cycles in existing ontologies are
almost always unintended and can be considered to be flaws.
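For readers unfamiliar with how such cycles can be found, a minimal sketch follows. It assumes a cycle means a directed cycle in some dependency relation between named items (for instance subclass or import links); the paper's exact notion may differ, and the function name is hypothetical.

```python
def find_cycle(edges):
    """Return one cycle as a list of nodes (first node repeated at the end), or None.

    edges is an iterable of (source, target) pairs in a directed graph.
    """
    graph = {}
    for source, target in edges:
        graph.setdefault(source, []).append(target)
        graph.setdefault(target, [])

    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in graph}
    path = []

    def visit(node):
        state[node] = in_progress
        path.append(node)
        for successor in graph[node]:
            if state[successor] == in_progress:
                # Back edge: the cycle is the tail of the current path.
                return path[path.index(successor):] + [successor]
            if state[successor] == unvisited:
                found = visit(successor)
                if found:
                    return found
        state[node] = done
        path.pop()
        return None

    for node in graph:
        if state[node] == unvisited:
            found = visit(node)
            if found:
                return found
    return None


cyclic = [("ex:A", "ex:B"), ("ex:B", "ex:C"), ("ex:C", "ex:A")]
acyclic = [("ex:A", "ex:B"), ("ex:B", "ex:C")]
```

The sketch uses a recursive depth-first search, which is adequate for small hierarchies but would need an iterative form for very deep ones.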
The limitation on owl:disjointWith is somewhat concerning even though
problems with disjointness generally only show up when considering instance
data. The comment that disjointness is usually a problem with instance data
and thus not of interest is poorly worded. It should be more obvious that
the work here does not take into account instance data and that this is why
not handling disjointness is not a severe problem.
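The following minimal sketch illustrates why disjointness problems generally surface only with instance data. It checks asserted types only and performs no inference; the ex: names and the function name are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def disjointness_violations(disjoint_pairs, type_assertions):
    """Return (individual, class_a, class_b) for individuals asserted in both classes."""
    types_of = defaultdict(set)
    for individual, cls in type_assertions:
        types_of[individual].add(cls)

    violations = []
    for individual, classes in sorted(types_of.items()):
        for class_a, class_b in disjoint_pairs:
            if class_a in classes and class_b in classes:
                violations.append((individual, class_a, class_b))
    return violations


# The schema declares two classes disjoint.
disjoint_pairs = [("ex:Person", "ex:Organization")]

# With no instance data there is nothing for the axiom to conflict with.
schema_only = []

# One individual asserted to belong to both classes triggers the conflict.
with_instances = [
    ("ex:acme", "ex:Person"),
    ("ex:acme", "ex:Organization"),
]
```

With schema_only the check reports nothing, while with_instances yields a violation for ex:acme.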
Evaluation
The determination of commonly used linked data vocabularies is good,
particularly in its use of multiple sources. The analysis is very good,
showing existing use and giving reasons for what shows up. The analysis of
particular combined ontologies is also
The flaws in the analysis (inability to handle OpenCyc) are described
decently and are understandable.
The lessons learned from the validation are generally reasonable.
Summary
The work described in the paper is new and useful. The paper is easy to
read. There are a few minor glitches, described below, but the paper is
generally acceptable.
Accept with trivial changes
- required
  - use OWL 2 DL, etc., as appropriate
  - "OWL1" -> "OWL 1", "OWL2" -> "OWL 2"
  - rewrite sentence on why disjointness is not a serious concern
- desirable
  - a sentence or two on assumptions
  - a sentence about trivial input ontologies
  - slight qualification about ontology hijacking