Linked data schemata: fixing unsound foundations

Tracking #: 1357-2569

Authors: 
Kevin Feeney
Rob Brennan
Gavin Mendel-Gleason

Responsible editor: 
Guest Editors, Quality Management of Semantic Web Assets

Submission type: 
Full Paper
Abstract: 
This paper describes our tools and method for an evaluation of the practical and logical implications of combining common linked data vocabularies into a single local logical model for the purpose of reasoning or performing quality evaluations. These vocabularies need to be unified to form a combined model because they reference or reuse terms from other linked data vocabularies and thus the definitions of those terms must be imported. We found that strong interdependencies between vocabularies are common and that a significant number of logical and practical problems make this model unification inconsistent. In addition to identifying problems, this paper suggests a set of recommendations for linked data ontology design best practice. Finally, we make some suggestions for improving OWL’s support for distributed authoring and ontology reuse.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By John McCrae submitted on 24/Apr/2016
Suggestion:
Accept
Review Comment:

This paper describes the application of the Dacura Quality System to detecting inconsistencies in ontologies, especially issues caused by referencing and importing external ontologies. Many of these issues are due to the incompatibilities of RDFS and OWL DL and to the presence of 'ontology hijacking', that is, the redefinition of external concepts. The paper motivates itself well and is well written, and it would be great if the errors found during this study could be fixed. As such, I feel that this paper is a highly valuable contribution that should be read by all aspiring ontology creators.

On the other hand, I remain unconvinced by the solutions proposed by the authors in section 7; the fixes proposed seem weak. Firstly (7.1), they propose that authors stop using rdf:List in ontologies and use custom definitions; however, as noted, many of the vocabularies studied are in RDFS, where rdf:List is a fundamental element, so the issue here is the fundamental incompatibility of RDFS and OWL. The authors claim that the issue is OWL using this as an internal element (perhaps the real issue here is OWL hijacking RDFS's mechanisms ;), but a more fundamental issue is that rdf:first is frequently used with both literal and URI values, causing the property to be punned into both an object property and a datatype property. The authors then reject equivalence statements, proposing that authors instead use subclass statements. While this would solve many of the issues the authors discuss, it would also lose a key idea of the Semantic Web: that we can align our schemas directly to each other.
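To make the rdf:first punning issue concrete, here is a minimal sketch using the rdflib Python library (the example namespace and resource names are hypothetical): two RDF lists, one whose rdf:first value is a URI and one whose value is a literal, so that under OWL 2 DL's strict separation of object and datatype properties no single declaration of rdf:first can cover both uses.

```python
from rdflib import Graph, Literal, Namespace, URIRef, BNode
from rdflib.collection import Collection
from rdflib.namespace import RDF

# Hypothetical example namespace, for illustration only.
EX = Namespace("http://example.org/")

g = Graph()

# List 1: rdf:first points at a resource, reading as an object property.
Collection(g, BNode(), [EX.someResource])

# List 2: rdf:first points at a literal, reading as a datatype property.
Collection(g, BNode(), [Literal("some string")])

# The same predicate, rdf:first, now occurs with both a URI object and a
# literal object; an OWL 2 DL tool cannot type it consistently as either
# an owl:ObjectProperty or an owl:DatatypeProperty.
for _, o in g.subject_objects(RDF.first):
    kind = "object" if isinstance(o, URIRef) else "datatype"
    print(f"rdf:first used as an {kind} property value: {o!r}")
```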

The paper has some formatting issues:
Page 12 is nearly blank.
Pages 9 & 18 have tables in the middle of the page, and the text continues in the right column rather than below the table; I misread it both times.
When referring to ontologies, consistently use either the canonical name ("Open Annotation"), a capitalized abbreviation ("OA"), or italics ("icalspec").

Review #2
By Peter F. Patel-Schneider submitted on 27/May/2016
Suggestion:
Accept
Review Comment:

The paper describes a tool and methodology for combining linked data
vocabularies into a single logical model. The abstract makes the claims
that this combining is needed when different linked data vocabularies share
terms, that this sharing is commonplace, and that combining vocabularies
requires combining their authoritative logical models. The overall claim is
that linked data should have a firm logical foundation.

The Basic Assumptions of the Paper

The linked data movement is vitally about sharing data. Part of this
sharing involves the reuse of vocabulary from existing data when new data is
created. Whether this sharing and reuse also involves sharing of
machine-processable (partial) definitions of vocabulary is a separate
matter, and one that the linked data movement may not completely buy into.

If one does want to automatically reason about linked data combinations,
then the development of a set of (partial) definitions, i.e., an ontology,
for the combined vocabulary is going to be needed. As the paper is about
reasoning, this need can be considered adequately supported. This does
not, however, require that the ontologies associated with the different
sources of the combined data be used. But one could argue that the
linked data movement should also reuse these ontologies.

So, so far so good. These assumptions of the paper are supportable,
although the paper would be better if it were more forthright about them.

The Basic Problem Examined in the Paper

Combining ontologies that have been developed independently is known to
be problematic. There are lots of problems that can crop up, including
those related to expressiveness, those related to modelling errors that
are benign in a limited setting, and those related to different ways of
modelling. Examining how these problems play out in actual linked data
is a useful exercise, particularly if tools that can help find problems
are developed.

A lot of care does have to be taken when doing this examination, as the
results can be very different depending on just what inputs are considered
or which formalisms are considered.

Here we see a small problem in the paper: when it refers to OWL, it
should be more careful about exactly which variant it means. Although
most implementations of OWL implement something (e.g., OWL 2 DL) that
does not handle all of RDF(S), it is possible to combine aspects of OWL
with all of RDFS to produce something reasonable, even though OWL Full
is not quite fully upward compatible with RDFS.

As well, an effort to create a true combined ontology might be
different from the usual analysis done in the Semantic Web. For example,
ontology hijacking might be necessary if an expressively impoverished
existing ontology (perhaps expressed in RDFS) is to be used in a
setting where more powerful definitions are needed. So practices, like
ontology hijacking, that have been considered bad might in fact be good in
this setting.
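To illustrate the kind of arguably beneficial hijacking the review has in mind, here is a hedged rdflib sketch (the external vocabulary IRI and property name are hypothetical): a local graph strengthens a property from an RDFS-only external ontology with an OWL axiom that the original never declared.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF

# Hypothetical external vocabulary that is defined only in RDFS.
EXT = Namespace("http://example.org/external#")

g = Graph()
# The external ontology's own, deliberately weak, definition.
g.add((EXT.hasBirthDate, RDF.type, RDF.Property))

# Local "hijacking": assert a stronger OWL characteristic on a term we
# do not own. This redefines external vocabulary, which is normally
# considered bad practice, but may be exactly what a more demanding
# local reasoning task requires.
g.add((EXT.hasBirthDate, RDF.type, OWL.FunctionalProperty))

print(g.serialize(format="turtle"))
```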

All that said, the effort to examine how current ontologies related to
linked data can be combined into consistent logical models (or not) is a
worthy one, provided that it is done decently, and a tool that can find
problems in such combinations could be very useful indeed.

Challenges

The paper defines a linked data schema in a particular way, limiting its
scope to RDF, RDFS, and OWL, which is reasonable, and requiring certain
characteristics of a linked data schema, which are also reasonable.

I have to add the qualifier that formal definitions of suitable
ontologies are often trivial to achieve. In particular, an ontology that
simply makes vacuous statements about each vocabulary element, e.g.,
stating that each class is a subclass of the universal class, will be
suitable. The paper should thus mention that it depends on an existing
schema having some unspecified notion of utility for its ontology to be
useful.
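The reviewer's point about trivially suitable ontologies is easy to make concrete. In this hedged rdflib sketch (with a hypothetical vocabulary IRI), the "schema" satisfies any purely formal requirement, since it mentions every term, yet it constrains nothing.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

# Hypothetical vocabulary whose terms we want to "define".
VOC = Namespace("http://example.org/vocab#")

g = Graph()
for cls in (VOC.Event, VOC.Agent, VOC.Place):
    # Vacuously true: every class is a subclass of the universal class.
    g.add((cls, RDF.type, OWL.Class))
    g.add((cls, RDFS.subClassOf, OWL.Thing))

# The result is consistent and "covers" the whole vocabulary, yet rules
# out no interpretation at all: formally a schema, practically useless.
print(g.serialize(format="turtle"))
```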

The paper mentions several problems that can (and do) occur when
combining different ontologies, including the use of different ontology
languages.

Here the paper should be very careful about how it refers to the various
versions of OWL, in particular using OWL 2 DL when appropriate.

The paper talks about ontology hijacking early on but in Section 2.3
presents a somewhat more qualified description of the problem. However,
even this description is somewhat too strong, as it implies that any
ontology provides universally applicable definitions of its vocabulary.
This seems to prevent other ontologies from making any additions to the
definitions of external vocabularies. It is not that such additions are
never problematic; it is just that they can sometimes be needed, but
must be done with extreme caution. A simple change would make the
section minimally satisfiable: "ontology hijacking can be a problem".

Related Work

The section on related work is very good.

Dacura

The description of Dacura is quite reasonable. Certain compromises have
been made in what Dacura can do, but compromises in the reach of tools are
often necessary. The goals of Dacura - detecting (some) flaws in combined
ontologies - are described well. The limitation on ontology cycles is a bit
puzzling, but as stated in the paper such cycles in existing ontologies are
almost always unintended and can be considered to be flaws.

The limitation on owl:disjointWith is somewhat concerning even though
problems with disjointness generally only show up when considering instance
data. The comment that disjointness is usually a problem with instance data
and thus not of interest is poorly worded. It should be more obvious that
the work here does not take into account instance data and that this is why
not handling disjointness is not a severe problem.
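To spell out why disjointness only bites at the instance level, consider this hedged rdflib sketch (hypothetical IRIs): the schema-level axiom on its own is consistent, and only the instance assertions at the end would lead an OWL reasoner (rdflib itself does no reasoning) to report an inconsistency.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF

# Hypothetical example namespace.
EX = Namespace("http://example.org/")

g = Graph()
# Schema level: two disjoint classes. Alone, this is perfectly
# consistent, so a schema-only checker has nothing to report.
g.add((EX.Person, OWL.disjointWith, EX.Organisation))

# Instance level: this pair of assertions is what actually creates an
# OWL inconsistency, since one individual now falls in both classes.
g.add((EX.acme, RDF.type, EX.Person))
g.add((EX.acme, RDF.type, EX.Organisation))
```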

Evaluation

The determination of commonly used linked data vocabularies is good,
particularly in its use of multiple sources. The analysis is very good,
showing existing use and giving reasons for what shows up. The analysis
of particular combined ontologies is also good.

The flaws in the analysis (inability to handle OpenCyc) are described
decently and are understandable.

The lessons learned from the validation are generally reasonable.

Summary

The work described in the paper is new and useful. The paper is easy to
read. There are a few minor glitches, described below, but the paper is
generally acceptable.

Accept with trivial changes.

Required:
- use OWL 2 DL, etc., as appropriate
- "OWL1" -> "OWL 1", "OWL2" -> "OWL 2"
- rewrite the sentence on why disjointness is not a serious concern

Desirable:
- a sentence or two on assumptions
- a sentence about trivial input ontologies
- a slight qualification about ontology hijacking

Review #3
Anonymous submitted on 24/Jun/2016
Suggestion:
Major Revision
Review Comment:

The authors improved the way the evaluated ontologies are selected: instead of just taking the Annotation Ontology and all ontologies dependent on it, they describe an approach for selecting the 50 most used ontologies together with the ontologies on which those depend. So this is an improvement.

On reading it once again, I have to say that the description of the methodology is a bit vague; what is missing is a more formal definition of the Challenges. Also, the section on recommendations/best practices needs to be better justified by the findings of the evaluation phase and/or the literature. My impression is that the paper is trying to cover too much; it would be enough to focus only on the Challenges, Methodology, and Evaluation (thus skipping the recommendations/best practices).

Also, the newly added/adjusted parts have many typos, e.g. "most to most" and "above, -" in section 5.1. Furthermore, the authors do not properly describe the tables: Table 4 includes columns labelled "%" that are not explained, there is a typo in the column labelling of Table 3, and Table 12 is not described at all; at least its main findings should be, since a bare table of 30 columns and 30 rows full of numbers is not informative by itself.

Screenshot 1 is of really low quality (nothing can be seen at all).

Also, the authors ignored my suggestion to improve the readability of the paper by using a different font for classes/predicates (actually, such a font is used in only a fraction of cases).

Summary:
Before the paper can be accepted, it needs to be reworked:
- describe the challenges/methodology in a more formal way
- either drop the "recommendations/best practices" section OR better justify its findings from the evaluation results and/or the literature
- improve the readability: fonts, typos, ...

Review #4
By Mathieu d’Aquin submitted on 24/Jun/2016
Suggestion:
Major Revision
Review Comment:

This paper describes the authors' effort to develop a tool that checks ontologies for syntactic and semantic errors using some form of reasoning, and their assessment of a number of popular ontologies and their dependencies.

On originality, there is a certain effort in the paper to describe existing work and similar approaches, the most overlapping of which is OOPS. While there are technical arguments about what is done differently, it is hard to understand why the approach presented could not simply have extended existing ones (especially OOPS) with the specific kinds of errors and reasoning. In the end, when compared to all the different techniques, from those based on reasoning and logical consistency to those based on patterns of errors, it is hard to see what this specific tool brings beyond another combination of them.

Regarding significance of the results, the analysis of existing ontologies is certainly interesting and can have some use. I certainly believe that the paper should stick to presenting these results with clarity. I do however have some doubts that these are significant beyond being simply interesting. Indeed, as clearly said in the paper, the reasoning mechanism is not complete/correct and there is little said about the performance of the tool. There is some effort to discuss the results against the ones obtained by OOPS, but it is unclear how they can really be compared - are the errors more valid? more correct? more complete?

On the quality of writing, the paper is generally easy to follow and reasonably well written. However, many parts are too vague and not precise enough (e.g. "a relatively large fragment", "most often used", "substantially increased", etc.).
In addition, the article spends a lot of space describing what are (admittedly) well-known challenges; I don't think so much effort is needed to describe them. A similar comment can be made about the recommendations part, which remains a general discussion of the kinds of practices that might improve ontology engineering without being particularly well grounded. Finally, I think the title is misleading: it is not about "fixing" things (mostly assessing them), nor about "unsound foundations" (most of what is examined is neither foundational nor unsound, just bad practice). The article is not even about linked data, as it focuses on assessing ontologies (which are not, formally, schemata for datasets, as they are taken as full ontologies, not the parts that are used within specific datasets).