The OntoIOp Registry – a Dataset Supporting Ontology, Model and Specification Integration and Interoperability

Tracking #: 863-2073

Authors: 
Christoph Lange
Till Mossakowski
Oliver Kutz

Responsible editor: 
Aidan Hogan

Submission type: 
Dataset Description
Abstract: 
OntoIOp is an initiative for developing a standard for Ontology, Model and Specification Integration and InterOperability within the OMG (Object Management Group). (We will henceforth abbreviate “Ontology, Model and Specification” as OMS.) The OntoIOp working group, formed in 2011 and affiliated with the OMG since 2013, comprises a few dozen international experts representing all major communities involved in the research and application of ontologies, formal modeling and formal specification. The primary tangible output of the OntoIOp work will be DOL, the Distributed OMS Language, a meta-language that gives the combination of different OMS languages a formal semantics and enables writing OMS libraries consisting of modules written in multiple OMS languages, and of mappings between such modules. While the standardization of DOL's syntax and semantics is still in progress, there is already software that supports it, most prominently the Ontohub repository engine. The DOL conformance of the most widely used standard OMS languages, particularly OWL, Common Logic and RDFS, and of their underlying logics and of translations between them, is being established in annexes to the standard, but the DOL framework is designed to be extensible to any future OMS language. For this purpose, the standard provides for an open registry, to which the community can contribute descriptions of languages, logics and translations. In the interest of enabling interoperability, this registry is published as a linked open dataset. We present the initial population of the OntoIOp Registry, comprising 29 (sub)logics, 43 translations and 14 (sub)languages, each with a rich description, and the design of the LoLa ontology about logics and languages, which forms the core of the Registry's vocabulary, with references to the literature on which each part of the initial Registry and of LoLa is based. As use cases we outline how queries and inferences over the Registry can support applications for managing OMSs and OMS libraries. Looking into the near future, we sketch the governance structures that will ensure sustainable maintenance of the OntoIOp Registry, and explain how large parts of it will be exported automatically rather than maintained manually.
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Torsten Hahmann submitted on 11/Nov/2014
Suggestion:
Major Revision
Review Comment:

Review Summary:
This paper describes the linked open dataset (LOD) that is being developed as part of the OntoIOp registry. I'm confident that this will become an important linked dataset in the future. While there is no doubt about the dataset's importance, improvements are necessary to make it easily accessible to a larger audience. The description of the dataset lacks sufficient clarity and detail to be useful to the novice user. The description of the dataset in Section 2 needs to be elaborated (adding detail and precision). Lists/tables and simple statistics could help address this issue (compare previous LOD papers in the journal). Furthermore, the figures need to be tied in better by explaining the depicted relationships and using them as examples in Sec. 2.
The authors remain vague on the maturity of the dataset, which is a concern, though it might be less pressing once sufficient detail is provided. The current state (what is there, what is missing) should be stated more explicitly.

While some major rewriting/editing is necessary, I see no technical problems with the described data set. The raised issues about clarity/accessibility to the community at-large can be easily fixed. I support accepting this paper contingent on "the lack of detail and clarity" issue being addressed.

More details on the 3 evaluation criteria:

(1) Quality of the dataset.

I have no doubt that the relationships between the included logics and languages are correctly captured. However, the maturity/completeness of the dataset is an issue: as I understand it, not all mappings/relations between logics and languages are included yet. Be clear about which ones have been modeled and which are left for the future.

As a side issue: While one cannot reasonably expect the dataset to ever be complete, some mechanism for recording the non-existence of mappings/translations could be helpful to differentiate between non-mappability and incomplete knowledge (see the sketch below). I'm not sure whether that is within the scope of the OntoIOp registry.
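Purely to illustrate the kind of mechanism I have in mind (all names below are hypothetical placeholders, not actual LoLa or registry terms): under the open-world assumption the mere absence of a translation says nothing, so an explicit negative statement would be needed to record non-mappability, roughly along these lines:

  @prefix ex: <http://example.org/registry#> .   # placeholder namespace

  # explicit assertion that no (semantics-preserving) translation exists
  ex:LogicA ex:hasNoTranslationTo ex:LogicB .

  # by contrast, simply not stating any triple between ex:LogicA and ex:LogicC
  # would continue to mean "unknown", not "impossible"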

(2) Usefulness (or potential usefulness) of the dataset.

The usefulness is not as clearly visible as would be desirable. Neither Hets nor Ontohub uses the dataset, though potential future applications are hinted at. The authors do provide some example queries that help one understand how the dataset may be useful by itself.

(3) Clarity and completeness of the descriptions

This is my chief concern. For a LOD description, I expect more detail than what is provided in Section 2. While the explanation of the provenance is sufficient, the explanation of what the dataset describes requires elaboration. This should be at a level at which non-logicians can understand the basic ideas and use the LOD. For example, you need to explain the difference between logics and languages -- this will not be clear to most users (as often one language is associated with a single logic and vice versa).
Also, a better explanation of the intuitions behind "mapping", "translation", "serialization", "sublanguage", etc. is needed. Explain why mappings/translations are modeled as types as opposed to binary relations (see the sketch below).
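To make the question concrete, here is a rough sketch of the two modeling alternatives (the class and property names are my guesses for illustration, not necessarily the actual LoLa vocabulary); reifying a translation as an individual with its own type leaves room for attaching metadata such as exactness or provenance, which a plain binary relation cannot carry:

  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix ex:   <http://example.org/lola#> .   # placeholder namespace

  # (a) translation as a binary relation: compact, but no place for metadata
  ex:PropositionalLogic ex:translatableTo ex:FirstOrderLogic .

  # (b) translation reified as a typed resource: metadata can be attached
  ex:PropositionalToFOL a ex:Translation ;
      ex:source ex:PropositionalLogic ;
      ex:target ex:FirstOrderLogic ;
      ex:exactness ex:Exact ;
      rdfs:comment "Standard embedding of propositional logic into first-order logic."@en .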

The current scope of the LOD is a bit vague, some lists/tables to summarize the dataset would be very helpful:
- explain the kinds of items (maybe each of the "subdirectories" of the URLs) from http://purl.net/dol/registry that are reflected in the directories in http://purl.net/dol/
- how many of each of the types of items and relationships does the dataset include?
- list & briefly explain the kinds of mappings available, it wouldn't hurt to include the hierarchy of mapping relations from [13]
- what languages and logics are currently included? Given the manageable scope of 29 logics, 43 translations, and 14 languages, it would be easy to list them in a table/figure.

The figures would be more helpful if the paper explained what the depicted relations in Figs. 1 and 2 are: most, I believe, are mappings (though I'm not sure whether sublanguage relations are mappings; at the beginning of Sec. 2 mappings are restricted to logics), but serializations are also included. Is the color coding of expressivity/decidability in Fig. 2 captured in the dataset?

A minimal working example would be very helpful: one (or more) logics with one (or more) languages and two serializations, as well as mappings to other logics/languages and metadata (showing how VoID and SKOS are utilized); a rough sketch of what I have in mind follows.
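For concreteness, something along the following lines is what I have in mind; all URIs and property names are illustrative guesses rather than the actual registry vocabulary, but they should convey the expected shape of such an example:

  @prefix ex:   <http://example.org/registry#> .   # placeholder namespace
  @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
  @prefix void: <http://rdfs.org/ns/void#> .

  # a logic, labelled with SKOS
  ex:OWL2DL a ex:Logic ;
      skos:prefLabel "OWL 2 DL"@en .

  # a language supporting that logic, with two serializations
  ex:OWL2 a ex:Language ;
      ex:supportsLogic ex:OWL2DL ;
      ex:hasSerialization ex:OWL2ManchesterSyntax , ex:OWL2RDFXMLSyntax .

  # a translation into another logic
  ex:OWL2DLToCommonLogic a ex:Translation ;
      ex:source ex:OWL2DL ;
      ex:target ex:CommonLogic .

  # VoID metadata about the dataset itself
  ex:RegistryDataset a void:Dataset ;
      void:exampleResource ex:OWL2 .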

Lesser, though more general concerns about the described project/dataset:

1) The maturity/completeness of the LOD: the OntoIOp registry is still very much under development. While publications on the underlying research are very valuable, I'm not sure about the value of a description of the registry's LOD at this stage. It seems highly likely that the description will be outdated as soon as it is published. That defeats the purpose of describing the dataset to others for them to use/reuse.

2) Ability of others to contribute: the purpose of the registry is to enable the community to contribute descriptions of languages, logics, and translations. However, for maintaining the registry, the authors propose to generate it automatically from Hets. This is counter to the desired openness: it would require others to first extend Hets instead of directly contributing to the registry/dataset. I personally think that the LOD should not be permanently tied to any specific software, as this poses a significant barrier for the community to contribute. Other mechanisms for maintaining/updating the registry are needed.

Other things that need to be fixed in the final version:

- given that the paper is less than 5 pages in content, the abstract is unnecessarily long. It includes much background information (2nd paragraph, 1st sentence of 3rd paragraph, last paragraph) that would be better placed in the main part.

- p 4: last paragraph of Sec. 3 needs a rewrite to improve clarity

- if possible, the wealth of technical terminology should be reduced to what is essential. This is not supposed to be a description of the entire OntoIOp project, but of the dataset only.
You also need to more clearly separate and explain the differences between the DOL language, the LoLa vocabulary and the language of the OntoIOp registry at the beginning, and clearly distinguish between what is a project (OntoIOp) and what is an artifact (registry, DOL, LoLa).

- I can't quite appreciate the relevance of the example on p. 2 as it only uses the language and syntax statements that relate to the registry.

- The URL to LoLa on p. 3 needs to be updated

Review #2
By Maria Poveda submitted on 28/Nov/2014
Suggestion:
Major Revision
Review Comment:

This paper describes a dataset of descriptions of logics, translations and languages. In general, I find the dataset really interesting and promising for combining and integrating information from different ontology registries and for translating between logics. For the organization of the review I will follow the dimensions established by the type of submission:

(1) Quality of the dataset.

One of the main shortcomings of the paper is that the SPARQL endpoint where one could try the queries in the paper (or others) is not explicitly referenced in the text, nor at http://ontoiop.org/. It should be included in Table 1.

A VoID description of the dataset is claimed on page 3 to provide metadata about the dataset; however, I haven't been able to find it either. It would be nice to have a footnote with it, or to include it in Table 1 as well; a sketch of what I would expect such a description to minimally contain is given below. Adding the dataset description to a dataset registry (for example http://datahub.io/) and providing a reference to the resulting datahub entry would also be advisable.
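For reference, a minimal VoID description of the kind I would expect to find might look roughly as follows (the endpoint, dump URL and triple count are placeholders, since I could not locate the actual ones):

  @prefix void:    <http://rdfs.org/ns/void#> .
  @prefix dcterms: <http://purl.org/dc/terms/> .

  <http://purl.net/dol/registry> a void:Dataset ;
      dcterms:title "OntoIOp Registry"@en ;
      void:sparqlEndpoint <http://example.org/sparql> ;          # placeholder
      void:dataDump <http://example.org/registry-dump.ttl> ;     # placeholder
      void:triples 10000 .                                       # placeholder count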

In the text it is said that "the OntoIOp Registry, with LoLa being its main vocabulary, gets four stars" and that "the OntoIOp Registry is unique in being a linked dataset covering the domain of OMS languages". Considering that the "linked" part of the 5-star ranking is precisely the fifth star, these two sentences seem contradictory: either the dataset is linked, and thus 5-star, or it should first establish links to other datasets before the second sentence can claim that it is "linked".

In general I would suggest reviewing the 5-star ranking and providing proof that the dataset is actually a linked dataset.

(2) Usefulness (or potential usefulness) of the dataset.

While I think that the described dataset will surely be interesting and useful, it would be welcome to read a bit more about the motivation and potential uses apart from those in Ontohub and Hets. The current state of the paper gives me the feeling that the dataset is an ad-hoc development for these systems (Ontohub and Hets); seeing some examples of uses outside this context would greatly increase the dataset's value.

(3) Clarity and completeness of the descriptions.

My main concern about clarity is the distinction between DOL and LoLa. It is not clear which ontology is used in the dataset. At the beginning it seems as if LoLa were the actual implementation of DOL for this dataset; however, in Section 3 the reference URI for LoLa contains "dol", and in the SPARQL query examples the prefix dol is used. In addition, the URI for LoLa gives a 404 error (I tried to browse it several times in different weeks).

It would also be valuable to include a diagram of LoLa's main classes and properties, as the current figures are, from what I understand, examples of instances.

--- Other comments ---

Figure 2 is not referenced in the text. It is good practice to reference and describe in the text all figures and tables appearing in the paper.

In the first query on page 5 the selected variable is "?target-language", which does not appear in the query body; there the variable "?targetLanguage" appears instead.

I would like to see some concrete metrics about the number of triples and of outbound links to other datasets. The information about metrics in Section 5 is not clear about specific figures; see "around three times as many triples as the core dataset".

Typo: Section 5 "Thus, the expanded dataset has around three times as many triples as as.." --> only one "as"

Review #3
By Mathieu d’Aquin submitted on 29/Nov/2014
Suggestion:
Major Revision
Review Comment:

This paper presents the OntoIOp registry, which is a dataset based on an ad-hoc ontology for describing languages, their underlying logics, their serialisations and mappings between them. As a general comment, I think the representation used is reasonably elegant, and I can see some value in having such a map of languages and logics available. However, it is very hard to extract from the paper how useful the dataset currently is, or what its potential for impact is. I also think that a bit of additional work on improving access to the dataset, the scope of the content and the connections with external resources would help in improving and demonstrating the value of the dataset.

In more detail:

(1) Quality of the dataset

The representation of the languages, logics and mappings seems reasonable. The authors argue that there is no other ontology covering these aspects, and indeed I don't know of any myself. It would be good, however, to include more information in the related work section about other metadata descriptions for ontologies/information resources that overlap to an extent with the one presented here. For example, a clear explanation of what the ontology adds compared to OMV or to the schemas used by common ontology repositories would be useful. Generally, a more complete comparison with other works that are not intended for the same task but that overlap (e.g. ontology repositories, VoID, etc.) would be useful.

Although the information in the repository is modelled in a reasonable way, the content itself is very small. That is not a problem in itself, but it certainly affects the usefulness, as the scope of the dataset is very limited. One could argue that a dataset and a classification are different things, and that this is closer to a classification of languages.

The paper mentions that links to other datasets are included, but going through a few resources, I couldn't find any. More details about that would certainly be needed.

Not directly related to the quality of the dataset, but to the ease of using it: it would have been good to also include other common forms of access to the data than resolving URIs to RDF and a dump. A SPARQL endpoint as well as HTML documentation of the entities included (i.e. URIs also resolving to human-readable documents) would have been appreciated.

(2) Usefulness

The paper includes ideas about tools that could use the dataset, and an example query. This is interesting of course, but at the same time it is very hard to understand from what is written what the real (current and potential) impact of the dataset is. How much, and how, is it used currently? What is the demand for such information? How does the group plan to address this demand? The paper mentions sustainability, and honestly states that this is not a resolved issue. While this is understandable, and the case for many other datasets out there, it is also slightly worrying if the ambition is for this to become a reference point for others when describing resources related to languages, logics and their mappings. I can certainly see that happening, but again, as mentioned above, it would make the paper stronger if such an ambition were made explicit, with a clear view of how that might happen in the future if it has not done so yet.

As an aside, I believe that this issue could be addressed in part by extending the scope of the dataset a bit: importing the metadata of existing repositories of ontologies (TONES, BioPortal, Watson, etc.) and enriching it with information about the language/logics the ontologies rely on. This could certainly demonstrate a practical application of the dataset, and generate a valuable resource to go with it; a rough sketch of what such an enrichment could look like is given below.
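To sketch what I mean (all names below are placeholders invented for illustration; in practice one could of course reuse an existing metadata vocabulary such as OMV): each imported ontology record would simply point to the corresponding language and logic resources in the registry:

  @prefix ex: <http://example.org/enriched#> .   # placeholder namespace

  # metadata imported from an ontology repository, enriched with links
  # into the OntoIOp registry's language/logic descriptions
  <http://example.org/repository/SomeOntology> a ex:Ontology ;
      ex:usesLanguage <http://purl.net/dol/language/OWL2> ;   # illustrative registry URI
      ex:usesLogic    <http://purl.net/dol/logic/SROIQ> .     # illustrative registry URI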

(3) Clarity of the description

The paper is reasonably easy to read, and apart from a few slightly surprising formulations, it is well written in my opinion. As already described above, I think, however, that several sections (related work, usefulness, technical aspects and interfaces to the dataset) should be elaborated further.