RODI: Benchmarking Relational-to-Ontology Mapping Generation Quality

Tracking #: 1439-2651

Christoph Pinkel
Carsten Binnig
Ernesto Jimenez-Ruiz
Evgeny Kharlamov
Wolfgang May
Andriy Nikolov
Martin G. Skjaeveland
Alessandro Solimando
Mohsen Taheriyan
Christian Heupel
Ian Horrocks

Responsible editor: Guest Editors Quality Management of Semantic Web Assets

Submission type: Full Paper
Accessing and utilizing enterprise or Web data that is scattered across multiple data sources is an important task for both applications and users. Ontology-based data integration, where an ontology mediates between the raw data and its consumers, is a promising approach to facilitate such scenarios. This approach crucially relies on useful mappings to relate the ontology and the data, the latter being typically stored in relational databases. A number of systems to support the construction of such mappings have recently been developed. A generic and effective benchmark for reliable and comparable evaluation of the practical utility of such systems would make an important contribution to the development of ontology-based data integration systems and their application in practice. We have proposed such a benchmark, called RODI. In this paper, we present a new version of RODI, which significantly extends our previous benchmark, and we evaluate various systems with it. RODI includes test scenarios from the domains of scientific conferences, geographical data, and oil and gas exploration. Scenarios are constituted of databases, ontologies, and queries to test expected results. Systems that compute relational-to-ontology mappings can be evaluated using RODI by checking how well they can handle various features of relational schemas and ontologies, and how well the computed mappings work for query answering. Using RODI, we conducted a comprehensive evaluation of seven systems.

Solicited Reviews:
Review #1
By Patrick Westphal submitted on 01/Nov/2016
Minor Revision
Review Comment:

In the current revision of the paper 'RODI: Benchmarking Relational-to-Ontology Mapping Generation Quality' the authors have addressed all my former suggestions. In addition, the current version has been extended in several sections, which improves understandability. However, these changes come with a few typos/phrasing issues I would like to see fixed:

> While the end-to-end setup allows almost any systems that map data between
> relational schemata and ontologies can participate in the benchmark even if
> they do not support certain standards or languages, it also means that we
> cannot analyze mapping rules or other intermediate artifacts of the process
> directly.

- "While the end-to-end setup allows [...] any systems [...] can participate [...]" --> 'to participate'
- Sentence quite long

> There is only a single default scenario, which is based
> on the original relational Mondial database. foreign
> keys, and 43,000 tuples).

- Missing opening parenthesis; part of the numbers missing

> to test the following crucial elements:
> - for all scenarios based on the canonical relational
> schema, all keys/foreign keys are the IRIs.

- Bullet point should end with semicolon (as the following points do)

> Those
> queries are highly complex compared to the ones in
> other scenarios, where query complexity refers to the
> number of different schema elements that need to be
> accessed.

I would recommend rephrasing this sentence because it is not immediately clear what this 'where' refers to. One might read it as 'The complexity of queries in the *other* *scenarios* is determined by the number of different schema elements'.

Review #2
By Christoph Lange submitted on 18/Nov/2016
Review Comment:

Dear authors, thank you very much for your precise feedback to the reviewers. The paper has matured further; all of the following is basically just nit-picking, which doesn't influence my clear recommendation to accept.

First, about your specific response to my concern regarding the expansion on "semantic heterogeneity". Your extended coverage now explains more appropriately why you consider inference on relational databases to be out of the scope of your work. Indeed, "calculations with an equivalent effect to some forms of logical inference", that's what I had in mind with my review of the previous version. I'm not sure whether your claim that "such applications of these features are not very common in practice to the best of our knowledge" is really true. Ideally you would provide some evidence based on, e.g., textbooks or survey papers from the field of relational databases. But in any case I would not consider it reasonable to ask you to provide even harder evidence by examining actual relational database data that include views, triggers, stored procedures, etc., as, in contrast to, say, LOD, they are close to impossible to find anywhere on the Web.

Commenting on your discussion with Reviewer 3 on the definition of "quality", I'd like to argue that there is no contradiction between defining "mapping quality as mapping utility w.r.t. a query workload posed against the mapped data" and "the notion of multi-dimensional quality that is also frequently used in the literature". (BTW, while [54] and [8] are reasonable references to cite here, as they talk about mappings and build on this multi-dimensional definition of quality, the more appropriate source for that definition is another reference that you have already, i.e. [58].) Is "utility w.r.t. a workload" actually a unidimensional measure, or are there really multiple aspects of it? In fact, Section 4.7 claims to introduce one single scoring function, but actually you are already defining two: "We […] observe a score that reflects the utility of the mappings […]. Intuitively, this score reports the percentage of successful queries for each scenario. However, in a number of cases, queries may return correct but incomplete results, or could return a mix of correct and incorrect results. In these cases, we consider per-query accuracy by means of a local per-query F-measure. Technically, our reported overall score for each scenario is the average of F-measures for each query test, rather than a simple percentile of successful queries." This could be seen as your "overall score" being a metric that aggregates two more basic metrics.
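
To make the difference between the two metrics concrete, here is a minimal sketch. It assumes set-based precision and recall over query results and treats an empty result for an expected-empty query as a success; neither assumption is confirmed by the manuscript text quoted above, and the function names are illustrative only, not part of RODI.

```python
from typing import Hashable, List, Set, Tuple

QueryTest = Tuple[Set[Hashable], Set[Hashable]]  # (expected rows, returned rows)


def f_measure(expected: Set[Hashable], returned: Set[Hashable]) -> float:
    """Per-query F-measure, assuming set-based precision and recall."""
    if not expected and not returned:
        return 1.0  # assumption: nothing expected, nothing returned counts as correct
    true_positives = len(expected & returned)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(returned)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def average_f_measure(tests: List[QueryTest]) -> float:
    """Scenario score as the mean of per-query F-measures."""
    if not tests:
        return 0.0
    return sum(f_measure(e, r) for e, r in tests) / len(tests)


def success_rate(tests: List[QueryTest]) -> float:
    """Simple share of query tests whose result is exactly the expected one."""
    if not tests:
        return 0.0
    return sum(1 for e, r in tests if e == r) / len(tests)


# Hypothetical scenario: one exact, one incomplete, one wrong result.
example_tests: List[QueryTest] = [
    ({"a", "b"}, {"a", "b"}),
    ({"a", "b"}, {"a"}),
    ({"a"}, {"c"}),
]
```

On `example_tests`, the incomplete second query contributes partial credit to `average_f_measure` but nothing to `success_rate`, which is exactly where the two metrics diverge.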

I consider your new Section 5.3 quite useful; it gives a clear impression of what it feels like to use RODI in practice. Thank you also for making Section 5.2 easier to understand thanks to examples.

Minor issues
* Section 5.2: misspelling: "interger"
* Reference [8]: space missing between "Mappings" and "to".

Review #3
By Anastasia Dimou submitted on 21/Nov/2016
Review Comment:

I would like to thank the authors for their detailed answers to my remarks and for having addressed most of my comments. I am satisfied with the clarification about the distribution of tasks and challenges in Sections 1.2, 4.1 and 5.3, and I would not insist on giving examples to illustrate the distribution of tested challenges if the authors believe it is not meaningful. I think that the paper can be accepted now.

I would only like to point out that I still do not agree with the mapping quality definition introduced, but after having questioned it in all my reviews, I consider it more constructive simply to accept the authors’ choice, as this is a detail, albeit a fundamental one. If “Utility has also been referred to as fitness for use in similar contexts in parts of the literature, e.g., [58]”, as is mentioned in the manuscript, why not just conform to existing terminology instead of introducing a new term that is neither supported by a citation nor clearly and thoroughly defined? Regarding the latter, the “mapping utility” definition relies on the “query workload” notion which, as a term, is not clarified in the manuscript. Therefore, it is still ambiguous which dimension/measure of query workload is meant. Moreover, even if it was not intended to deal with the multi-dimensional aspect of quality, an explicit comparison of the introduced quality term to other definitions of the term in the literature could point to the dimension that is relevant after all.

A couple of minor issues that came to my attention while reading the last version of the manuscript:

Introduction: "w.r.t." → I would suggest spelling this out as "with respect to"

“While the end-to-end setup allows almost any systems that map data between relational schemata and ontologies can participate in the benchmark even if they do not support certain standards or languages” → I think it needs some small rephrasing

“The RODI framework: The RODI software package, including all scenarios, has been implemented and made available for public download under an open source license.^2” → I would suggest that the footnote come before the full stop.