Quality Assurance of RDB2RDF Mappings

Tracking #: 1276-2488

Patrick Westphal
Claus Stadler
Jens Lehmann

Responsible editor: 
Pascal Hitzler

Submission type: 
Full Paper
Since datasets in the Web of Data stem from many different sources, ranging from automatic extraction processes to extensively curated knowledge bases, their quality also varies. Thus, significant research efforts were made to measure and improve the quality of Linked Open Data. Nevertheless, those approaches suffer from two shortcomings: First, most quality metrics are insufficiently formalised to allow an unambiguous implementation which is required to base decisions on them. Second, they do not take the creation process of RDF data into account. A popular extraction approach is the mapping of relational databases to RDF (RDB2RDF). RDB2RDF techniques allow the creation of large amounts of RDF data with only a few mapping definitions. This also means that single errors in an RDB2RDF mapping can affect a considerable portion of the generated data. In this paper we present an approach to assess RDB2RDF mappings also considering the actual process of the RDB to RDF transformation. This allows problems to be detected and fixed at an earlier stage, before they result in potentially thousands of data quality issues in published data. We propose a formal model and methodology for the evaluation of the RDB2RDF mapping quality and introduce actual metrics. We evaluate our assessment framework by applying our reference implementation on different real-world RDB2RDF mappings.

Major Revision

Solicited Reviews:
Review #1
By Christoph Pinkel submitted on 03/Jan/2016
Minor Revision
Review Comment:

The authors describe a framework for quality scoring of linked data resulting from RDB2RDF mappings, e.g., R2RML or the authors’ own SML. While the framework and metrics in general are based on previously established quality measures for linked data, all metrics proposed with the framework are backed by an original formal model. In addition, the authors apply their metrics on three datasets in different application domains and discuss their findings.

Overall, the work addresses a relevant and current problem. The paper also includes three separate contributions, namely (1) the selection, adoption and extension of existing quality metrics from linked data in general to linked data produced by RDB2RDF mappings, including a prototypical implementation (R2RLint), (2) a formal model for quality metrics, and (3) a study on existing datasets, which serves to demonstrate the utility of the proposed framework, but could also point out certain issues with open data.

My only concern is about the clarity and coherence of how these contributions are presented together. However, this concern touches several (partially central) aspects of the paper. While each contribution makes sense, I find it difficult to understand the big picture and to follow the story:

(1) This paper is about the quality of RDB2RDF mappings (or rather the quality of linked data resulting from it?), but in parts of the paper it is hard to tell whether the discussion is about the mappings, or rather about linked data in general — and thus it is difficult to tell specific contributions from pre-existing linked data quality measures (i.e., to identify the parts that are new/adjusted and specific to RDB2RDF as opposed to linked data in general).

(2) My feeling is that the motivation is very general and rather abstract. What are the contributions actually good for? *Why* are those specific extensions of quality metrics for RDB2RDF an actual advantage and why do we need them? Why is their formalization important (beyond the obviously positive side effect of improving clarity when writing them down)? How does the evaluation play in and what does it really get us here (obviously, it is not evaluating the proposed framework as such but rather applying it)?

(3) I get the impression that the glue between the three contributions mentioned in the introduction is actually the R2RLint system (which appears to be really mature): having a tool to measure RDB2RDF quality apparently might help the community to build better mappings, such a system requires quality metrics (contrib. 1), which should be transparent and clearly defined (contrib. 2), and need to produce some useful output in practice (feasibility demonstrated in contrib. 3). Yet, the system is not mentioned as a contribution, and plays no real part before Sec. 7. If my impression is correct, then this hiding away of the system is key to my problem with following the story. If my impression is wrong then I might be missing a crucial point about why the contributions matter (c.f. again previous item (2)).

(4) To my understanding, the parts of the metrics that are specific to RDB2RDF are neither backed empirically nor from literature. It would therefore be highly important to understand the authors’ reasoning on their newly introduced quality metrics. Are they inevitable? Are there limitations? If so, when would I apply them and when not? Take Metric 4 as an example: does it *always* make sense to require the preservation of functional dependencies? E.g., take a typical case of e-mail addresses (or other contact details) with a DB application that has no need for alternate contact details. They’ll be probably modeled so they functionally depend on a person’s ID. Would it then be sensible to *require* e-mail addresses to be a functional property of a person in OWL?

(5) I’m unsure about the “exemplary” selection of metrics out of the full set of metrics from the technical report. Apparently, several of the metrics that are specific to RDB2RDF (i.e., newly introduced) have also been left out. It leaves me wondering if significant parts of the first contribution have been skipped here (which should not be necessary in a full journal paper), or if this first contribution might be less important to the paper than I thought?
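
The e-mail example in concern (4) can be made concrete with a small sketch. The table, the column names, and the helper function below are hypothetical and are not taken from the paper or from R2RLint. The sketch only shows that a functional dependency which holds in one database instance is a property of that application's schema, not necessarily of the real-world relation being described.

```python
from collections import defaultdict


def holds_functional_dependency(rows, determinant, dependent):
    """Return True if every determinant value maps to exactly one dependent value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[determinant]].add(row[dependent])
    return all(len(values) == 1 for values in seen.values())


# Hypothetical person table: the application stores one e-mail address per person.
persons = [
    {"id": 1, "email": "alice@example.org"},
    {"id": 2, "email": "bob@example.org"},
]

# The dependency id -> email holds in this database instance ...
fd_in_database = holds_functional_dependency(persons, "id", "email")

# ... but a second, perfectly valid address for the same person violates it.
persons_with_alternate = persons + [{"id": 1, "email": "alice@work.example.org"}]
fd_after_alternate = holds_functional_dependency(persons_with_alternate, "id", "email")

print(fd_in_database, fd_after_alternate)
```

Whether a mapping should turn such a dependency into an owl:FunctionalProperty axiom is exactly the modelling question raised above; the code checks the dependency on the data and nothing more.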

I would hope for the following clarifications:
- a clearer and more concrete statement of the overall purpose and why all the contributions matter for that purpose somewhere early on (e.g., introduction).
- a clear distinction, which of the presented metrics are actually new (or massively changed) and specific to RDB2RDF, and which ones are more or less a straight-forward translation from existing linked data metrics.
- more detailed discussion of the reasoning behind new metrics

In addition, related work could benefit from a brief discussion of other approaches considering mapping quality. While the authors extensively survey existing literature on linked data quality, they do not cover previous considerations on RDB2RDF mapping quality. For example, Sequeda et al. [1] have discussed some theoretically desirable properties of mappings, and Console and Lenzerini [2] have recently discussed mapping quality w.r.t. consistency questions.

- a few commas before “which” are missing (e.g., abstract)
- there seems to be a mix of BE and AE in the paper (e.g., formalised, but utilized or serialized, etc.)
- neglectable -> negligible?

[1] Sequeda, J.F. et al. On Directly Mapping Relational Databases to RDF and OWL. In: WWW 2012
[2] Console, M. and Lenzerini, M. Data Quality in Ontology-Based Data Access: The Case of Consistency. In AAAI, 2014.

Review #2
Anonymous submitted on 22/Jan/2016
Review Comment:

This paper proposes an approach to evaluate the quality of RDB2RDF mappings. Specifically, 43 metrics are defined by considering context information in different scopes. Although the metrics defined in the paper are mostly extensions of existing work, a detailed experimental analysis of these metrics is also important. This paper is well written and easy to understand.

The authors have addressed most of my concerns about the first version of their paper, and I am satisfied with the current version.

Review #3
Anonymous submitted on 31/Jan/2016
Minor Revision
Review Comment:

Review of: “Quality Assurance of RDB2RDF Mappings”

The authors of this paper propose a formal methodology for evaluating the quality of Linked Data (LD) in the earliest phase of LD generation from a relational database. They also provide some metrics that could influence the quality of LD in this phase.

Positive Points:

• This idea is interesting in my view because, as the authors mention, the phase in which the relational database is mapped to RDF is very sensitive and fault-prone. In this phase, even a single error can cause many problems that lower the quality of the LD in later assessments. Beyond being interesting, the idea is also quite unique compared with previous work on LD quality assurance.

• The experimental part of the paper is well described and contains enough detail.

Negative Points:

• The most undesirable aspect of this paper is that the authors did not describe their work in section 3 in a way that is easy to approach. In particular, phrases such as “quad pattern” and “word constructor” need at least a brief description to make this section more approachable.

• In section 4, the authors use the example from Figure 1 to make their methodology clearer; however, because some description and detail are missing, clarity is still lacking. For example, section 4.3 should contain a clear example that gives a sense of how this scoping works. The scope chosen to evaluate the LD is very important in this methodology; hence, the authors need to make this part clearer.

• Although I have no objection to a purely theoretical approach, an experimental evaluation (empirical and intuitive) would be desirable. However, the contribution of the paper is also solid without such an evaluation.

• In section 5, they are describing which metrics they consider and which they don’t. However in section 6 they only select 7 of them. Rationales for this selection need to be given.

Review #4
By Wei Hu submitted on 01/Feb/2016
Review Comment:

I appreciate the authors' responses to my review. The questions that I asked have all been addressed, either by revision or by an answer. Regarding my biggest concern, the authors explained that the main contribution of the paper is the formal definition of the metrics, although several metrics might not be specific to RDB2RDF. In the future, it may be interesting to think about some more difficult quality assessment questions, such as how natural the transformation is. This question is similar to database design: different people may design different database schemas for the same data, yet some designs are more natural or intuitive than others. Anyway, this is just a hope.

Review #5
Anonymous submitted on 16/Feb/2016
Major Revision
Review Comment:

This paper presents a set of metrics to evaluate the quality of Relational Database to RDF (RDB2RDF) mappings, with an attempt of formalizing an approach to apply the metrics and an evaluation based on applying the proposed quality metrics over three datasets.

Strong points
- In my opinion, this is an important topic. Quality of mappings in general (not just RDB2RDF) is a topic that deserves more research. With mapping standards such as R2RML, one can expect tools that help users create these mappings. An important feature will be the possibility of informing users about the quality of the mapping. Therefore, I believe that the impact of this type of work can be substantial.
- The authors present a wide variety of quality metrics (43) spanning 12 dimensions. These quality metrics are inspired by existing quality dimensions from Linked Data.

However, I am not able to recommend acceptance of this current paper for the following reasons:

Weak points
- W1 Definitions lack a well-founded formalization (Section 4)
- W2 Describing only 7 of the 43 metrics in this paper is not sufficient. (Section 5 and 6)
- W3 Evaluation section presents the results of applying the quality metrics to 3 datasets instead of evaluating the quality metrics themselves (Section 7)

This is a journal paper. Technically, space shouldn’t be a limitation. All the details should be in this single document.

In what follows, I will provide detailed comments about the weak points. I encourage the authors to pursue this work and I believe they can improve this paper quickly. I look forward to reviewing a revised version.

Comments on Section 3

It seems that this section should be titled “Overview of the Approach” instead of “Approach”.

My main comment is that this section should be written in a way that a reader can grasp what is going on quickly without having to understand details. For example Figure 2 is not clear without the context of Section 3. What are scopes, sink, sink implementation?

My suggestion is to rewrite this section with a running example. As a reader, I would like to see an example of the problem and how it can be solved. This should be a running example throughout the paper.

Comments on Section 4
With respect to W1, the definitions presented in this section are too long and lack formalism. These are some examples:

- Comments on Def 4.1
* What is a transformation description?
* Logical table is not defined (is it just a relation from the relational schema, or also a query? I believe it should be both).
* Quads should be their own definition (subject, predicate, and object can each be an IRI, etc.).
* There is too much prose. For example “for each relational data entry a variable q is instantiated to an RDF term based on an associated term constructor tc_q”. This should be formally described.
* The definition of a "view definition" v uses TC, and from what I understand TC is a set of “RDF term based on an associated term constructor”. But what does that even mean? TC is never defined. Term Constructor should be its own definition.
* In conclusion, Def 4.1 is too long and not formal. This entire definition needs to be written in smaller definitions.

- “piece of data” -> This is not formal. What do you mean?

- I do not understand the definition of “quality assessment scope”. It reads to me like: “the quality assessment scope of x can be either Sn which is a node scope, St which is the triple scope, …” You are defining “scope” with the word “scope” (which I do not know the meaning of). After that definition, you state that “scope is a categorization of the granularity a certain piece of data has”. Again, what do you mean by “piece of data”?

- “These scopes also correspond to the possible domains of the functions that do the actual computation of a quality score”: What functions/domains are you talking about? Later on I realize that you introduce what a quality score function is.

- What is a quality score function? Please give an example.

- Why use H for mapping? It is more intuitive to use M.

- Def 4.6, quality assessment, uses S, but I don’t know what it defines till the end: assessment sink. But I still don’t know what "assessment sink" means.

The bottom line is that I do not understand the terminology presented in section 4. I am lost and confused. I do not have a clear understanding of scope, quality score function, or assessment sink. I do not feel that I am prepared to understand the rest of the paper very well (I have to figure things out on my own and sometimes make guesses). The reason I am struggling is that you are using terms that have not yet been defined. This section needs to be completely rewritten to provide well-founded formal definitions, in a way that flows and doesn’t make the reader guess what the terms actually mean. Additionally, having a running example would be extremely useful.
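
As a rough illustration of what splitting the terminology into small, separately defined pieces could look like, the following sketch gives each term its own type. Everything in it is an assumption made for illustration: the class and function names, the reading of “assessment sink” as the place where scores are written, the score range of 0 to 1, and the example metric (which is not one of the paper's 43) are not taken from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Iterable, List, Optional, Tuple


class Scope(Enum):
    # Only the two scopes named in the review; the remaining ones are elided there.
    NODE = "node"
    TRIPLE = "triple"


@dataclass(frozen=True)
class Quad:
    """A quad as its own definition: graph, subject, predicate, object."""
    graph: Optional[str]
    subject: str
    predicate: str
    obj: str


# Assumption: a quality score function maps one piece of data to a score in [0, 1].
ScoreFunction = Callable[[Quad], float]


@dataclass
class ListSink:
    """Assumed reading of 'assessment sink': the place where scores are written."""
    records: List[Tuple[str, Scope, float, Quad]] = field(default_factory=list)

    def write(self, metric: str, scope: Scope, score: float, quad: Quad) -> None:
        self.records.append((metric, scope, score, quad))


def predicate_is_http_iri(quad: Quad) -> float:
    """Hypothetical metric: 1.0 if the predicate is an absolute HTTP(S) IRI."""
    return 1.0 if quad.predicate.startswith(("http://", "https://")) else 0.0


def assess(quads: Iterable[Quad], metric: str, scope: Scope,
           score: ScoreFunction, sink: ListSink) -> None:
    """Apply one score function to every quad and write each result to the sink."""
    for quad in quads:
        sink.write(metric, scope, score(quad), quad)


quads = [
    Quad(None, "http://example.org/person/1", "http://xmlns.com/foaf/0.1/name", "Alice"),
    Quad(None, "http://example.org/person/2", "name", "Bob"),
]
sink = ListSink()
assess(quads, "predicate_is_http_iri", Scope.TRIPLE, predicate_is_http_iri, sink)
scores = [record[2] for record in sink.records]
print(scores)
```

With each term introduced on its own before it is used, a definition of “quality assessment” can refer to scope, score function, and sink without circularity.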

Why is this section called “Methodology”? The majority of the content of this section is a set of definitions. The final three paragraphs describe the Methodology. Honestly, this methodology seems straightforward: define configurations, apply them and get the results. What is unique/novel about this? Am I missing something? What other ways could this be done?

Comments on Section 5

There should be a separate discussion of how the quality assessments apply for ETLing RDB to RDF versus SPARQL to SQL. The discussion of SPARQL to SQL in Section 5 doesn’t seem to be in the right place. It should probably go after the quality dimensions have been discussed. For example, the statement: “Since these definitions provide a certain view of the underlying database, this affects quality aspects like completeness or relevance.” doesn’t have a lot of meaning because as a reader, I don’t know yet what “completeness” or “relevance” means.

If RDF is returned, are you considering SPARQL CONSTRUCT queries? What happens with SELECT queries that return solution mappings?

Can’t you license mappings for open source applications that rely on relational databases? Those schemas are public (WordPress, Drupal, etc.).

What would be important is to provide a small example for each dimension.

Comments on Section 6

In my opinion, this is where the main technical contribution of the work is. Unfortunately I was disappointed because only 7 of the 43 metrics were presented. Why 7? Why those 7? Actually, all 43 should be described in the paper. I would suggest to present at least one for each dimension (have a total of 13 in the paper) and the rest of them in an appendix. As a reader, I do want to see the math because I would like to reproduce the work.

In addition to presenting the metrics, I want to see examples (in R2RML because it is the standard). Table 3 is the big take away here but most of the descriptions are not 100% clear. An example for each one would be extremely helpful. I suggest to present at least 13 examples (choose one for each dimension) and leave the rest in the appendix.

Comments on Section 7

The evaluation applies the quality metrics to three datasets. It’s interesting to learn about the quality issues that these three datasets have, but what would be more interesting (and useful) is to learn about the quality metrics themselves (not the result of applying them to a dataset). In my opinion, the goal of the evaluation isn’t to assess the quality of three datasets. As a reader, I want to learn what can be concluded about the quality metrics presented in this work. What are the authors’ hypotheses? Are the quality metrics useful, relevant, reasonable? Which ones? Are there computational overheads in applying them? To come to these conclusions, you would probably still have to apply them to different datasets. But the results should be described with respect to the quality metrics themselves and not just the datasets.

For example, Fig 5 opens up several questions. It seems that completeness and conciseness are dimensions that have relevance for RDB2RDF mappings. But if we look at consistency, there was barely anything. Why? Could we conclude that consistency is not an important quality metric for RDB2RDF mappings? Are there quality metrics that have more applicability to certain types of datasets? These are the questions that I have as a reader but unfortunately are not tackled at all in this section.

Additional questions:

- How is the service pinpointer implemented?
- “Our prototype currently lacks complete SQL query parsing and evaluation support which affects five of our metrics.” -> Which five metrics?
- “Since the amount of data is far too much to be assessed as a whole, only a small portion of LinkedGeoData was chosen for evaluation.” -> Interesting, so is there a limitation due to computation? This is very important to know. Please discuss.
- Why present hardware if you are not presenting execution times? It seems that this could be a limitation. Is there anything interesting to learn from here? What if this is too slow to compute? Is it worth it?

Table 5: If there are limitations with the implementation of R2RLint, then how can I trust these results? Why even report them? Why not fix the software? Why report a number of errors per 100,000 triples instead of a percentage?
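
On the last question, the two ways of reporting carry the same information and differ only by a constant factor. A minimal sketch, using a made-up rate that is not taken from Table 5, and assuming each counted error affects a distinct triple so that the rate can be read as a share of triples:

```python
def per_100k_to_percent(errors_per_100k: float) -> float:
    """Convert an error count per 100,000 triples into a percentage of triples."""
    return errors_per_100k / 1000


def percent_to_per_100k(percent: float) -> float:
    """Convert a percentage of triples back into a count per 100,000 triples."""
    return percent * 1000


example_rate = 250  # hypothetical: 250 errors per 100,000 triples
example_percent = per_100k_to_percent(example_rate)
print(example_percent)
```

A per-100,000 figure mainly avoids small fractions for rare errors, so the choice between the two is one of readability rather than content.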

Why is dereferencing a quality that needs to be considered? RDB2RDF is used to generate RDF. Whether it is dereferenceable or not is an issue of the data, not of the process of generating the data… unless errors were introduced that make the dereferencing not work.

“The different results of the vocabulary completeness metrics show that only very few vocabularies were modeled completely” -> Few vocabularies were modeled completely? Or were mapped to? We are talking about mapping, not modeling, right? Otherwise, I’m confused.

Comments on Section 8
This section is a brief summary, not a conclusion.

In a conclusion section, I would expect … conclusions (not a summary). What can we conclude from this work and the evaluation? For example:
- metrics A, B, C are the most relevant for datasets of type X.
- the most common quality issues for datasets of type X are A, B, C

“In this article a methodology for RDB2RDF quality assessments was developed and an overview of dimensions to consider was given.” -> Going back to a comment made initially, it is not clear to me how this is a methodology.