Review Comment:
This paper presents a set of metrics to evaluate the quality of Relational Database to RDF (RDB2RDF) mappings, along with an attempt to formalize an approach for applying the metrics and an evaluation that applies the proposed quality metrics to three datasets.
Strong points
- In my opinion, this is an important topic. Quality of mappings in general (not just RDB2RDF) is a topic that deserves more research. With mapping standards such as R2RML, one can expect tools that help users create these mappings. An important feature will be the possibility of informing users about the quality of the mapping. Therefore, I believe that the impact of this type of work can be substantial.
- The authors present a wide variety of quality metrics (43) spanning 12 dimensions. These quality metrics are inspired by existing quality dimensions from Linked Data.
However, I am not able to recommend acceptance of this current paper for the following reasons:
Weak points
- W1 The definitions lack a well-founded formalization (Section 4)
- W2 Describing only 7 of the 43 metrics in this paper is not sufficient (Sections 5 and 6)
- W3 The evaluation section presents the results of applying the quality metrics to three datasets instead of evaluating the quality metrics themselves (Section 7)
This is a journal paper. Technically, space shouldn’t be a limitation. All the details should be in this single document.
In what follows, I provide detailed comments about the weak points. I encourage the authors to pursue this work and I believe they can improve this paper quickly. I look forward to reviewing a revised version.
Comments on Section 3
------------------------------------
It seems that this section should be titled “Overview of the Approach” instead of “Approach”.
My main comment is that this section should be written so that a reader can quickly grasp what is going on without having to understand the details. For example, Figure 2 is not clear without the context of Section 3: what are scopes, sinks, and sink implementations?
My suggestion is to rewrite this section with a running example. As a reader, I would like to see an example of the problem and how it can be solved. This should be a running example throughout the paper.
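For instance, the running example could start from a minimal R2RML mapping like the one below (table, column, and IRI names are my own illustration, not taken from the paper), and each later section could point back to it when introducing scopes, metrics, and scores:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.org/> .

# Hypothetical mapping of a PERSON(ID, NAME) table; all names are illustrative.
<#PersonMap>
    rr:logicalTable [ rr:tableName "PERSON" ] ;
    rr:subjectMap [
        rr:template "http://example.org/person/{ID}" ;
        rr:class ex:Person
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "NAME" ]
    ] .
```

Quality issues (e.g., a missing datatype on the object map, or an IRI template that is not dereferenceable) could then be demonstrated directly on this mapping.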
Comments on Section 4
------------------------------------
With respect to W1, the definitions presented in this section are too long and lack formalism. Some examples:
- Comments on Def 4.1
* What is a transformation description?
* Logical table is not defined (is it just a relation from the relational schema, or also a query? I believe it should be both).
* Quads should be their own definition (subject, predicate, and object can be IRIs, etc.).
* There is too much prose. For example “for each relational data entry a variable q is instantiated to an RDF term based on an associated term constructor tc_q”. This should be formally described.
* The definition of a "view definition" v uses TC, and from what I understand TC is a set of “RDF terms based on an associated term constructor”. But what does that even mean? In any case, TC is never defined. Term constructor should be its own definition.
* In conclusion, Def 4.1 is too long and not formal. This entire definition needs to be broken up into smaller definitions.
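To give a sense of the level of formality I have in mind, the smaller definitions could take roughly the following shape (the notation below is my own sketch, not the authors'):

```latex
% My own sketch of how Def 4.1 could be split into small formal definitions:
\text{Let } I, B, L \text{ denote the sets of IRIs, blank nodes, and literals.}\\
\textbf{Def (term constructor).} \quad
  tc : D_1 \times \dots \times D_n \to I \cup B \cup L,\\
\text{where } D_1, \dots, D_n \text{ are column domains of a logical table.}\\
\textbf{Def (quad).} \quad
  (s, p, o, g) \in (I \cup B) \times I \times (I \cup B \cup L) \times I.
```

With definitions of this granularity in place, Def 4.1 could then be stated in a few lines by composing them.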
- “piece of data” —> This is not formal. What do you mean?
- I do not understand the definition of “quality assessment scope”. It reads to me like: “the quality assessment scope of x can be either Sn, which is a node scope, St, which is the triple scope, …” You are defining “scope” using the word “scope”, which itself has not been defined. After that definition, you state that “scope is a categorization of the granularity a certain piece of data has”. Again, what do you mean by “piece of data”?
- “These scopes also correspond to the possible domains of the functions that do the actual computation of a quality score”: What functions and domains are you talking about? Only later do I realize that you introduce the notion of a quality score function.
- What is a quality score function? Please provide an example.
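For instance, an example along the following lines would help (this is my own illustration of what I understood a quality score function to be, not something taken from the paper):

```python
import re

# Hypothetical node-scope quality score function (my own illustration, not
# from the paper): it maps a single RDF node to a score in [0, 1], here by
# checking whether the node is an http(s) IRI.
def http_iri_score(node: str) -> float:
    """Score 1.0 if the node is an http(s) IRI, 0.0 otherwise."""
    return 1.0 if re.match(r"https?://", node) else 0.0

# A triple-scope function could then aggregate node scores over one triple.
def triple_score(s: str, p: str, o: str) -> float:
    scores = [http_iri_score(term) for term in (s, p, o)]
    return sum(scores) / len(scores)
```

Even a single concrete example of this kind, tied to a running example mapping, would make Def 4.x much easier to follow.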
- Why use H for a mapping? It would be more intuitive to use M.
- Def 4.6, quality assessment, uses S, but I do not learn what S denotes until the end: an assessment sink. Even then, I still do not know what an "assessment sink" means.
Bottom line: I do not understand the terminology presented in Section 4. I am lost and confused. I do not have a clear understanding of scope, quality score function, or assessment sink, and I do not feel prepared to understand the rest of the paper well (I have to figure things out on my own, sometimes by guessing). The reason I am struggling is that you use terms that have not yet been defined. This section needs to be completely rewritten so that it rests on well-founded formal definitions and flows without making the reader guess what the terms actually mean. Additionally, a running example would be extremely useful.
Why is this section called “Methodology”? The majority of its content is a set of definitions; only the final three paragraphs describe the methodology. Honestly, the methodology seems straightforward: define configurations, apply them, and get the results. What is unique or novel about this? Am I missing something? In what other ways could this be done?
Comments on Section 5
------------------------------------
There should be a separate discussion of how the quality assessments apply to ETLing RDB to RDF versus SPARQL-to-SQL rewriting. The discussion of SPARQL to SQL in Section 5 does not seem to be in the right place; it should probably go after the quality dimensions have been discussed. For example, the statement “Since these definitions provide a certain view of the underlying database, this affects quality aspects like completeness or relevance.” does not carry much meaning yet, because as a reader I do not know what “completeness” or “relevance” means.
If RDF is returned, are you considering SPARQL CONSTRUCT queries? What happens with SELECT queries, which return solution mappings?
Can’t you license mappings for open source applications that rely on relational databases? Those schemas are public (WordPress, Drupal, etc.).
What would be important is to provide a small example for each dimension.
Comments on Section 6
------------------------------------
In my opinion, this is where the main technical contribution of the work lies. Unfortunately, I was disappointed because only 7 of the 43 metrics were presented. Why 7? Why those 7? All 43 should be described in the paper. I would suggest presenting at least one for each dimension (for a total of 13 in the paper) and the rest in an appendix. As a reader, I do want to see the math, because I would like to reproduce the work.
In addition to presenting the metrics, I want to see examples (in R2RML, because it is the standard). Table 3 is the big takeaway here, but most of the descriptions are not 100% clear. An example for each one would be extremely helpful. I suggest presenting at least 13 examples (one for each dimension) and leaving the rest in the appendix.
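To illustrate the kind of presentation I have in mind, here is a toy sketch of a vocabulary-completeness-style ratio (my own formulation, not necessarily the authors' metric); each metric in the paper should be given at this level of concreteness, ideally next to an R2RML snippet that triggers it:

```python
# Toy completeness-style metric (my formulation, not necessarily the paper's):
# the fraction of a vocabulary's terms that the mapping actually uses.
def vocabulary_completeness(vocab_terms: set, used_terms: set) -> float:
    if not vocab_terms:
        return 1.0  # an empty vocabulary is trivially covered
    return len(vocab_terms & used_terms) / len(vocab_terms)

# Hypothetical vocabulary and usage (names are illustrative only):
vocab = {"ex:Person", "ex:name", "ex:age"}
used = {"ex:Person", "ex:name"}
print(round(vocabulary_completeness(vocab, used), 2))  # prints 0.67
```

A metric definition plus a worked number of this kind would make Table 3 reproducible.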
Comments on Section 7
------------------------------------
The evaluation applies the quality metrics to three datasets. It is interesting to learn about the quality issues these three datasets have, but what would be more interesting (and useful) is to learn about the quality metrics themselves, not just the result of applying them to a dataset. In my opinion, the goal of the evaluation is not to assess the quality of three datasets. As a reader, I want to learn what can be concluded about the quality metrics presented in this work. What are the authors’ hypotheses? Are the quality metrics useful, relevant, reasonable? Which ones? Are there computational overheads in applying them? To come to these conclusions, you would probably still have to apply them to different datasets, but the results should be described with respect to the quality metrics themselves and not just the datasets.
For example, Fig 5 opens up several questions. It seems that completeness and conciseness are dimensions that are relevant for RDB2RDF mappings. But if we look at consistency, there was barely anything. Why? Could we conclude that consistency is not an important quality dimension for RDB2RDF mappings? Are there quality metrics that are more applicable to certain types of datasets? These are the questions I have as a reader, but unfortunately they are not tackled at all in this section.
Additional questions:
- How is the service pinpointer implemented?
- "Our prototype currently lacks complete SQL query parsing and evaluation support which affects five of our metrics.” —> which 5 metrics?
- "Since the amount of data is far too much to be assessed as a whole, only a small portion of LinkedGeoData was chosen for evaluation.” —> Interesting, so is there a limitation due to computational cost? This is very important to know. Please discuss.
- Why present hardware specifications if you are not presenting execution times? It seems that computation time could be a limitation. Is there anything interesting to learn here? What if the metrics are too slow to compute? Are they worth it?
- Table 5: If there are limitations in the implementation of R2RLint, then how can I trust these results? Why even report them? Why not fix the software? And why report a number of errors per 100,000 triples instead of a percentage?
Why is dereferenceability a quality that needs to be considered? RDB2RDF is used to generate RDF. Whether the result is dereferenceable or not is an issue of the data, not of the process of generating the data… unless errors were introduced that make dereferencing fail.
"The different results of the vocabulary completeness metrics show that only very few vocabularies were modeled completely” —> Were few vocabularies modeled completely, or mapped to completely? We are talking about mapping, not modeling, right? Otherwise, I am confused.
Comments on Section 8
------------------------------------
This section is a brief summary, not a conclusion.
In a conclusion section, I would expect … conclusions (not a summary). What can we conclude from this work and the evaluation? For example:
- metrics A, B, C are the most relevant for dataset of type X.
- the most common quality issue for datasets of type X is A, B, C
"In this article a methodology for RDB2RDF quality assessments was developed and an overview of dimensions to consider was given.” —> Going back to a comment made initially, it is not clear to me how this constitutes a methodology.