Review Comment:
This manuscript was submitted as a 'full paper' and should be reviewed along the usual dimensions for research contributions, which include (1) originality, (2) significance of the results, and (3) quality of writing.
(1) originality
This work presents an empirical evaluation of quality metrics proposed by the authors in previous work. The novelty and research contributions of this work are limited.
(2) significance of the results
The sample sizes used in the conducted experiments are rather small; it is therefore very questionable how representative the results are.
(3) quality of writing
The manuscript is not self-contained. Formal definitions of the metrics are not presented and the reader is pointed to previous work for important details of the approach.
This manuscript presents an empirical evaluation to assess quality metrics that were previously proposed by the authors. The experiments consist of three successive phases. First, interviews are conducted with four developers who have worked with Open Data sets; the interviewees are asked about positive and negative features of the datasets. Then, the proposed metrics are computed for all the datasets described in the first phase, and the metric values are compared to the developers’ answers. Lastly, discrepancies between the developers’ answers and the computed metrics are discussed with the developers to gather further explanations for the observed outcome.
The main strength of this manuscript is that the motivation of the work is very clear. However, the description of the related work, the proposed metrics, and the experiments should be improved to make the paper clear and self-contained (see more details below).
In the Background and Motivation section, the authors refer to the “Five Star Linked Open Data” as “Five Star Open Data”. It is important to note that Open Data is not necessarily *Linked* Open Data, so these two terms should not be used interchangeably. Furthermore, the authors enumerate Open Data characteristics that have been identified by the Open Government Group; however, these characteristics are not aligned with the ones studied in this work.
The description of the metrics presented in Table 1 is not very precise. Each metric is defined with either redundant or ambiguous terminology; the following questions should therefore be addressed in order to provide a more comprehensive definition of the metrics.
Q1 Why are these metrics tailored for Open Data and not other types of data?
Q2 Why were these specific characteristics chosen?
Q3 How is each metric computed exactly?
Q4 What is the range of each metric?
Q5 What does a “current value” signify in this context?
Q6 What is the “period of time referred by the dataset”?
Q7 What is a “meaningful value”? Is a value that is incorrect but coherent with the domain still considered meaningful?
Q8 What specific standards are taken into consideration to measure compliance?
Q9 How is the degree to which a dataset follows a standard measured?
Regarding the evaluation, the description of the experimental settings is not sufficient to allow for reproducibility. The design of the questionnaires, the process followed to conduct the interviews, and descriptions of the interviewees and the datasets should be provided.
Q10 Besides the dataset characteristics, were there further instructions about the type of answers that should be provided by the interviewees?
Q11 How was traceability assessed by the interviewees? (Traceability does not appear in Table 2)
Q12 What is the level of experience of the interviewees with the Open Data sets?
Q13 How many of the interviewed developers worked with each individual dataset?
Q14 How many rows and attributes does each dataset contain?
Q15 Are the datasets used in other applications?
For the outcome of the three stages, the authors present only a coarse-grained analysis of the results. The normalization and aggregation of the metrics are not well justified. I recommend that the authors address the following questions and include further details of the obtained results.
Q16 What are the values obtained for each metric?
Q17 Why were the specific ranges < 0.4, 0.4–0.6, and > 0.6 chosen?
Q18 What function was used to aggregate the metrics in each characteristic?
Q19 Was there agreement among the interviewees regarding the negative/positive characteristics of each dataset? How much?
Q20 Why could the “Currentness” metrics not be computed (according to Table 2)?
In addition, it seems that the interviewees were not clearly instructed on how to assess each of the dimensions of the quality issues (questions P1-Q2, P1-Q4, P1-Q6 in Table 2). As indicated in Section 6, the interviewees had a different definition of “Completeness” than the one presented in Table 1. Therefore, the outcome of the interviews cannot be directly compared with the outcome of the metrics, since they seem to measure different things.
The outcome of the empirical study provides interesting insights into developers’ experience when dealing with Open Data sets. However, as acknowledged by the authors, these results are preliminary and no generalizations can be drawn from this study.
In summary, the presented work tackles the interesting problem of quality assessment in Open Data. Unfortunately, even if the authors address all the comments raised in this review, I consider that the research contributions of this work are not sufficient for a journal publication. As a final remark, this manuscript does not seem to fit the topics of the special issue on “Quality Management of Semantic Web Assets”, since the presented approach is not related to Semantic Web technologies but to Open Data.