Review Comment:
In this paper, the authors present a quality model for Linked Data based on the ISO 25012 data quality model, and formalise a classification of different quality measures. They also extend the W3C Data Quality Vocabulary (DQV) so that it serves the proposed quality model better. The proposed model was implemented in a tool used to evaluate Linked Data. On the whole, the paper is well written, with some small typos that can easily be fixed.
=== Section 2 ===
In the related work section, I would have expected more discussion of current quality models such as the W3C Data Quality Vocabulary, daQ or Fürber’s work [1], all of which were mentioned in Section 4.2. I would rename the section to Preliminaries, rather than Related Work, as what is described helps the reader to understand the rest of the paper better. I was a bit confused by the phrase “To the best of our knowledge, there is no clearly defined quality model for Linked Data”. I disagree with this statement, as quality models (or better, meta-models) for Linked Data have been described in DQV and daQ. If the authors’ idea of a quality model is a taxonomy with a number of reusable quality measures, then I do not agree with the “clearly defined” part, as quality measures can be recommended, but others can have different perspectives on the same measures. My own understanding of “quality model” is a conceptual meta-model that enables the description of quality-related information regarding some aspect of Linked Data.
=== Section 3 ===
In this section, the authors present a quality model for Linked Data, adopting the ISO terminology. The separation of the Linked Data aspects is interesting, though it seems to be similar to the categories defined in Zaveri et al. [2]. The difference between the two categorisations is that in this paper the authors distinguish between data quality characteristics and infrastructure quality characteristics. I don’t consider the serialisation aspect to be part of the inherent group; it is better suited to the infrastructure group. The serialisation per se does not really affect the data characteristics, but it does affect infrastructure-related issues, for example lack of interoperability or syntactic errors. The base and derived measures were well explained, though it would have been better if the authors had explicitly identified their real contributions with respect to the work referenced in [2]. This section seems to be an extended version of a number of metrics defined in [2], with more details and an additional mapping to the ISO quality model.
=== Section 4 ===
In this section, the authors present a conceptual hierarchical model for the LD quality model and extensions to the DQV model. In the conceptual model, the authors show how quality should be represented. This looks a lot like DQV and daQ. On a closer look at the proposed model, I found that some of the introduced concepts and properties are unnecessary. For example, why is a ranking function required in a metric? Ranking should be separated from the metric itself, as it is ultimately the consumers (or whoever wants to explore quality models) who decide how to rank different LD aspects. The “Granularity” concept is also unnecessary, as dqv:QualityMeasure is equivalent to daq:Observation, which has the “computedOn” property, and this property seems to cover any assessed resource. On the other hand, extensions related to the semantics of a metric, such as the automation level and the expected duration (although the latter can differ between machines and depending on what is being assessed), are really useful.
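To make the point about ranking concrete, here is a minimal sketch of how ranking can live outside the metric. It is not the authors' model, nor the DQV or daQ API: the `Measurement` class, the `rank` function, the metric names, and the example.org IRIs are all hypothetical, and values are assumed to be normalised to [0, 1]. Each measurement only records which metric was computed on which resource; the weighting is supplied by the consumer.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Measurement:
    """One observation: a metric's value for some assessed resource (hypothetical class)."""
    metric: str        # hypothetical metric identifier
    computed_on: str   # IRI of whatever was assessed (dataset, graph, resource, ...)
    value: float       # assumed normalised to [0, 1] in this sketch


def rank(measurements: List[Measurement], weights: Dict[str, float]) -> List[str]:
    """Order assessed resources by a consumer-supplied weighting of metrics."""
    scores: Dict[str, float] = {}
    for m in measurements:
        # Metrics the consumer did not weight contribute nothing to the score.
        weighted = weights.get(m.metric, 0.0) * m.value
        scores[m.computed_on] = scores.get(m.computed_on, 0.0) + weighted
    return sorted(scores, key=lambda resource: scores[resource], reverse=True)


# Illustrative values only; they are not results from any real assessment.
measurements = [
    Measurement("dereferenceability", "http://example.org/dataset/A", 0.9),
    Measurement("syntacticValidity", "http://example.org/dataset/A", 0.5),
    Measurement("dereferenceability", "http://example.org/dataset/B", 0.6),
    Measurement("syntacticValidity", "http://example.org/dataset/B", 1.0),
]

# Two consumers with different priorities rank the same measurements differently.
availability_first = rank(measurements, {"dereferenceability": 1.0, "syntacticValidity": 0.2})
validity_first = rank(measurements, {"dereferenceability": 0.2, "syntacticValidity": 1.0})
```

With the same stored measurements, the two consumers obtain opposite orderings, which is why I would keep the ranking function out of the metric definition. The single `computed_on` field also stands in for any level of granularity, in the spirit of the “computedOn” property mentioned above.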
=== Section 5 ===
It is always a plus to implement such a model in a use case. The problem is that the tool did not work for me: I tried to assess a DBpedia resource, and the results did not appear within 5 minutes. Also, I think users should be free to choose which metrics are assessed; after all, quality is commonly defined as “fitness for use”.
=== Section 6 ===
I acknowledge the authors’ attempt to evaluate the quality model in the discussion section, but a more thorough evaluation is required. For example, how practical is the model? To what extent could it be used? Are there any applications (apart from LD Sniffer) using both the extension of the model and the defined taxonomy?
=== Final Remarks ===
Although this paper has some interesting aspects, in my opinion this work lacks originality. Whilst understandably different, a number of quality taxonomies for Linked Data are already available, e.g. [1] and [3]. Also, reading parts of this paper felt like reading [2] and its references. I suggest that the authors focus on new quality measures, rather than re-explaining what was described in [2] (and its references).
The conceptual model is very similar to that described in DQV and daQ. I am not sure whether the LD community needs another conceptual ontology (or extensions) with different terminologies and thus suggest that unless necessary, the authors should stick to the existing terminology described in the standard DQV. The contribution here seems to be the small extension made to the DQV ontology. My question is, how will these new extensions fit in existing quality assessment frameworks? Also, were there any problems in describing these quality measures in DQV or daQ? This kind of exercise would be really interesting in such a paper as then the reader would really understand the importance of the proposed extensions. These extensions lack supporting evidence for why they are required. Regarding the extension, I wholeheartedly agree with the introduction of the “Assessment Technique” concept, and the new sub-types of dqv:Metric (given that eventually they are supported in existing quality assessment frameworks), but on the other hand the introduction of QMO and Eval duplicates the efforts in DQV and daQ.
Minor Comment:
I don’t know what kind of referencing system the authors used, but generally I think references should be in alphabetical order.
———
[1] Christian Fürber and Martin Hepp. 2011. Towards a Vocabulary for Data Quality Management in Semantic Web Architectures. In Proceedings of the 1st International Workshop on Linked Web Data Management (LDWM). ACM, New York, NY, USA, 1–8. DOI:http://dx.doi.org/10.1145/1966901.1966903
[2] Zaveri, A., Rula, A., Maurino, A., Pietrobon, R., Lehmann, J., Auer, S.: Quality assessment for linked data: A survey. Semantic Web – Interoperability, Usability, Applicability (2014)
[3] Jeremy Debattista, Christoph Lange, and Sören Auer. 2014. Representing Dataset Quality Metadata using Multi-Dimensional Views. In SEMANTiCS.