A comprehensive quality model for Linked Data

Tracking #: 1391-2603

Filip Radulovic
Nandana Mihindukulasooriya
Raúl García-Castro
Asunción Gómez-Pérez

Responsible editor: 
Guest Editors, Quality Management of Semantic Web Assets

Submission type: 
Full Paper

Abstract: 
With the increasing amount of Linked Data published on the Web, the community has recognised the importance of the quality of such data, and a number of initiatives have been undertaken to specify and evaluate Linked Data quality. However, these initiatives are characterised by a high diversity in the quality aspects that they address and measure. This leads to difficulties in comparing and benchmarking evaluation results, as well as in selecting the right data source according to certain quality needs. This paper presents a quality model for Linked Data, which provides a unique terminology and reference for Linked Data quality specification and evaluation. This quality model specifies a set of quality characteristics and quality measures related to Linked Data, together with formulas for the calculation of measures. Furthermore, this paper also presents an extension of the W3C Data Quality Vocabulary that can be used to capture quality information specific to Linked Data, a Linked Data representation of the Linked Data quality model, and a use case that demonstrates the benefits of the proposed quality model in a tool for Linked Data evaluation.
Decision: Minor Revision

Solicited Reviews:
Review #1
By Gavin Mendel-Gleason, submitted on 01/Jul/2016
Review Comment:

(1) originality

The authors of this paper have created a quality model for linked data, which is an extension of the ISO 25012 data quality model altered to include aspects that are unique to linked data quality. The work is based on the data quality work of Zaveri et al., which is now enjoying use in the evaluation of linked data quality. They extend this with some additional measures. The main contributions of the paper appear to be in segregating base measures from derived measures. It also presents a conceptual model which can itself be expressed as linked data to describe data quality measures. The creation of such data quality models specified in linked data ontologies is doubtless of some importance for linked data quality. However, the authors did not sufficiently motivate their choice of measures or explain why this particular model should be useful to practitioners.

The paper makes an explicit case for the utilisation of a linked data measure ontology and provides an example use case that helps the reader understand how the model would be used in practice.

(2) significance

The linked data community will be well served by standardised formats for the reporting of quality measures. It remains to be seen whether this format will be one that becomes widely used. More work needs to be done on assessing the model against a larger number of questions before this can be answered definitively, but perhaps this can be a direction for future work.

(3) quality of writing

The quality of writing in the paper is high and the main points are clearly expressed.

Review #2
By Jeremy Debattista, submitted on 01/Jul/2016
Minor Revision
Review Comment:

The authors have modified their manuscript and addressed most of the reviewers' comments. The new preliminaries section looks better, although it still requires some further modification and restructuring, whilst their contributions are better positioned against the work of Zaveri et al.

In Section 2.3, I suggest focusing on the meta-models (DQV and daQ), whilst moving the rest to Section 2.1 or a separate "Related Work" section. In this model, my major concern is still related to the "Granularity". My remarks from the previous review still stand, that is: granularity can still be represented by the vocabularies that this model extends.

It would also be helpful for the reader if the authors created a section or appendix with a better TBox-ABox example (rather than Figure 7) and sample SPARQL queries for this model.

Regarding the use case tool, I appreciate the effort made by the authors to create a tool to accompany such a paper, but for the reader the tool might be a bit difficult to use, thus I suggest that a more practical solution is found (maybe a Docker image with a pre-installed local web interface?).

Some other questions that the authors might clarify:
(1) Why was this model evaluated on just a few selected metrics and just one data source?
(2) Considering that only 2 slices of DBpedia were evaluated, as a data consumer what will I learn about the overall quality of DBpedia, now that the LDQM quality resources are available online?

The manuscript is well written, but there are some typos such as:
Zavery -> Zaveri (section 3)
dereferenciable -> dereferenceable (section 5)
dereferenciability -> dereferenceability (section 5)

Please proof-read the manuscript once more.