Detecting Linked Data Quality Issues via Crowdsourcing: A DBpedia Study

Tracking #: 1293-2505

Maribel Acosta
Amrapali Zaveri
Elena Simperl
Dimitris Kontokostas
Fabian Flöck
Jens Lehmann

Responsible editor: 
Guest Editors Human Computation and Crowdsourcing

Submission type: 
Full Paper

Abstract: 
In this paper we examine the use of crowdsourcing as a means to detect Linked Data quality problems that are difficult to uncover automatically. We base our approach on an analysis of the most common errors encountered in the DBpedia dataset, and a classification of these errors according to the extent to which they are likely to be amenable to crowdsourcing. We then propose and study different crowdsourcing approaches to identify these Linked Data quality issues, employing DBpedia as our use case: (i) a contest targeting the Linked Data expert community, and (ii) paid microtasks published on Amazon Mechanical Turk. Second, we focus on adapting the Find-Fix-Verify crowdsourcing pattern to exploit the strengths of experts and lay workers. By testing two distinct Find-Verify workflows (lay users only, and experts verified by lay users), we show how to best combine the complementary aptitudes of different crowds in detecting Linked Data quality issues. The results show that a combination of the two styles of crowdsourcing is likely to achieve more efficient results than either used in isolation, and that human computation is a promising and affordable way to enhance the quality of DBpedia.
Solicited Reviews:
Review #1
By Irene Celino submitted on 19/Jan/2016
Review Comment:

The authors have addressed most if not all of my remaining remarks, therefore I'm happy to suggest final acceptance. Thanks a lot to the authors for their care (and patience) in addressing my suggestions; I think the paper is much more readable and effective in this third, and hopefully final, version.

Review #2
By Harald Sack submitted on 14/Mar/2016
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

Thank you very much to the authors for carefully considering all of the issues raised in my last review!