Using microtasks to crowdsource DBpedia entity classification: A study in workflow design

Tracking #: 1340-2552

Qiong Bu
Elena Simperl
Sergej Zerr
Yunjia Li

Responsible editor: 
Guest Editors Human Computation and Crowdsourcing

Submission type: 
Full Paper
DBpedia is at the core of the Linked Open Data Cloud and widely used in research and applications. However, it is far from perfect: its content suffers from many flaws, as a result of factual errors inherited from Wikipedia or glitches in the DBpedia Information Extraction Framework. In this work we focus on one class of such problems, un-typed entities. We propose an approach to categorize DBpedia entities according to the DBpedia ontology using human computation and paid microtasks. We analyzed the main dimensions of the crowdsourcing exercise in depth in order to come up with suggestions for workflow design, and study three different workflows with automatic and hybrid prediction mechanisms to select possible candidates for the most specific category from the DBpedia ontology. To test our approach we ran experiments on CrowdFlower using a dataset of 120 previously unclassified entities, and evaluated the answers of the crowd. Our study shows that the microtask-based free-text approach achieved the highest precision at moderate cost compared to the other workflows. However, each workflow has its merits, and none of the workflows seems to perform exceptionally well on entities that the DBpedia Extraction Framework fails to classify. We discuss these findings and their potential implications for the design of effective crowdsourced entity classification in DBpedia and beyond.

Minor Revision

Solicited Reviews:
Review #1
By Florian Daniel submitted on 21/Apr/2016
Minor Revision
Review Comment:

The authors did a significant job of re-structuring the whole paper and re-considering the experimental settings and the assessment of their approach. In this revision, most of the concerns raised in the previous review were addressed. However, some points were still not addressed, or at least not in an appropriate way:

- I still miss a good formalization of the problem. Somehow the problem is explained in Section 1 (and partly in Section 2), but it would be good to have a formal definition of the target problem.

- The authors seem to suggest (in the answer letter) that T1 is iterative: first propose, then vote. However, I did not see a good explanation and elaboration of this in the paper. I suggest that the authors elaborate more on this.

- Can the authors also elaborate more on how they apply gold data to the task design with free-text inputs? It is not clear how the authors do this in their task design, especially considering that workers may provide more than one free-text answer (in this case, against which of the words from the list is the gold data compared?). How do you define your gold data in these cases, specifically in CrowdFlower? This needs to be elaborated further.

Other comments:
- Proof-reading is needed to make sure the paper uses language correctly.

Review #2
By Marco Brambilla submitted on 26/Apr/2016
Minor Revision
Review Comment:

The paper addresses the problem of multi-level classification (taxonomy-based) of entities by comparing different workflows, based on automatic prediction and crowdsourcing. The paper features an interesting set of experiments.

(1) originality
Although it is not disruptive in terms of innovation, the paper could provide some small additional insight with respect to existing studies on the organization of crowd work.

(2) significance of the results
The results are significant in the context of the specific problem addressed by the paper, although the approach is probably not very generalizable.

(3) quality of writing.
The current version of the manuscript has improved in readability, coherency and organization.
The main criticism I have of the current version of the work is that the authors insist on dedicating a significant section of the related work to the GWAP field, which is basically irrelevant in the current context, as they themselves state in their rebuttal letter ("Whilst we agree with the Reviewer that improvements can be made such as involving gamification techniques or query execution plan optimization, those techniques often are not natively supported by crowdflower. We reduced our approaches to that not needed external applications ...").
Vice versa, the analysis is missing all (or most of) the works that study different crowdsourcing workflow planning strategies (e.g., considering crowd quality, performance, adaptation and reactivity, multi-step processes, spam detection, and so on). I think this deserves a dedicated section in the related work, beyond the part applying AI techniques.

Some minor aspects need to be addressed:
- "including glitches in the extraction process" --> there is no substantial evidence of this in the paper. The authors should either motivate the claim and show some examples, or drop the comment.
- In the formula Cost(annotation) = Cost(prediction) + Cost(detection) + Cost(correction), the term "cost" should be specified more clearly. Is it monetary (in which case you should clarify that the cost of prediction may come from paid API invocations), or the number of human tasks only (as it seems, based on the reported results)?
- In Section 5.4, you should clarify whether you actually applied spam prevention or not.