Empirical Methodology for Crowdsourcing Ground Truth

Tracking #: 1887-3100

Authors: 
Anca Dumitrache
Oana Inel
Benjamin Timmermans
Carlos Ortiz
Robert-Jan Sips
Lora Aroyo
Chris Welty

Responsible editor: 
Guest Editors Human Computation and Crowdsourcing

Submission type: 
Full Paper
Abstract: 
The process of gathering ground truth data through human annotation is a major bottleneck in the use of information extraction methods for populating the Semantic Web. Crowdsourcing-based approaches are gaining popularity in the attempt to solve the issues related to volume of data and lack of annotators. Typically, these practices use inter-annotator agreement as a measure of quality. However, in many domains, such as event detection, there is ambiguity in the data, as well as a multitude of perspectives on the information examples. We present an empirically derived methodology for efficiently gathering ground truth data in a diverse set of use cases covering a variety of domains and annotation tasks. Central to our approach is the use of CrowdTruth metrics that capture inter-annotator disagreement. We show that measuring disagreement is essential for acquiring a high-quality ground truth. We achieve this by comparing the quality of the data aggregated with CrowdTruth metrics against majority vote, over a set of diverse crowdsourcing tasks: medical relation extraction, Twitter event identification, news event extraction and sound interpretation. We also show that an increased number of crowd workers leads to growth and stabilization in the quality of annotations, going against the usual practice of employing a small number of annotators.
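
As a rough illustration of the comparison described in the abstract between majority vote and disagreement-aware aggregation, the Python sketch below contrasts a binary majority label with a graded per-annotation support score. This is not the authors' CrowdTruth implementation (the actual CrowdTruth metrics also model worker and media unit quality); the toy judgments, function names, and the one-half majority threshold are assumptions made purely for illustration.

# Illustrative sketch only: majority vote vs. a graded, disagreement-aware score.
# NOT the CrowdTruth implementation; data and threshold are invented.
from collections import Counter

def majority_vote(judgments):
    """Binary label per annotation: 1 if more than half of the workers chose it."""
    n_workers = len(judgments)
    counts = Counter(a for annots in judgments for a in set(annots))
    return {a: int(c > n_workers / 2) for a, c in counts.items()}

def annotation_score(judgments):
    """Graded score per annotation: fraction of workers who chose it.
    CrowdTruth additionally weights workers by quality; omitted here."""
    n_workers = len(judgments)
    counts = Counter(a for annots in judgments for a in set(annots))
    return {a: c / n_workers for a, c in counts.items()}

# Toy media unit: 10 workers pick candidate relations for one sentence.
judgments = [{"treats"}, {"treats"}, {"treats", "prevents"}, {"treats"},
             {"prevents"}, {"treats"}, {"prevents"}, {"treats"},
             {"treats", "prevents"}, {"other"}]

print(majority_vote(judgments))     # {'treats': 1, 'prevents': 0, 'other': 0}
print(annotation_score(judgments))  # {'treats': 0.7, 'prevents': 0.4, 'other': 0.1}

The majority vote keeps only "treats", while the graded score preserves the partial support for "prevents", which is the disagreement signal the paper argues should not be discarded.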
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Gerhard Wohlgenannt submitted on 03/May/2018
Suggestion:
Accept
Review Comment:

After a quick check of the new version, I appreciate the authors' improvements to the paper and stick with my previous recommendation (accept).

Minor:
Typos:
p14: "scores perform provide better ground truth"

Review #2
Anonymous submitted on 11/Jun/2018
Suggestion:
Minor Revision
Review Comment:

I reviewed the new version of the paper and read the authors' response letter (as well as the other previous reviews). I was reviewer number 4 on the previous version.

The authors have definitely improved the discussion of the results and made their claims clearer and fairer, addressing some of my comments; they added more details and clarifications here and there; and they reframed their claims about the Semantic Web as claims about contributing to knowledge base curation.

However, the authors state that the novelty of this paper with respect to their previous publications lies in the application of CrowdTruth to open-ended tasks; yet they did not need to modify anything in their methodology for closed tasks in order to apply it to open-ended tasks, so I fail to see the novelty/originality.

Moreover, I still have major doubts about the proposed evaluation: all results are "biased" towards the cases with "multiple truths", because the F1 of majority voting is, by design, penalized by lower recall compared to that of CrowdTruth. The choice of taking as ground truth what they call "trusted judgement" (even with the addition of the appendix) is still very questionable, and I disagree that it was the only way to run such an evaluation; on the contrary, I would be curious to see the results of comparing CrowdTruth to majority voting using pure expert judgement as ground truth.
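
To make the recall point above concrete, here is a small illustrative sketch with invented numbers (not taken from the paper or its data): against a ground truth that admits multiple valid answers, majority voting can only return answers chosen by more than half of the workers, so its recall, and therefore its F1, is capped by design, whereas thresholding a graded support score can still recover partially supported answers.

def prf(predicted, gold):
    # Standard precision/recall/F1 over sets of annotations.
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

gold = {"treats", "prevents"}                            # "multiple truths" ground truth
scores = {"treats": 0.7, "prevents": 0.4, "other": 0.1}  # invented graded crowd support

majority = {a for a, s in scores.items() if s > 0.5}     # -> {'treats'}
graded = {a for a, s in scores.items() if s >= 0.4}      # -> {'treats', 'prevents'}

print(prf(majority, gold))  # precision 1.0, recall 0.5, F1 ~ 0.67
print(prf(graded, gold))    # precision 1.0, recall 1.0, F1 1.0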

Therefore, I suggest a minor revision in which the authors extend the paper with the results of comparing CrowdTruth to majority voting considering pure expert judgement as ground truth.

Review #3
By Maribel Acosta submitted on 19/Jun/2018
Suggestion:
Accept
Review Comment:

In this version of the manuscript, the authors have addressed the concerns I raised previously, except for clarifying the time when the microtasks were submitted to CrowdFlower. This information is still not reported in the paper and is only available for the sound task, in the raw data at https://github.com/CrowdTruth/Cross-Task-Majority-Vote-Eval. Especially if the tasks were not crowdsourced at the same time, it would be fitting to report the timeframes in which the microtasks were executed, in order to provide more context about the task prices paid on the platform at that time.