An extended study of content and crowdsourcing-related performance factors in Named Entity Annotation

Tracking #: 1535-2747

Oluwaseyi Feyisetan
Elena Simperl
Markus Luczak-Roesch
Ramine Tinati
Nigel Shadbolt

Responsible editor: 
Guest Editors Human Computation and Crowdsourcing

Submission type: 
Full Paper
Hybrid annotation techniques have emerged as a promising approach to carry out named entity recognition on noisy microposts. In this paper, we identify a set of content and crowdsourcing-related features (number and type of entities in a post, average length and sentiment of tweets, composition of skipped tweets, average time spent to complete the tasks, and interaction with the user interface) and analyse their impact on correct and incorrect human annotations. We then carried out further studies on the impact of extended annotation instructions and disambiguation guidelines on the factors listed above. This was all done using CrowdFlower and a simple, custom-built gamified NER tool on three datasets from related literature and a fourth newly annotated corpus. Our findings show that crowd workers correctly annotate shorter tweets with fewer entities, while they skip (or wrongly annotate) longer tweets with more entities. Workers are also adept at recognising people and locations, while they have difficulties in identifying organisations and miscellaneous entities, which they skip (or wrongly annotate). Finally, detailed guidelines do not necessarily lead to improved annotation quality. We expect these findings to lead to the design of more advanced NER pipelines, informing the way in which tweets are chosen to be outsourced to automatic tools, crowdsourced workers and nichesourced experts. Experimental results are published as JSON-LD for further use.
Solicited Reviews:
Review #1
Anonymous submitted on 21/Feb/2017
Review Comment:

The paper has improved substantially from the initial submission. While the previous revision was good in quality, it presented the problem that it was a nearly verbatim copy of the ESWC paper published by the authors. I appreciate the effort that the authors have made on this occasion to rewrite and flesh out parts of the paper to differentiate it from the published ESWC paper, as well as to address issues pointed out by reviewers.

The work is now clearly presented and well written, and the methodology and analysis of results are sound.

Review #2
Anonymous submitted on 09/Mar/2017
Review Comment:

The paper is a significant improvement on the prior version and for me, ready to be accepted. It begins to raise some interesting questions, but rather than address them all here, it seems best to go ahead with this paper now and leave the questions it raises to future research. A noteworthy and rounded contribution.

Review #3
Anonymous submitted on 15/Mar/2017
Review Comment:

The paper has been revised and extended since its original submission. It addresses all reviewer suggestions from the previous version well. It is now ready to be accepted.

For the camera ready version, I recommend just extending the related datasets section to include a recent large crowd-sourced NER dataset:

Leon Derczynski, Kalina Bontcheva, Ian Roberts. 2016. Broad Twitter Corpus: A Diverse Named Entity Recognition Resource. 26th International Conference on Computational Linguistics (COLING).