# Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge Series

### Tracking #: 1481-2693

Authors:
Giuseppe Rizzo
Bianca Pereira
Andrea Varga
Marieke van Erp
Amparo Elizabeth Cano Basave

Responsible editor:
Guest Editors Social Semantics 2016

Submission type:
Full Paper
Abstract:
The large number of tweets generated daily provides policy makers with the means to obtain insights into recent events around the globe in near real time. The main barrier to extracting such insights is the impossibility of manually inspecting such a diverse and dynamic stream of information. This problem has attracted the attention of industry and research communities, resulting in algorithms for the automatic extraction of semantics from tweets and for linking them to machine-readable resources. While a tweet is superficially comparable to any other textual content, it hides a complex and challenging structure that requires domain-specific computational approaches for mining its semantics. The NEEL challenge series, established in 2013, has contributed to the collection of emerging trends in the field and to the definition of standardised benchmark corpora for entity recognition and linking in tweets, ensuring high-quality labelled data that facilitates comparisons between different approaches. This article reports the findings and lessons learnt through an analysis of the specific characteristics of the created corpora, their limitations, the lessons learnt from the different participants, and pointers for furthering the field of entity recognition and linking in tweets.
Tags:
Reviewed

Decision/Status:
Minor Revision

Solicited Reviews:
Review #1
By Tim Baldwin submitted on 19/Dec/2016
Suggestion: Minor Revision

Review Comment:

The paper is much improved from the original version, and the authors have taken on board many of the suggestions in my original review, which was great to see. In particular, the broader discussion of the approaches taken by participants and the comparison across different years really enhance the paper, and represent a significant amount of extra work on the part of the authors. I have some issues with minor claims in the paper/areas of clarification, but don't believe the paper needs to go through a third round of reviews:

+ you state that a web user wants to find "exactly what she is looking for" (p4), which sounds very much like navigational search, which is one of a number of search modalities; you then go on to claim that the search query will be composed "by" (should be "of") a mention "to" (should be "of") the entity of interest; again, this is very much suggestive of navigational search, which is less than half of searches in practical settings. See Broder 2002 ("A taxonomy of web search") for details. The answer here is to tone down the controversial and unsubstantiated rhetoric, or remove this claim altogether ... or provide references to back up this claim, but my experience with search logs and the IR literature is very much at odds with this claim.

+ surely coverage is also a point of differentiation between KBs (you can have two KBs which are in the same domain, use the same features to describe entities, and are updated with similar frequency, but one can have much better coverage than the other).

+ the URL for ACL (p8) is a bit strange; a more general URL would be https://www.aclweb.org/

+ your claim about long documents and KB update rates (p9) is a bit strange -- surely it's the aggregate count across all documents that affects the update rate, not the length of individual documents?

+ what do you mean by "unsupervised annotation" (p12)? clarify

+ the "unsupervised naive algorithm" seems to simply merge together string-identical NE mentions of the same type, in which case "clustering" is too grand a term (p13); this seems to be to generate the seed clusters, in which case perhaps "seed cluster generation through merging of string- and type-identical named entity mentions" would be more appropriate?

+ is it possible to include the Challenge Annotation Guidelines in an appendix, or at the very least, to include a URL in the paper? You mention them a number of times as being critical to the success of the tasks, meaning people should be provided access to them

+ you state that you use GATE to measure IAA, but how is this measured: is it at the span+label level (over full spans, a la CoNLL), or at the word+label level (over partial spans)? clarify

+ I couldn't understand what you meant by "The latter showed best performance, holding more complexity in the definition of the feature sets" (p16) -- please clarify

+ "as subsequent .. thus using the output as input" (p19) -- again, not sure what you are trying to say; please clarify

+ "NIL clustering is addressed as a supervised learning task" -- say more, as it's far from intuitive that you should be able to solve a clustering task with a supervised approach

+ in what way was the weighting in the 2013 "harmonic"? It appears to be a simple macro-average (Equations 6 and 7), which has nothing to do with harmonic averages

+ the argument you make about the number of mentions in a tweet and its impact on the overall evaluation seems wrong (p23) -- all of your evaluation is over all tweets rather than for individual messages, making this whole argument misleading at best, as it is not the number of mentions in a given tweet but the distribution of classes across tweets that is going to have an impact on overall results.
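To make the reviewer's point about the 2013 weighting concrete, here is a minimal numeric sketch contrasting a macro-average (an arithmetic mean, as the reviewer reads Equations 6 and 7) with a true harmonic mean; the two component scores are purely illustrative, not NEEL results:

```python
# Illustrative component scores (e.g. two per-task measures); not NEEL data.
scores = [0.8, 0.4]

# Macro-average: the plain arithmetic mean of the components.
macro = sum(scores) / len(scores)

# Harmonic mean: reciprocal of the average of reciprocals.
harmonic = len(scores) / sum(1 / s for s in scores)
```

For any unequal positive scores the harmonic mean is strictly below the arithmetic mean (here ≈0.533 vs 0.6), so calling a macro-average "harmonic" does mislabel the aggregation.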
Language/presentation issues:

+ "enriched tweet" => "enriched tweets" (p1)
+ "to grant high quality" => "resulting in high quality" (p2)
+ "also experienced a strong involvement of the" => "also attracted strong interest from" (p2)
+ "such as the NEEL-IT" => "such as NEEL-IT" (p2)
+ a minor but important distinction: I would say that named entities have become a key aspect of "natural language processing" rather than "computational linguistics" (p3)
+ "Machine Learning, Semantic Web." => "Machine Learning, and the Semantic Web." (p3)
+ "is being referred by" => "is referred to by" (p3)
+ "in text, may not" => "in text may not" (p4)
+ "have been used interchangeably" => "have become interchangeable" (p3)
+ "explosion on ... generated to solve" => "explosion in ... proposed for" (p3)
+ "solve the task" => "approach to the task" (p3)
+ "a poor performance" => "poor performance" (p4)
+ "with short documents" => "over short documents" (p4)
+ "choice for" => "choice of" (p4)
+ "divided in" => "divided into" (p4)
+ "series of features" => "series of document-level features" (p4)
+ "low presence" => "relative absence" (p4)
+ "As more context is available the task becomes easier, little or no ..." => "More context makes the task easier, and little or no ..." (p4)
+ "perform a misspelling" => "misspells a term" (p4)
+ "Following Candidate Detection" => "Candidate Detection next" (p5)
+ "the partial end-to-end" => "partial end-to-end" (p6)
+ "further years" => "later years" (p7)
+ "that entity" => "that an entity" (p8)
+ "of Named Entity Recognition" => "of the Named Entity Recognition" (p8)
+ "benchmark of solutions for" => "benchmark for approaches to" (p8)
+ "focused on Web" => "focused on the Web" (p8)
+ "further sections" => "later sections" (p9)
+ "would advantage algorithms" => "would be biased towards algorithms" (p9)
+ "of use" => "for using" (p9)
+ "that few" => "that a few" (p9)
+ "extra effort of the organisation" => "extra effort on the part of the organisers" (p9)
+ "to enabling" => "to enable" (p10)
+ "(FSD) algorithm [19]" => "(FSD) algorithm of [19]" (there are many FSD algorithms nowadays)
+ "mined from Twitter" => "downloaded from Twitter" (p11)
+ "semantics diversity" => "semantic diversity" (p11)
+ "Table 4, 5, and 6" => "Tables 4, 5, and 6" (p11)
+ "it consisted" => "consisted" (p12)
+ "Consensus, for" => "Consensus: for" (p12)
+ "Adjudication, a" => "Adjudication: a" (p12)
+ "case of 2015 challenge" => "case of the 2015 challenge" (p12)
+ "Consistency checking, the" => "Consistency checking: the" (p13)
+ "cross-consistency check" is a strange term (it occurs a number of times on p13); perhaps "cross-checking of consistency of ..."?
+ "Adjudication Phase, where the" => "Adjudication Phase: the" (p13)
+ "Consistency checking, the" => "Consistency checking: the" (p13)
+ "iterated further Phase 1" => "iterated between Phases 1 and 2" (p13)
+ "be a measure" => "to be a measure" (p14)
+ "the difficulty" => "the reading difficulty" (p15)
+ "defines that" => "suggests that" (p15)
+ "and to translating" => "and translating" (p15)
+ "no-ASCII" => "non-ASCII" (p16)
+ "a classification tasks" => "a classification task" (p16)
+ "resulted" => "proved" (p17)
+ "a Support Vector Machines" => "a Support Vector Machine" (p17)
+ "addressed the Mention Detection with a large set of linguistic features and lexicon related" => "addressed the Mention Detection task with a large set of linguistic and lexicon-related features" (p17)
+ "Shows per year submissions and ..." => "Submissions and ..." (p18)
+ "page rank" => "PageRank" (and add reference)
+ "the so-called end-to-end" => "a so-called end-to-end" (p19)
+ "it is proposed a tokenisation ... based on [68]" => "a tokenisation ... based on [68] was used" (p19)
+ you use both "ngrams" and "n-grams" -- be consistent throughout the paper
+ "using Random Forest" => "using a random forest" (p19)
+ "as means" => "as a means" (p20)
+ "an empirically threshold" => "an empirically-determined threshold"
+ in Equations 2, 3, 10 and 11, use "\neg\in" rather than "\nexists" (which looks like a funny "A" at first glance)
+ "Presents per year submissions" => "Submissions" (p21)
+ "Since the 2014 .. weighing" => "From the 2014 ... weighting"
+ "Equation 13, Equation 8" => "Equation 13, and Equation 8" (p22)
+ "none knowledge base entry" => "no knowledge base entry" (p22)
+ on p22, use the same font for all occurrences of "strong_typed_mention_match", "strong_link_match" and "mention_ceaf"; at present, you appear to use \textit sometimes and raw math mode at other times (making for awkward character spacing)
+ there is spurious indentation after Equation 14
+ "and false negative (Equation 3)" => "and false negative (Equation 3) counts" (p23)
+ "and false negative (Equation 11)" => "and false negative (Equation 11) counts" (p23)
+ "Details of the algorithm is listed" => "The algorithm is detailed" (p23)
+ move the dangling "Where $E$ ..." paragraph to the end of the preceding paragraph ("... from Equation 14, where ...")
+ "$T$ is the set of tweets" => "$Tweet$ is the set of tweets"
+ "allows to measure" => "supports the measurement of" (p24)
+ "the NEEL challenges results" => "the NEEL challenge results" (p24)
+ "Table 15, shows" => "Table 15 shows" (p24)
+ "a 76.4% performance" => "a 76.4% precision" (p24)
+ "Table 16, presents" => "Table 16 presents" (p24)
+ "with over" => "by over" (p24)
+ "winner team followed" => "winning team proposed" (p25)
+ "a delta difference" => "an absolute improvement" (p25)
+ "leverage from the" => "leverage the" (p25)
+ "and to measure, while down-breaking" => "and break down" (p25)
+ "the NEEL-IT" => "NEEL-IT" (p25)
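The reviewer's earlier remark that the "unsupervised naive algorithm" merely merges string-identical NE mentions of the same type can be illustrated with a small sketch; this is hypothetical code under that reading, not the challenge implementation, and the mention tuples are invented examples:

```python
from collections import defaultdict

def seed_clusters(mentions):
    """Group mentions whose surface form and entity type are both identical,
    i.e. the 'seed cluster generation' reading the reviewer proposes."""
    groups = defaultdict(list)
    for tweet_id, surface, etype in mentions:
        groups[(surface, etype)].append(tweet_id)
    return list(groups.values())

# Illustrative (tweet id, surface form, NE type) tuples; not NEEL data.
mentions = [
    (1, "Paris", "Location"),
    (2, "Paris", "Location"),
    (3, "Paris", "Person"),  # same string, different type: a separate seed cluster
]
clusters = seed_clusters(mentions)
```

Under this reading the procedure is a deterministic grouping by an exact key rather than clustering in the algorithmic sense, which is the reviewer's terminological point.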
Review #2
By Hao Li submitted on 03/Jan/2017