Extending a CRF-based Named Entity Recognition Model for Turkish Well Formed Text and User Generated Content

Tracking #: 1440-2652

Gülşen Eryiğit
Gökhan Şeker

Responsible editor: 
Guest Editors Social Semantics 2016

Submission type: 
Full Paper
Named entity recognition (NER), which provides useful information for many high-level NLP applications and semantic web technologies, is a well-studied topic for most languages, especially English. However, the studies for Turkish, a morphologically rich and lesser-studied language, lagged behind for a long while. In recent years, Turkish NER has intrigued researchers due to its scarce data resources and the unavailability of high-performing systems. In particular, the need to discover named entities occurring in Web datasets initiated many studies in this field. This article presents the enhancements made to a Turkish named entity recognition model [5] (based on conditional random fields (CRFs) and originally tailored for well-formed texts) in order to extend its covered named entity types, and to process the extra-challenging user generated content coming with Web 2.0. The article introduces the re-annotation of the available datasets to extend the covered named entity types, as well as a brand-new dataset from Web 2.0. The introduced approach achieves an exact-match F1 score of 92% on a dataset collected from Turkish news articles and ∼65% on different datasets collected from Web 2.0.
Decision: 
Minor Revision

Solicited Reviews:
Review #1
By Genevieve Gorrell submitted on 25/Aug/2016
Minor Revision
Review Comment:

The article is much improved. The contribution of the feature engineering work the authors present is clear, and the review of Turkish NER is helpful and thorough. Resources are shared. I have only minor comments, mostly related to language, though note that this is not a thorough proofread:

-The old title remains at the tops of the pages
-Page 2, "Chineese"
-Page 2, "experimented for various" -> "experimented with for various"
-Page 2, "(MRLs) poses" -> "pose"
-Page 2, "as it is the case" -> "as is the case"
-Page 2, "is treated to have influence" -> are considered? are suggested?
-Page 3, "makes a preliminary investigation on" -> "of"
-Page 3, "which sometimes allowed to" -> "allowed them to" or "made it possible to"
-Page 3, "performances listed in Table 1 is" -> "are"
-Page 4, "of a NE is" -> "of an NE are"
-Page 4, sentence beginning "The study reports" is double negated
-Page 4, "except the use" -> "except for the use"
-Page 5, "do not namely mention" -> explicitly mention?
-Page 5, "child stories" -> children's stories?
-Page 5, "reports a CoNLL score" -> report
-Page 5, "systems which mostly needs" -> need
-Page 5, "annotated by ENAMEX" -> with
-Page 5, "dedicated for" -> to
-Page 5, "resulted with the emergence" -> resulted in
-Page 5, "introduces a Twitter dataset" -> introduce
-Page 6, "As known", would have to be "as we know", but better just to leave it out?
-Page 6, "makes hard" -> "makes it hard"
-Page 6, "eliminates the dependency of the recent works towards the Twitter content only" -> "eliminates the limitation to Twitter content found in recent work"?
-Page 6, section heading 5 not capitalized.
-Page 7, "this is an automatic processing" -> "automatic process". The comma is also not required in that sentence.
-Page 10, "treated to be" -> considered?
-Page 10, "Eight different features used" -> are used
-Page 10, "yields at very high" -> a
-Page 10, "indicating that the current token" -> whether
-Page 12, "as it was the case" -> "as was the case"
-Page 12, "The reason for this may be explained .."--I don't understand this sentence
-Page 12, "This section give" -> gives
-Page 12, "the remaining of this section" -> remainder
-Page 13, "percantage"
-Page 13, "used for the used CRF features" -> chosen features? selected?
-Page 14, "try to adapt a similar" -> tries
-Page 14, "as the omission lexical features"--section in parentheses is not grammatical
-Page 14, "we consider that the claims deducted from here"--needs rewording

Review #2
By Giuseppe Rizzo submitted on 13/Sep/2016
Major Revision
Review Comment:

Thanks to the authors for updating the paper and answering the reviewers' questions and comments.

A few other remarks, in line with my previous review:
1) there is some vagueness in the exhaustive reporting of your experiments:
- previous NER studies on MRLs -> the work is on Turkish, so why not state this? So far, you have worked on Turkish
- from Web 2.0, what do you mean? Isn't it Twitter?
- ∼65% -> please report the exact score
2) Rename Sec 2, since "Turkish" is not very descriptive. What about "Morphological Characteristics of the Turkish Language", for instance, or something more specific?
3) "The authors interacted with the owners of the previous systems which sometimes allowed to obtain a performance score by sending their test data to be tested with the prior system or obtaining the test data set used in the prior work" -> what do you mean by "sometimes"? Can you please be more exhaustive here?
4) please be precise in reporting the metrics. What I can suggest is to report, beside MUC or CoNLL, the exact metric, such as p, r, F1 and micro-/macro averaging. For instance, in Table 6 I reckon it is F1, using the CoNLL 2000 evaluation strategy, correct?
5) footnote 5: please comment on it further and provide examples. Otherwise, this can be perceived as mere speculation, and I am sure that is not the case.
6) there is an ambiguity between the definition of the language resource(s) and the dataset(s). This is evident in Sec 4.2 and Table 3 for instance. Please remove this ambiguity and harmonize the definitions
7) Table 6: I am not sure what the "-" stands for. Is it to indicate that the feature has been added to the base model? In the remainder of Section 6 this is indicated as "+". This needs to be harmonized.
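Regarding item 4: CoNLL-style exact-match scoring computes precision, recall and F1 over whole entity spans, counting a prediction as correct only when both the entity type and its boundaries match. A minimal, self-contained sketch of this metric follows; the tag sequences are invented for illustration, and this is not the authors' evaluation code:

```python
def bio_spans(tags):
    """Extract (type, start, end) entity spans from a BIO tag sequence."""
    spans, etype, start = set(), None, None
    for i, tag in enumerate(tags):
        # Close the current entity on O, on a new B-, or on an I- of another type.
        if etype is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != etype):
            spans.add((etype, start, i))
            etype, start = None, None
        # Open a new entity on B- (a stray I- also opens one, as conlleval does).
        if tag != "O" and etype is None:
            etype, start = tag[2:], i
    if etype is not None:
        spans.add((etype, start, len(tags)))
    return spans

def exact_match_prf1(gold_tags, pred_tags):
    """Entity-level P/R/F1: a span counts only if type and boundaries both match."""
    gold, pred = bio_spans(gold_tags), bio_spans(pred_tags)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = ["B-PER", "I-PER", "O", "B-LOC", "O"]
pred = ["B-PER", "I-PER", "O", "B-ORG", "O"]
print(exact_match_prf1(gold, pred))  # -> (0.5, 0.5, 0.5)
```

Here the PER span matches exactly, while the LOC/ORG span fails the type check, so both precision and recall are 0.5.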

Finally, and most importantly, since this question did not get an answer: tuning a CRF is indeed crucial for reaching good performance in particular domains, such as when annotating tweets. But what does your approach do to be innovative with respect to conventional CRF-based ones? Beyond the feature engineering, the data annotation, and a gazetteer, what does your approach offer to the community in terms of novelty? Please let the innovative spirit of the approach emerge from the paper, rather than listing a simple gain in performance (which is in any case so tiny that it is in the range of SOTA plus domain-specific tuning, and thus not very innovative).

New text has been added and new typos can be spotted (a few, not all, are listed here; please proofread multiple times to ease understanding):
- Semantic Web technologies focuses on -> ... focus on
- doesn’t -> no contractions
- Twitter example -> tweet
- differentiate/identify -> differentiate or identify?
- an CoNLL score -> .. a ... and generally be more precise in which metric is used, p/r/f1
- useful feature conjunctions -> conjunction features (?)
- didn’t -> did not (generally, avoid negative sentences)

Review #3
Anonymous submitted on 20/Sep/2016
Review Comment:

In this revised manuscript, I am glad to see that the two key weaknesses mentioned in my previous review, namely framing the contribution of the paper and detailing the experimental evaluation (and comparisons to prior art), are adequately handled.

Some minor issues:

- In Section 6.2, there are now two types of comparisons to previous work: i) comparisons to [49] and [2], obtained by re-implementing the latter works, and ii) comparisons to others, based on the "reported results" as shown in Table 11. However, these are presented in a rather mixed way. I recommend that the authors make this distinction more explicit with a clear discussion in the text (e.g., revising the sentence "Table 11 presents the comparison with the related works on UGC domain." as "Table 11 presents the comparison with the *other* related works on UGC domain *based on the reported results*."). It might also be useful to reorganize the discussions/paragraphs on pages 13 and 14 accordingly.

- In Table 11, whenever possible, please state *both* the training set and test set (as in Tables 8 – 10) to facilitate the comparisons.

- Whenever appropriate, I recommend that the authors avoid expressions like "[XX] introduce this" and instead prefer "Doe et al. introduce this" (especially in Section 3), for the sake of readability.

- Regarding the CRF template file, my humble opinion is to squeeze it into one or two pages (e.g., using a table-like multi-column format) and to provide it together with the paper (i.e., as an appendix), to facilitate the reproducibility of the findings for future readers of the paper.
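For readers unfamiliar with the format the reviewer refers to: CRF++-style template files are compact, with each line naming a feature as offsets `%x[row,col]` into the token/feature matrix, so a multi-column table of such lines would indeed fit in a page or two. The fragment below is a generic, hypothetical illustration of the syntax, not the authors' actual template:

```text
# Unigram features: %x[row,col] refers to the input matrix, where row is a
# relative token position and col is an input column (0 = surface form here;
# column 1 is a hypothetical column holding, e.g., a morphological tag).
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[0,1]
# A conjunction of the previous and current surface forms:
U05:%x[-1,0]/%x[0,0]
# B emits bigram features over adjacent output labels:
B
```

Each `U` line expands into one binary feature per observed value (crossed with each output label); the bare `B` adds label-transition features.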