Editorial Board

Editor-in-Chief
Krzysztof Janowicz

Managing Editors
Cogan Shimizu
Eva Blomqvist

Editorial Board
Mehwish Alam
Claudia d’Amato
Stefano Borgo
Boyan Brodaric
Philipp Cimiano
Oscar Corcho
Bernardo Cuenca-Grau
Elena Demidova
Jerome Euzenat
Mark Gahegan
Aldo Gangemi
Anna Lisa Gentile
Rafael Goncalves
Dagmar Gromann
Armin Haller
Aidan Hogan
Katja Hose
Eero Hyvönen
Sabrina Kirrane
Agnieszka Lawrynowicz
Freddy Lecue
Maria Maleshkova
Raghava Mutharaju
Axel Polleres
Guilin Qi
Marta Sabou
Harald Sack
Christoph Schlieder
Stefan Schlobach
Oshani Seneviratne
Cogan Shimizu
Ruben Verborgh
GQ Zhang

Former Editors-in-Chief
Pascal Hitzler

Editorial Assistants
Michael McCain


Extending a CRF-based Named Entity Recognition Model for Turkish Well Formed Text and User Generated Content

Submitted by Gülşen Eryiğit on 10/17/2016 - 05:35

Tracking #: 1474-2686

Authors:

Gökhan Şeker

Gülşen Eryiğit

Responsible editor:

Guest Editors Social Semantics 2016

Submission type:

Full Paper

Abstract:

Named entity recognition (NER), which provides useful information for many high-level NLP applications and semantic web technologies, is a well-studied topic for most languages, especially English. However, modelling morphologically rich languages (MRLs) for the NER task remains an open research area. Studies on Turkish, a strong representative of MRLs, long lagged behind those on well-studied languages. In recent years, Turkish NER has attracted researchers because of its scarce data resources and the lack of high-performing systems. In particular, the need to semantically enrich the text of user-generated content has initiated many studies in this field. This article presents a CRF-based NER system that successfully models the morphologically very rich nature of Turkish, together with extensions that expand the covered named entity types and process the especially challenging user-generated content of Web 2.0. The article also introduces a re-annotation of the available datasets and a brand-new dataset from Web 2.0. The approach achieves an exact-match F1 score of 92% on a dataset collected from Turkish news articles and ~65% on different datasets collected from Web 2.0. The proposed model is believed to be readily applicable to similar MRLs, given the relevant resources.

Full PDF Version:

swj1474.pdf

Previous Version:

Extending a CRF-based Named Entity Recognition Model for Turkish Well Formed Text and User Generated Content

Tags:

Reviewed

Decision/Status:

Solicited Reviews:
