Glottocodes: Identifiers Linking Families, Languages and Dialects to Comprehensive Reference Information

Tracking #: 2843-4057

This paper is currently under review
Harald Hammarström
Robert Forkel

Responsible editor: Guest Editors Advancements in Linguistics Linked Data 2021

Submission type: Tool/System Report
Glottocodes constitute the backbone identification system for Glottolog, an inventory of languages, dialects and language families. In this paper, we summarize the motivation and history behind the glottocode system and describe its principles and practices of data curation, its technical infrastructure, and its systematics for tracking updates and versions. Since our understanding of the target domain (the dialects, languages and language families of the entire world) is continually evolving, changes and updates are relatively common. The resulting data are assessed in terms of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship. The glottocode system thus responds to an important challenge in the realm of Linguistic Linked Data, one with numerous NLP applications.