Converting linguistic datasets into interoperable Linked Data resources - The case of WALS, IDS and WOLD

Martin Brümmer

Guest editors Multilingual LOD 2012 JS

Dataset Description
In this paper I describe the conversion of three linguistic datasets, namely WALS, IDS and WOLD, into RDF. I focus my discussion on the challenges encountered in transforming detailed linguistic datasets, and in particular their disparate internal structures and semantic contents, into interoperable Linked Data. I then test the syntactic and semantic interoperability achieved by linking these resources together in the LLOD and I highlight the general problems involved in making broad cross-linguistic data sources interoperable.

Solicited Reviews:
Review #1
Anonymous submitted on 16/Jul/2013
Review #2
By Sebastian Nordhoff submitted on 04/Oct/2013
Minor Revision
This paper provides a good and understandable overview of three valuable
datasets of lexical and typological data in a wide variety of languages.
It explains the structure of the datasets and addresses questions of
syntactic and semantic interoperability. These questions were evaluated
with SPARQL queries given in an appendix.
The paper is well-written and easy to understand (disclaimer: I already
knew the datasets discussed). It will be important for further development
of the typological part of the LLOD cloud. I recommend publication.

I have the following suggestions for further improvement:

in the abstract, the acronym LLOD should be expanded

in general, it might be better to use LLODC[loud] than LLOD, because
normally the author speaks about the cloud and not about Linked Data.
Typos and grammar:

p1: such as [delete comma]
p2: tuples of the type (language, feature, feature value)
p2: language families and genera (or language family and genus)
p2: no need to capitalize DCTERMS
p2: geopositioning COMMA footnote
p2: three letter WALS code [delete i.e.]
p2: it is unclear what "correcting grammar" means. WALS features certainly
do not provide sufficient information for NLP-enhanced text correction, if
this was intended
p3: what is meant by "data entry"?
p3: procedure is [NOCOMMA] that
p3: instead of skos:broader, one could use glottolog:superlanguoid, which
is more precise (and a subproperty of skos:broader)
p3: 4b dcterms also has a language property. I am not sure how this
relates to GOLD:inlanguage, but both could probably be used
p6: remove either 'although' or 'but'
p7: in the discussion of ISO 639-3, Glottolog should be mentioned, which
provides identifiers for a wider range of varieties
p8: what does "intellectually compiled" mean? Furthermore, I do not
understand what "This would be due" is supposed to mean
p8: "This objective was done" rephrase
p10: reference Saulwick broken (no author names, encoding problem)

Review #3
Anonymous submitted on 17/Feb/2015
Minor Revision
