IDSWrapper: a Linked Data interface to the Institute for Development Studies’ data

Paper Title: 
IDSWrapper: a Linked Data interface to the Institute for Development Studies’ data
Authors: 
Christophe Guéret, Victor de Boer, Duncan Edwards, Timothy G. Davies
Abstract: 
This short paper provides a description of the IDS Wrapper used to expose the data from the Institute for Development Studies’ Knowledge Services as Linked Open Data. The IDS Wrapper provides Linked Data access to 35,000 research documents on development research as well as its medata. The IDS Wrapper links this metadata to a number of external sources: DBpedia, GeoNames, Lexvo and the IATI Linked Data set. We expect that the IDS data will play a central role in the larger web of Linked Data for global development.
Submission type: 
Dataset Description
Responsible editor: 
Pascal Hitzler
Decision/Status: 
Major Revision
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Solicited review by Axel Polleres:

This paper presents a preliminary description of a dataset wrapping information from the Institute for Development Studies.

The dataset is potentially interesting, but the authors make no clear attempt to showcase possible uses.

The quality of the dataset is not yet clear. In particular, there is no comment on the sustainability of the project beyond the exercise of creating a wrapper; even the URI (http://idswrapper.appspot.com) is marked as preliminary.

As for usefulness (or potential usefulness) of the dataset, it would be nice if the authors gave some concrete examples, e.g. of possible applications and queries that they envision with this dataset.

As for clarity and completeness, the examples and figures of Section 3.3 all seem to be cut off in the PDF. There seems to be no query, browse, or search functionality as yet, and when I visit the web page, I have no guidance on how to navigate this dataset or find out how it could be useful for my purposes.

In summary, this work is a potentially useful project in a preliminary stage which is welcome, but probably too immature for the purposes of this special issue.

I recommend that the authors deploy it further, find adopters, and seek feedback in workshops, etc. first, and target a journal again once the dataset has proven its usefulness in some sense.

Solicited review by Philippe Cudre-Mauroux:

This short paper describes a wrapper used to convert and interlink metadata from the Institute of Development Studies (IDS) into Linked Data. Overall, I found the paper interesting and well-written. More specifically, the architecture of the wrapper is compelling: it dynamically converts identifiers to linked data, calling the IDS REST API and creating links to further Linked Data resources on-the-fly, taking advantage of the Java restlet package deployed on Google's AppEngine. Also, the authors give a nice overview of the state-of-the-art in international development APIs and data dumps. A few interesting points are mentioned in the paper but should, in my opinion, be discussed in more detail, namely: i) to what extent would it be possible to mine information directly from the text of the 53'000 research documents? ii) since the URI scheme is human-friendly (it includes the literals corresponding to the "collections"), why not add the label of the resource itself as well? (having the label in the URI is imho useful in many situations) iii) it would be really interesting to have some information on the efficiency and effectiveness (e.g., precision and if possible recall) of the various linkage services and finally iv) why only develop proprietary client applications based on this data? Wouldn't a SPARQL endpoint / RDF dump be possible? Please explain in the context of your project.
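
To make point iii) concrete, here is a minimal sketch of how precision and recall of a linkage service could be measured against a manually checked gold standard. The function name, the identifiers, and the example.org URIs are hypothetical placeholders and do not come from the paper or from real IDS data.

```python
def precision_recall(produced, gold):
    """Return (precision, recall) of produced links against a gold standard."""
    produced, gold = set(produced), set(gold)
    correct = produced & gold
    precision = len(correct) / len(produced) if produced else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (source identifier, target URI) pairs, for illustration only.
produced_links = {
    ("country/A", "http://example.org/geo/1"),
    ("country/B", "http://example.org/geo/2"),
    ("country/C", "http://example.org/geo/9"),  # wrong target
    ("country/D", "http://example.org/geo/4"),
}
gold_links = {
    ("country/A", "http://example.org/geo/1"),
    ("country/B", "http://example.org/geo/2"),
    ("country/C", "http://example.org/geo/3"),
    ("country/D", "http://example.org/geo/4"),
    ("country/E", "http://example.org/geo/5"),  # missed by the linker
}

precision, recall = precision_recall(produced_links, gold_links)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Reporting such figures per linkage service (DBpedia, GeoNames, Lexvo, IATI) would require a gold standard for each, which is why recall may be harder to obtain than precision.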

Solicited review by Norman Heino:

This paper describes a dataset about research results from development studies as made available by IDS (Institute for Development Studies).
The software component used to convert IDS data to RDF (called 'IDSWrapper') is also described in brief.
Like the original IDS data the converted set is about documents, organizations, categories, countries and regions of research focus.
The data is enriched on the fly with links from Lexvo, DBpedia, IATI, and GeoNames, which helps in understanding the data once found.

I found the data to be of medium to high quality.
Some values are obviously wrong or missing, but most of the data seems plausible.
What I particularly like is that some properties have been replaced by or linked to more common ones.
Replacement has been done for rdf:type and dcterms:language, while others like ids:date_created or ids:cat_parent have been linked via rdfs:subPropertyOf relations to Dublin Core, FOAF, or SKOS vocabularies.
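
As an illustration of why the rdfs:subPropertyOf links are useful, the sketch below shows how a consumer could derive statements in the more common vocabularies from the IDS-specific ones. The concrete super-properties (dcterms:created, skos:broader) and the sample triples are assumptions made for this example; the paper's actual mappings may differ.

```python
# Assumed sub-property mappings; the wrapper's real targets may differ.
SUBPROPERTY_OF = {
    "ids:date_created": "dcterms:created",
    "ids:cat_parent": "skos:broader",
}


def infer_superproperties(triples, subproperty_of):
    """Add (s, q, o) for every (s, p, o) where p is a sub-property of q."""
    inferred = set(triples)
    for s, p, o in triples:
        seen = {p}
        # Follow the sub-property chain, guarding against cycles.
        while p in subproperty_of and subproperty_of[p] not in seen:
            p = subproperty_of[p]
            seen.add(p)
            inferred.add((s, p, o))
    return inferred


# Hypothetical triples, for illustration only.
triples = {
    ("doc:1", "ids:date_created", "2011-05-01"),
    ("cat:2", "ids:cat_parent", "cat:1"),
}
entailed = infer_superproperties(triples, SUBPROPERTY_OF)
for triple in sorted(entailed):
    print(triple)
```

A generic Dublin Core or SKOS client that knows nothing about the ids: vocabulary can then work with the entailed statements directly.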

Usefulness of the dataset is a bit hard to evaluate since it consists mainly of metadata about research articles.
The value obviously lies in the articles themselves and the dataset's raison d'être is making it easier to find those articles.
As the authors note, their implementation lacks a search feature, but the technical reason given is a bit unconvincing.
Why are search queries not just forwarded to the original search API?
The paper quickly mentions a client application that could be used to 'browse through the IDS documents'.
I would like the authors to elaborate a bit more on the potential that is gained through RDF here.
For instance, since the categorization properties are derived from SKOS, generic SKOS browsers could browse articles by category hierarchy: something which is not possible through the IDS interface.
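
A minimal sketch of the kind of hierarchy browsing I have in mind, assuming the category links are available as skos:broader pairs. All category and document identifiers below are hypothetical, and the function names are my own.

```python
from collections import defaultdict


def narrower_closure(broader, root):
    """Return root plus all categories transitively narrower than it."""
    children = defaultdict(set)
    for child, parent in broader:
        children[parent].add(child)
    found, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def documents_under(category, broader, doc_categories):
    """List documents filed under a category or any of its sub-categories."""
    categories = narrower_closure(broader, category)
    return sorted(doc for doc, cat in doc_categories if cat in categories)


# Hypothetical (narrower, broader) pairs and (document, category) assignments.
broader = [("cat:water", "cat:environment"), ("cat:sanitation", "cat:water")]
doc_categories = [
    ("doc:1", "cat:sanitation"),
    ("doc:2", "cat:water"),
    ("doc:3", "cat:health"),
]

print(documents_under("cat:environment", broader, doc_categories))
```

Selecting the top-level category here also returns documents filed two levels down, which is the behaviour a generic SKOS browser would offer.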

The paper is clearly written and provides a usage example as well as the bigger picture on supporting development practitioners.
Other than a few typographical errors and minor issues, no editorial revision is needed.

Typos
-----
* Abstract: "as well as its medata" => "as well as its metadata"
* page 2, section 3, par. 1: "different type of entities" => "different types of entities"
* page 2, section 3, par. 3: "and based on those to establish links" => "and based on those establish links"
* page 3, section 3.3.3, par. 1: "example of such link" => "example of such a link"
* page 4, par. 1: "by using applying their URI scheme": remove one of using, applying

Other issues
------------
* Why are Fig. 1--5 presented as screenshots? To improve presentational quality I suggest replacing them with real tables.
* For standard namespaces you should use the standard prefixes as per recommendation (you do so for rdf but not for owl).
* The vocabulary still uses the "http://example.org#" namespace; an obvious oversight
