The Open University Linked Open Data - data.open.ac.uk

Tracking #: 858-2068

Authors: 
Enrico Daga
Mathieu d’Aquin
Alessandro Adamou
Stuart Brown

Responsible editor: 
Philippe Cudre-Mauroux

Submission type: 
Dataset Description
Abstract: 
The article reports on the evolution of data.open.ac.uk, the Linked Open Data platform of the Open University, from a research experiment to a data hub for the open content of the university. Entirely based on Semantic Web technologies (RDF and the Linked Data principles), data.open.ac.uk is used to curate, publish and access data about academic degree qualifications, courses, research papers and open educational resources of the university. It exposes a SPARQL endpoint and several other services to support developers, including queries stored server-side and entity lookup using known identifiers such as course codes and YouTube video IDs. The platform is now a key information service at the Open University, with several core systems and websites exploiting linked data through data.open.ac.uk. Example applications include connecting entities such as courses to media objects published in different places (YouTube, Audioboo, OpenLearn, etc.) and providing recommendations of resources based on application-specific queries. Through these applications, data.open.ac.uk is now fulfilling a key role in the overall data infrastructure of the university, and in establishing connections with other educational institutions and information providers.
Tags: 
Reviewed

Decision/Status: 
Minor revision

Solicited Reviews:
Review #1
By Amrapali Zaveri submitted on 07/Nov/2014
Suggestion:
Minor Revision
Review Comment:

The article “The Open University Linked Open Data - data.open.ac.uk” describes a Linked Data source of data about academic degree qualifications, courses, research papers and open educational resources of the university.

The dataset is well explained and is a useful and rich source of information from a large variety of sources mainly related to education. The several use cases presented also show the usefulness of the dataset. Also, the interlinks to other datasets enrich the information and provide interesting use cases. The data is modeled using several other existing vocabularies following a systematic procedure and published keeping the Linked Data principles in mind. Thus I definitely recommend accepting the paper. However, I have a few queries on which the authors might consider adding more information in the paper:
- How accurate and efficient is the updating procedure?
- What is the quality of links? Has there been an evaluation of the links generated? How complete are the interlinks?
- I think the actual transformation process is missing. What tool was used? How was the data actually transformed?
- How complete is the dataset?
- Why is some of the data “officially” released and other data not? What are the criteria?
- I think a more concrete example in Section 5 regarding the usage would be useful.
- What is the performance and throughput of the query engine and the recommender system?
- Is there an evaluation of how accurate the recommendations actually are?

The paper is very well written. However, I encountered a few formal errors as listed below:
1.2 Background
- 4 - four
3.2 Design of entity URIs
- pattern [6]) - extra ending bracket
3.6 Blank nodes and other modelling issues
- rdf - RDF
4. Services
- store.For - store. For
- practise - practice
5. Usage
- I think you mean Figure 1 - there is no Figure 5
6. Maintenance
- dataare - data are
Throughout
- Instead of forward references to sections, I would prefer only back references
- Either use Linked Data or Linked Open Data - make it consistent

Review #2
By Alex Olivieri submitted on 10/Nov/2014
Suggestion:
Minor Revision
Review Comment:

The manuscript describes the evolution of a platform for Linked Open Data. It is a technical report that explains how this system is evolving, and the added value it brings for the sharing of information, mainly in the context of the University where the system has been created.

I liked the idea of the "queries stored server-side" component, but I would have appreciated more details about it.

The paper is well written except for Section 6 (Maintenance), which contains some typos and is not easily readable.

Nonetheless, the paper has two important lacunae: the introduction does not indicate how the paper is organized; the paper does not contain a conclusion section, where you highlight the real value of your work.

The mandatory required revisions are:
- Rewrite section 6 (Maintenance)
- Add a conclusion section
- Enrich the introduction, by adding the description of how the paper is organized

Suggested revision:
- Explain more about the "queries stored server-side" component.

Review #3
By Michael Luggen submitted on 17/Nov/2014
Suggestion:
Minor Revision
Review Comment:

The paper reads like a complete and clearly written manual for data.open.ac.uk (at least up to Section 4). This is generally a good thing, as it is a Data Description entry for the SWJ. In Section 3 (Modeling Issues) you discuss design decisions taken for Graphs, URIs and domain modeling. The decisions are mostly grounded on the commonly accepted principles in the LD field. This is good! However, I feel you should tell us much more about the technical limits and organizational debates which influenced these decisions. If you are able to share some details in this regard, it might be highly valuable knowledge for other LD creation projects.

You can look back on 4 years of experience with LD creation. This is a huge success. I especially like the classification in the second-to-last section (rebuilt, update, sync). Again, instead of repeating known LD principles, please tell us a bit about tools or at least transformation approaches which worked out for you. This experience can be valuable for other projects.

I am aware that I ask for a bit more than is defined for a Data Description entry. Consider weaving some of this unique experience into the paper. It is an excellent paper which covers the huge LD pool achieved within data.open.ac.uk.

Please correct the minor issues below:

Content
- Section 3.6: I too hate blank nodes (-, but you might base your claim on a reference: “handling blank nodes with other query types still falls short of efficiency.”

Style
- Please use a coherent style for SPARQL queries (upper/lower case) throughout the paper.
- Section 3.4 Listing “Related to”: I guess this means to show a hierarchical structure? Better no elements than as it is currently.
- Table 1, Domain: Tim excused himself once for unnecessarily adding two // in the URI scheme. But you should still keep them for now. (-,
- Section 2.2: DiscOU was not introduced up to here?
- Section 4: Missing space before “For example”

Orthographic
- Section 2.1.4: data are collected FROM internal
- Section 6: The DATAARE basically ?