OLOUD – An Ontology for Linked Open University Data

Tracking #: 1134-2346

Authors: 
Barnabas Szasz
Rita Fleiner
Andras Micsik

Responsible editor: 
Mark Gahegan

Submission type: 
Ontology Description
Abstract: 
The Linked Open Data pursuit has achieved remarkable progress in Europe as well, and studies have shown that it has a positive impact on the quality of education at university level too. Publishing information about university or college course, their corresponding places and related events, such as exams in Linked Data format allows the event information to be aggregated, filtered and delivered to potential participants: students and lecturers via multiple channels and devices. In this paper a new ontology is described, an Ontology for Linked Open University Data (OLOUD), which supports the development and publishing of Linked Open University Datasets and the applications built on the top of these datasets. The domain of the OLOUD ontology consists of the open data one would publish within the university. OLOUD provides a high level model covering multiple education related use cases catalyzing linked data production and consumption within the domain. OLOUD contains classes and properties to describe Organizations, People, their Roles and Publications, Subjects, Courses and other Events together with their temporal and spatial description.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Stefan Dietze submitted on 13/Aug/2015
Suggestion:
Major Revision
Review Comment:

This (ontology) paper describes an ontology for representing university-related data and notions such as courses, events, persons etc. The intentions are worth supporting and the topic is timely, though the paper would benefit significantly from better motivating the design choices and the requirements gathering, and from better arguing for the design decisions made by the authors. As it stands, it remains unclear where the requirements came from, whether stakeholders/users were actually involved at all and, most importantly, why there was a need to develop yet another vocabulary for concepts which (in most cases) are already well represented in established vocabularies, ranging from general-purpose ones (FOAF etc.) to more domain-specific ones (AIISO, BIBO, LRMI etc.). The authors present a discussion of some related works and also some mappings with existing vocabularies (more remarks on these later on), though no actual criteria for choosing vocabularies seem to have been applied.

With regard to the requirements and data model, Sections 2 and 3 provide some insights. Section 2 states: "The following is the result of our research on potential open data sources and use cases". That is followed by a rough list of notions to represent (persons, documents etc.). Is this the data model you want to represent? It already reads like a list of concepts and predicates. Where does this originate from: some stakeholders, or the data model of existing datasets you want to represent with your vocabulary? Please try to be more precise to better motivate your work. Section 2 continues by listing some sort of requirements: "describe all significant units of a university", "describe all subjects and courses", "describe significant areas of the university buildings and a route among these" etc. However, where do these originate from? Did some stakeholders define these requirements? Were these stakeholders lecturers or students? If so, were they from your university only? Also, it is not clear why these actually motivate the creation of yet another ontology (as opposed to reusing existing terms).

Section 3 describes the key concepts, which all seem natural candidates. However, it is not clear what the motivation was for these and whether there had been any requirements elicitation process upfront. Did you analyse existing university information systems to derive these concepts (and their predicates)? Did you involve stakeholders? As it stands, parts of the paper might come across as seemingly arbitrary.

On a similar note, what would make your ontology more generalisable / applicable in other contexts than any of the already existing vocabularies, where some already have been developed as some kind of public community process, i.e. involving representatives from different organisations and use cases to ensure wider applicability. See the references [1-6] below for some further resources and insights into existing vocabularies in the academic/educational area.

Generally, the paper sometimes seems to be broad and unspecific in its claims. For instance, the paper states "It turned out, that the existing ontologies in this field are incomplete, deficient and not extensible." Sure, these vocabularies (like all) were created with a specific use case in mind, not always reflecting the needs of all stakeholders or scenarios. However, all vocabularies can be derived and extended, e.g. by creating your own subclasses only where needed. Please have a look at http://linkeddatabook.com/editions/1.0/#htoc49 and specifically, Sections 4.4.4 - 4.4.6 which describe some basic principles to follow. These include common practices for supplementing existing vocabularies (rather than reinventing their terms) and for linking your supplementary terms to the existing vocabularies (e.g. using owl:equivalentClass or rdfs:subClassOf statements).
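
To make the linking practice concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' actual mapping: the `oloud` namespace URI is a placeholder, and the choice of which terms to link, and with which predicate, is an assumption made for the example.

```python
# Namespace prefixes. The OLOUD URI is a placeholder (assumption); the
# others are the published namespaces of RDFS, OWL, FOAF and AIISO.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "aiiso": "http://purl.org/vocab/aiiso/schema#",
    "oloud": "http://example.org/oloud#",
}

# Illustrative mapping statements (assumed, not taken from the paper):
# local terms are declared only where needed and linked to established ones.
MAPPINGS = [
    ("oloud:Course", "rdfs:subClassOf", "aiiso:Course"),
    ("oloud:Publication", "rdfs:subClassOf", "foaf:Document"),
    ("oloud:Person", "owl:equivalentClass", "foaf:Person"),
]


def expand(curie):
    """Expand a prefixed name such as 'foaf:Person' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(triples):
    """Serialise (subject, predicate, object) prefixed names as N-Triples lines."""
    return [
        "<{}> <{}> <{}> .".format(expand(s), expand(p), expand(o))
        for s, p, o in triples
    ]


for line in to_ntriples(MAPPINGS):
    print(line)
```

Whether a given pair of terms warrants an equivalence or only a subclass link is exactly the kind of design decision the paper should argue for explicitly.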

You certainly would want to avoid falling into the "XKCD 927" (https://xkcd.com/927/) trap. ;-)

Section 4 states: "There are no established recommendations in the literature about how to select vocabularies for reuse." I would politely disagree here. There is a lot of recent work on vocabulary selection and recommendation, and even at a more general level, there are established recommendations and guidelines (see http://linkeddatabook.com/editions/1.0/#htoc54 and http://www.w3.org/TR/ld-bp/#VOCABULARIES). These tutorials describe some general criteria for selecting vocabulary terms, such as "Usage and uptake", "Maintenance and governance", "Coverage", "Expressivity". All of these can be investigated from scratch for a particular vocabulary, while in several cases, useful stats are already available on the Web (see [1] and [2]) which give you an idea of the suitability of specific vocabularies.

Looking at the vocabularies themselves, it seems some links have been specified with established vocabularies (mostly AIISO and FOAF), though important ones seem to be missing (e.g. "Publication" is not linked at all, where candidates would be foaf:Document, bibo:Article etc.). The existing ones could also be questioned. For instance, or seem to be not entirely equivalent. Also, is defined as equivalent to , though while the former is tied to academic organisations/persons, the latter is a more general term and might better be defined as a super class (rather than equivalent).

The TEACH vocabulary is used for some properties, which also implies some (not explicitly expressed) inheritance/equivalence relationships between OLOUD and TEACH concepts. For instance, the use of the http://linkedscience.org/teach/ns/#teacher predicate implies some relationship between oloud:Course and teach:Course. Why are these not represented/considered, while others are? Generally, the choice of mappings and considered terms and vocabularies would benefit from a better line of argumentation. I would suggest providing a detailed overview of the mappings you provided with your ontology, i.e. the explicit mapping statements as well as the inferred mappings.
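
How the use of a property can silently imply a class relationship can be sketched with the RDFS domain rule. The example below is a hedged illustration in Python: the domain declaration for teach:teacher and the `ex:` instance data are assumptions made for the sketch, not statements taken from the TEACH vocabulary or the OLOUD dataset.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"

# Schema statement assumed for illustration: teach:teacher has domain teach:Course.
schema = {("teach:teacher", RDFS_DOMAIN, "teach:Course")}

# Hypothetical instance data: a resource typed as oloud:Course that also
# uses the teach:teacher property.
data = {
    ("ex:course42", RDF_TYPE, "oloud:Course"),
    ("ex:course42", "teach:teacher", "ex:lecturer7"),
}


def infer_domain_types(schema, data):
    """Apply the RDFS domain rule: (p rdfs:domain C), (s p o) => (s rdf:type C)."""
    domains = {}
    for s, p, o in schema:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)
    inferred = set()
    for s, p, o in data:
        for cls in domains.get(p, ()):
            triple = (s, RDF_TYPE, cls)
            if triple not in data:
                inferred.add(triple)
    return inferred


inferred = infer_domain_types(schema, data)
print(sorted(inferred))
```

Under these assumptions the resource ends up typed with both the local and the external course class, although no mapping between the two classes was ever stated. Listing such inferred typings next to the explicit mapping statements would give the overview requested above.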

In Section 6, the authors describe a dataset which is seemingly using the vocabulary. However, no links to the dataset seem to be provided (though the URI scheme is described) and no information about the scale of the data (# instances / concept etc.) is provided. The "evaluation" in Section 6.2 is not an actual evaluation but only lists some general precautions, and seems not to involve any stakeholders whatsoever. While it is generally hard to provide a sound evaluation for an ontology, a more structured approach throughout the paper could have facilitated some kind of simple validation. For instance, by defining some sound requirements together with the actual stakeholders in the data (students, lecturers), one could derive some *sample queries* and *requirements* which could be used for validation, by assessing, for instance, how well the current shape of the ontology facilitates user-defined test cases (queries), how efficient the query answering is etc. As it stands, it seems as if the ontology has been developed without any sound stakeholder feedback or input.
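
The kind of query-based validation suggested here could look roughly like the following Python sketch. Everything in it is hypothetical: the `ex:` identifiers, the property names and the two questions are invented for illustration, and simple triple-pattern matching stands in for real SPARQL queries against the dataset.

```python
# Hypothetical sample data; identifiers and property names are assumptions.
TRIPLES = {
    ("ex:course42", "rdf:type", "oloud:Course"),
    ("ex:course42", "ex:heldIn", "ex:room101"),
    ("ex:course42", "ex:taughtBy", "ex:lecturer7"),
    ("ex:exam1", "rdf:type", "ex:Exam"),
    ("ex:exam1", "ex:heldIn", "ex:room101"),
}


def match(pattern, triples):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# Each test case pairs a stakeholder question with a triple pattern
# (exactly one None marks the variable) and the expected answers.
COMPETENCY_QUESTIONS = [
    ("Which events take place in room 101?",
     (None, "ex:heldIn", "ex:room101"),
     {"ex:course42", "ex:exam1"}),
    ("Who teaches course 42?",
     ("ex:course42", "ex:taughtBy", None),
     {"ex:lecturer7"}),
]


def run_checks(questions, triples):
    """Map each question to True if the data yields exactly the expected answers."""
    results = {}
    for text, pattern, expected in questions:
        position = pattern.index(None)  # slot of the unbound variable
        answers = {t[position] for t in match(pattern, triples)}
        results[text] = answers == expected
    return results


for question, passed in run_checks(COMPETENCY_QUESTIONS, TRIPLES).items():
    print("PASS" if passed else "FAIL", question)
```

The point of such a harness is that the questions come from stakeholders first and the ontology is then judged by whether it can answer them, rather than the other way round.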

Section 7 - "Related work": please have a look at the resources [1-6] listed below which should provide valuable insights into vocabulary usage in the (educational) wild.

To summarise, my general recommendations for revising the paper would be:

- Better motivate the requirements for your ontology, eg by eliciting (with actual stakeholders) and describing requirements, use cases and queries you need to facilitate

- Improve the description of the design process and choices: How did you derive requirements? How did you ensure applicability to use cases? How did you actually research existing vocabularies eg the use of terms on the Web?

- Revise the vocabulary by better making use of existing terms - following a thorough selection strategy using the guidelines mentioned above - and improving the use of and mappings to external vocabularies.

Minor comments:

- abstract contains the fairly broad claim "The Linked Open Data pursuit has achieved remarkable progress in Europe as well, and studies have shown that it has a positive impact on the quality of education at university level too." That sounds very nice and approving, particularly the "positive impact on education", but also very arguable. How did LD improve education in Europe? Please provide references in the introduction or tone down this claim.

- English needs improving and should be checked by a native speaker. Several typos throughout.

- abstract: "publishing information about university or college courseS"

- references sections: please list the authors in full (not just "et al")

Mentioned references:

[1] http://lov.okfn.org/

[2] http://lucero-project.info/lb/2012/04/so-whats-in-linked-datasets-for-ed...

[3] Dietze, S., Drachsler, H., Giordano, D., A Survey on Linked Data and the Social Web as facilitators for TEL recommender systems, in: Recommender Systems for Technology Enhanced Learning: Research Trends & Applications, Eds: Manouselis, N., Verbert, K., Drachsler, H., Santos, O.C., to be published by Springer in 2013.

[4] D’Aquin, M., Adamou, A., Dietze, S., Assessing the Educational Linked Data Landscape, ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.

[5] Dietze, S., Kaldoudi, E., Dovrolis, E., Giordano, D., Spampinato, C., Hendrix, M., Protopsaltis, A., Taibi, D., Yu, H. Q. (2013), Socio-semantic Integration of Educational Resources – the Case of the mEducator Project, in Journal of Universal Computer Science (J.UCS), Vol. 19, No. 11, pp. 1543-1569.

[6] http://data.linkededucation.org/linkedup/catalog/

Review #2
By Simon Cox submitted on 14/Sep/2015
Suggestion:
Major Revision
Review Comment:

This is a mildly interesting short paper, describing a new ontology for University data, focusing particularly on course structure, timetables and indoor location and navigation. It is part of a growing corpus of similar ontologies – some of which are cited in the paper – which I guess is understandable as an interestingly complex but familiar system to semantic researchers. The description of the ontology is OK.

However, the standout feature is the sample SPARQL query at the end of section 5 – which is a compact, rather elegant example of SPARQL-based reasoning. Note that the real subject of this example is not the University use-case, but indoor navigation. This suggests that perhaps a separate short paper on the loc: ontology might be worthwhile, rather than hiding this feature in the OLOUD paper. Given the title of the paper, it would be interesting to see a more university-oriented example, if available, such as timetabling.

Otherwise, the paper is comprehensible and reasonably well written, though I have some suggestions for improvement.

1. It is unclear why this work is referred to as ‘Linked Open Data’ since it does not appear to involve links outside the local system. There is passing reference to ‘Datasets’ in the second paragraph of section 2, but no further indication of what those might be.

2. The motivation (section 1.1) refers to previous ontologies with similar scope, and criticizes them in general terms as a reason for proceeding with a new design exercise. However, the deficient ontologies are not identified (at this stage), nor are specific examples of their failings provided. Without this it is not clear why the new work is needed. The reader has to wait until section 7 before an evaluation of comparable work is introduced. That’s too late. The basic review of previous education ontologies presented in section 7 should be moved up to section 1 or 2, before the new work, so that the paper is properly justified.

3. The methodology section 1.2 should be moved so that it is immediately prior to the specification of the new ontology (currently section 3).

4. In the first paragraph of the methodology (currently 1.2) the authors indicate they aim at a 4-star vocabulary, then in the last paragraph of the integration section (currently section 4) this appears to be down-graded to 3-star. Am I misunderstanding something, or should these be consistent?

5. The choice of OWL2-DL is buried as the first sentence of the third paragraph in section 2 (‘Objectives’), and is not justified. This probably belongs with the method and description, but at least a reason for this choice should be given.

6. Existing ontologies are described in two places: (i) an impressive number of vocabularies used in the integration/alignment/re-use exercise are introduced in section 4; (ii) some comparable ontologies dealing with either education, navigation or temporal topology are described at greater length in section 7. TEACH and AIISO appear in both places.

Lists and comparisons are generally easier to follow when presented as an actual list or tabulation, rather than embedded in prose. Consider restructuring this material in this way.

7. Two new base URIs are introduced in 5.1. Another one is later referred to in 5.3 (loc:) but is not given the same treatment in the text. It should either be mentioned in 5.1 or properly in 5.3.

8. Figure 2 shows a lot of specialized room classes. Are these all used separately in applications?

9. As mentioned above, the example SPARQL query, to compute the navigation route between rooms, is a very nice illustration of an application that can be built quite easily on top of a dataset formalized using the ontology – this is probably the strongest thing in the paper. It uses the part of the new ontology which appears to be most novel – the indoor architecture and location model. Are there any other applications, for example, around timetabling? (If not, then the detailed discussion of Correndo’s temporal topology is too elaborate, as it seems disconnected with rest of the paper.)

10. A number of references to existing ontologies are provided as footnotes, for which documentary descriptions are available, which should be included in the bibliography (e.g. OWL-Time). The full references should be given when available.

11. Some minor errors in the English – not enough to impede understanding, but a little awkward here and there. For example, in the first sentence of the first paragraph the finite verb (is) does not agree in number with the subject (the objectives, plural), and there is a ‘was’ omitted before ‘born’ in the first sentence of the second paragraph. Suggest a full proof-read (even Microsoft’s grammar checker would catch these specific issues!).

Review #3
By Enrico Daga submitted on 18/Oct/2015
Suggestion:
Major Revision
Review Comment:

This is an Ontology Description paper, and should be reviewed considering the (1) Quality and relevance of the described ontology (convincing evidence must be provided); and the (2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology.

OLOUD is an ontology for university linked open data, developed to support data publishing and applications. The authors present the ontology as a high-level conceptualisation of the multiple use cases around the domain of universities, covering aspects such as organisation, people and roles, publications, courses and events, with a particular emphasis on the spatial and temporal descriptions.

The article is clear and readable, interesting in some aspects as an experience of linked data publication within a university, and it also highlights a nice use case on class locations and indoor navigation.
However, in its present form, it does not provide convincing evidence about the quality and relevance of the ontology, in my opinion.

The main motivation for building such an ontology is the fact that existing resources are described as incomplete or not extensible. While the authors give some arguments about this, for example that they lack appropriate temporal or spatial descriptions of events, personally I don’t think that this is convincing enough to justify the need for a new general ontology for university data. In particular, the article puts the emphasis on the motivation to wrap existing models into a single model, which I don’t find particularly useful in linked data, while I find it much more useful to use dedicated existing vocabularies for the different portions of the data. I am not convinced that users would like to have only one vocabulary, particularly when they want to integrate data from different linked data sources (this is kind of the purpose of the thing, isn’t it?).

In Section 2, the authors list a number of use cases, ranging from personal data to organisation units, documents etc., that could be relevant for the publication of open data in a university. But this list is too generic (each bullet point ends with “etc…”) and not connected with the description of the ontology in any way. The authors say they want to design and maintain the course schedule “with the least effort possible”, but this observation is not justified in the rest of the paper. Why should the OLOUD ontology provide this capability (when others don’t)?

Finally, personally I don’t like the motivation that “we also keep the local control over our ontology” (end of Section 4.) if this means reinventing the wheel. I think universities should reuse existing vocabularies for publishing their data, integrating many of them if they need to, and create new vocabularies or ontologies when this is really needed.

Existing ontologies cover a number of prominent use cases in the domain - AIISO, mentioned in the related work section, or BIBO and LRMI, which are not. For example, universities as organisations are much better described by AIISO, and there is no convincing argument in the paper about why OLOUD should be better, or why it needs to extend it (actually, it doesn’t). The BIBO ontology is obviously much more exhaustive than the proposed ontology. I am not convinced that the class oloud:Publication is needed in any possible way.

Another argument is that the OLOUD ontology is “completely generalised” and easily extendible, but let me disagree with one example only.
A “Specialisation” “can be acquired as part of the qualification, providing special expertise in training, which is indicated on the proof of successful completion of the university diploma.” This might apply to some kinds of certifications or postgraduate courses, but it varies very much between different jurisdictions, and also between different universities in the same country. Why does OLOUD contain “Specialisation” and not make any distinction between postgraduate and undergraduate degrees or courses, for example?

Similarly, no convincing argument is given about the fact that OLOUD needs to redefine Course, Event or even Person. For what purpose? What does oloud:Course give you that aiiso:Course can’t?
The authors claim that the reuse strategy was to create new classes in OLOUD and align them, and to reuse existing properties straight away. For example, the ontology reuses foaf:member but redefines oloud:Person. This method is not justified in any way.

Why does the ontology need a top class OLOUDThing?

While I share with the authors the opinion that there is a need for shared ontologies in the domain, I don’t see evidence that OLOUD could be a strong candidate for this. I am sure this ontology takes into account interesting use cases - events descriptions and directions for classes is the prominent one - and can be a resource for discussion in the area.

The related work section contains a long paragraph around the representation of time entities, but the reader has no clue about how this relates to OLOUD (did the authors reuse this related work? How?).

The Evaluation section (6.2.) describes how problems in the ontology population phase (from tabular data to triples) have been solved by loading the data in an ontology editor and running a set of SPARQL queries. This is good enough for testing the compliance of the schema with the data, but it is not an evaluation of the Ontology - why it is good, why we need it.