Review Comment:
This (ontology) paper describes an ontology for representing university-related data and notions such as courses, events, persons etc. The intentions are worth supporting and the topic is timely, though the paper would benefit significantly from better motivating the requirements gathering and better arguing for the design decisions made by the authors. As it stands, it remains unclear where the requirements came from, whether stakeholders/users were actually involved at all and, most importantly, why there was a need to develop yet another vocabulary for concepts which (in most cases) are already well represented in established vocabularies, ranging from general-purpose ones (FOAF etc) to more domain-specific ones (AIISO, BIBO, LRMI etc). The authors present a discussion of some related works and also some mappings to existing vocabularies (more remarks on these later on), though no actual criteria for choosing vocabularies seem to have been applied.
With regard to the requirements and data model, Sections 2 and 3 provide some insights. Section 2 states: "The following is the result of our research on potential open data sources and use cases". That is followed by a rough list of notions to represent (persons, documents etc). Is this the data model you want to represent? It already reads like a list of concepts and predicates. Where does this originate from: some stakeholders, or the data model of existing datasets you want to represent with your vocabulary? Please try to be more precise to better motivate your work. Section 2 continues by listing some sort of requirements: "describe all significant units of a university", "describe all subjects and courses", "describe significant areas of the university buildings and a route among these" etc. However, where do these originate from? Did some stakeholders define these requirements? Who were these stakeholders (lecturers, students)? If so, were these from your university only? Also, it is not clear why these actually motivate the creation of yet another ontology (as opposed to reusing existing terms).
Section 3 describes the key concepts, which all seem natural candidates. However, it is not clear what the motivation for these was and whether there had been any requirements elicitation process upfront. Did you analyse existing university information systems to derive these concepts (and their predicates)? Did you involve stakeholders? As it stands, parts of the paper might come across as arbitrary.
On a similar note, what would make your ontology more generalisable / applicable in other contexts than any of the already existing vocabularies? Some of these have been developed through a public community process, i.e. involving representatives from different organisations and use cases to ensure wider applicability. See the references [1-6] below for some further resources and insights into existing vocabularies in the academic/educational area.
Generally, the paper sometimes seems to be broad and unspecific in its claims. For instance, the paper states "It turned out, that the existing ontologies in this field are incomplete, deficient and not extensible." Sure, these vocabularies (like all) were created with a specific use case in mind, not always reflecting the needs of all stakeholders or scenarios. However, all vocabularies can be extended, e.g. by creating your own sub classes only where needed. Please have a look at http://linkeddatabook.com/editions/1.0/#htoc49 and specifically Sections 4.4.4 - 4.4.6, which describe some basic principles to follow. These include common practices for supplementing existing vocabularies (rather than reinventing their terms) and for linking your supplementary terms to the existing vocabularies (eg using owl:equivalentClass or rdfs:subClassOf statements).
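To illustrate the supplementing pattern referred to above, here is a minimal Python sketch that emits mapping statements linking local terms to established ones rather than redefining them. The oLoud namespace URI and the local class names are placeholders assumed for the example, not the authors' actual identifiers; only the RDFS, OWL and FOAF namespaces are real.

```python
# Sketch: link local ontology terms to established vocabularies.
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
FOAF = "http://xmlns.com/foaf/0.1/"
OLOUD = "http://example.org/oloud#"  # hypothetical namespace, assumed for illustration


def mapping(local_term, relation, external_term):
    """Return one N-Triples line stating how a local term relates to an external one."""
    return f"<{local_term}> <{relation}> <{external_term}> ."


mappings = [
    # A local class that narrows an existing one: declare it a subclass.
    mapping(OLOUD + "Publication", RDFS + "subClassOf", FOAF + "Document"),
    # A local class intended to mean the same thing: declare equivalence.
    mapping(OLOUD + "Person", OWL + "equivalentClass", FOAF + "Person"),
]

for line in mappings:
    print(line)
```

Whether a given local term should be a subclass or an equivalent of the external term is exactly the kind of design decision the paper should argue for.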
You certainly would want to avoid falling into the "XKCD 927" (https://xkcd.com/927/) trap. ;-)
Section 4 states: "There are no established recommendations in the literature about how to select vocabularies for reuse." I would politely disagree here. There is a lot of recent work on vocabulary selection and recommendation, and even at a more general level there are established recommendations and guidelines (see http://linkeddatabook.com/editions/1.0/#htoc54 and http://www.w3.org/TR/ld-bp/#VOCABULARIES). These tutorials describe some general criteria for selecting vocabulary terms, such as "Usage and uptake", "Maintenance and governance", "Coverage", "Expressivity". All of these can be investigated from scratch for a particular vocabulary, while in several cases useful stats are already available on the Web (see [1] and [2]), which give you an idea of the suitability of specific vocabularies.
Looking at the vocabularies themselves, it seems some links to established vocabularies have been specified (mostly AIISO and FOAF), though important ones seem to be missing (eg "Publication" is not linked at all, where candidates would be foaf:Document, bibo:Article etc). The existing ones could also be questioned. For instance, or seem to be not entirely equivalent. Also, is defined as equivalent to , though while the former is tied to academic organisations/persons, the latter is a more general term and might better be defined as a super class (rather than as equivalent).
The Teach vocabulary is used for some properties, which also implies some (not explicitly expressed) inheritance/equivalence relationships between oLoud and Teach concepts. For instance, the use of the http://linkedscience.org/teach/ns/#teacher predicate implies some relationship between oloud:Course and teach:Course. Why are these not represented/considered, while others are? Generally, the choice of mappings and of the terms and vocabularies considered would benefit from a better line of argumentation. I would suggest providing a detailed overview of the mappings included in your ontology, i.e. the explicit mapping statements as well as the inferred mappings.
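The kind of implied relationship meant here can be sketched with RDFS domain entailment: if a property declares a domain, every resource using that property is inferred to be an instance of that class. The Python sketch below assumes, for illustration only, that teach:teacher has the domain teach:Course; the instance identifiers are hypothetical.

```python
# Sketch of RDFS domain entailment (rule rdfs2):
# if (P rdfs:domain C) and (s P o) hold, then (s rdf:type C) follows.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_types_from_domain(triples):
    """Return the rdf:type triples entailed by rdfs:domain declarations."""
    domains = {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}
    inferred = set()
    for prop, cls in domains:
        for s, p, o in triples:
            if p == prop:
                inferred.add((s, RDF_TYPE, cls))
    # Report only statements that were not already asserted.
    return inferred - set(triples)


triples = [
    # Assumed for illustration: the Teach vocabulary restricts teach:teacher to courses.
    ("teach:teacher", RDFS_DOMAIN, "teach:Course"),
    # Hypothetical instance data, typed only with the local class.
    ("ex:course42", RDF_TYPE, "oloud:Course"),
    ("ex:course42", "teach:teacher", "ex:lecturer7"),
]

for triple in sorted(infer_types_from_domain(triples)):
    print(triple)
```

Under that assumption, ex:course42 ends up typed as both oloud:Course and teach:Course, which is the sort of inferred mapping that should be listed alongside the explicit ones.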
In Section 6, the authors describe a dataset which seemingly uses the vocabulary. However, no links to the dataset seem to be provided (though the URI scheme is described) and no information about the scale of the data (# instances / concept etc) is provided. The "evaluation" in Section 6.2 is not an actual evaluation but only lists some general precautions, and it does not seem to involve any stakeholders whatsoever. While it is generally hard to provide a sound evaluation for an ontology, a more structured approach throughout the paper could have facilitated some kind of simple validation. For instance, by defining some sound requirements together with the actual stakeholders in the data (students, lecturers), one could derive some *sample queries* and *requirements* which could be used for validation, by assessing, for instance, how well the current shape of the ontology facilitates user-defined test cases (queries), how efficient the query answering is etc. As it stands, it seems as if the ontology has been developed without any sound stakeholder feedback or input.
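A simple form of such a validation could look like the following Python sketch, which treats stakeholder questions as test cases over a small sample graph and reports the questions the data cannot answer. All identifiers, questions and data are hypothetical; a real validation would run SPARQL queries against the published dataset.

```python
# Sketch: stakeholder questions as test cases over sample data.
# All identifiers and data below are hypothetical.
def match(triples, pattern):
    """Return triples matching an (s, p, o) pattern; None acts as a wildcard."""
    return [t for t in triples
            if all(want is None or want == got for want, got in zip(pattern, t))]


sample = [
    ("ex:course42", "rdf:type", "oloud:Course"),
    ("ex:course42", "teach:teacher", "ex:lecturer7"),
    ("ex:course43", "rdf:type", "oloud:Course"),
]

# Each test case pairs a question with a triple pattern and the minimum
# number of answers the data should provide.
test_cases = [
    ("Who teaches course 42?", ("ex:course42", "teach:teacher", None), 1),
    ("Who teaches course 43?", ("ex:course43", "teach:teacher", None), 1),
]


def validate(triples, cases):
    """Return the questions that the data cannot answer."""
    return [question for question, pattern, minimum in cases
            if len(match(triples, pattern)) < minimum]


print(validate(sample, test_cases))
```

The list of unanswered questions gives a concrete, repeatable measure of how well the ontology and dataset cover the requirements.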
Section 7 - "Related work": please have a look at the resources [1-6] listed below which should provide valuable insights into vocabulary usage in the (educational) wild.
To summarise, my general recommendations for revising the paper would be:
- Better motivate the requirements for your ontology, eg by eliciting (with actual stakeholders) and describing requirements, use cases and queries you need to facilitate
- Improve the description of the design process and choices: How did you derive requirements? How did you ensure applicability to use cases? How did you actually research existing vocabularies eg the use of terms on the Web?
- Revise the vocabulary by better making use of existing terms - following a thorough selection strategy using the guidelines mentioned above - and improving the use of and mappings to external vocabularies.
Minor comments:
- abstract contains the fairly broad claim "The Linked Open Data pursuit has achieved remarkable progress in Europe as well, and studies have shown that it has a positive impact on the quality of education at university level too." That sounds very nice and approving, particularly the "positive impact on education", but is also very arguable. How did LD improve education in Europe? Please provide references in the introduction or tone down this claim.
- English needs improving and should be checked by a native speaker. Several typos throughout.
- abstract: "publishing information about university or college courseS"
- references section: please list the authors in full (not just "et al")
Mentioned references:
[1] http://lov.okfn.org/
[2] http://lucero-project.info/lb/2012/04/so-whats-in-linked-datasets-for-ed...
[3] Dietze, S., Drachsler, H., Giordano, D., A Survey on Linked Data and the Social Web as facilitators for TEL recommender systems, in: Recommender Systems for Technology Enhanced Learning: Research Trends & Applications, Eds: Manouselis, N., Verbert, K., Drachsler, H., Santos, O.C., to be published by Springer in 2013.
[4] D’Aquin, M., Adamou, A., Dietze, S., Assessing the Educational Linked Data Landscape, ACM Web Science 2013 (WebSci2013), Paris, France, May 2013.
[5] Dietze, S., Kaldoudi, E., Dovrolis, E., Giordano, D., Spampinato, C., Hendrix, M., Protopsaltis, A., Taibi, D., Yu, H. Q. (2013), Socio-semantic Integration of Educational Resources – the Case of the mEducator Project, in Journal of Universal Computer Science (J.UCS), Vol. 19, No. 11, pp. 1543-1569.
[6] http://data.linkededucation.org/linkedup/catalog/