ROH: Towards a highly usable and flexible knowledge model for the academic and research domains

Tracking #: 2990-4204

Authors: 
Mikel Emaldi Manrique
Maite Puerta
David Buján
Diego López-de-Ipiña
Emilio Rubiera Azcona
Jose Emilio Labra-Gayo
Esteban Sota
Ricardo Alonso Maturana

Responsible editor: 
Karl Hammar

Submission type: 
Ontology Description
Abstract: 
This paper presents the work carried out in the Hercules-ASIO project, with special emphasis on the design and development of the ROH network of ontologies. ROH (Red de Ontologías Hércules, its Spanish name) aims to thoroughly model the main entities and relationships of the academic and research domain, e.g., projects, researchers, academic articles, universities, courses, organizations, and research results. The paper details the methodology followed for the development of ROH, paying special attention to the implementation and validation phases. It then describes the most relevant entities and their relationships, followed by the wide range of methods applied to continuously evaluate and enhance the ontology’s correctness and exhaustiveness.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Catherine Faron Zucker submitted on 04/Apr/2022
Suggestion:
Major Revision
Review Comment:

This paper presents ROH, a modular ontology dedicated to modeling the academic and research domains (universities, organizations, projects, researchers, papers, etc.), especially in Spain. It is designed to be reusable and extendable to suit the specificities of any other country. The paper is well written and the key aspects of the ontology are clearly described.

The paper reports an important ontology engineering work, following a precisely described methodology, and resulting in an ontology accessible on GitHub, validated with a base of SPARQL queries implementing competency questions and that can be continuously refined following a CI/CD software development practice.
A description of an existing application or of use-case experiments is missing. Relatedly, an evaluation of the ontology is missing: a discussion of the competency questions and their implementation, a user evaluation showing the easy adoption of the ontology, and an application relying on it.

ROH is publicly available on GitHub. Additionally, I would expect to find it published on the Web, referenced in LOV, browsable using a dereferencing mechanism and/or queryable through a public SPARQL endpoint.

The design principles and methodology used to develop ROH, as described in the paper, reflect good development practices. However, a positioning with respect to state-of-the-art ontology development methodologies is missing. It would also be interesting to expand and discuss the presentation of the CI/CD workflow implemented in GitHub.

On OWL and SHACL:
- “ontological restrictions which validate the correct instantiation of classes and properties”: OWL class definitions using property restriction are not meant to perform any validation, but inferences.
- I would consider SHACL constraints as domain knowledge, part of the model, and not as a means to validate the ontology.
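
The distinction drawn in the two remarks above can be illustrated with a toy sketch. It assumes a single constraint, that every author of an article is a person, and reads it once as an OWL-style universal restriction (which infers a type) and once as a SHACL-style class constraint (which reports a violation). All identifiers (ex:paper1, ex:hasAuthor, ex:Person) are hypothetical, and the code is plain Python over hand-written triples, not a real reasoner or SHACL engine.

```python
# Toy illustration only: hypothetical data, no OWL reasoner or SHACL engine involved.
TRIPLES = {
    ("ex:paper1", "rdf:type", "ex:Article"),
    ("ex:paper1", "ex:hasAuthor", "ex:alice"),  # ex:alice has no declared type
}


def articles(triples):
    """Subjects explicitly typed as ex:Article."""
    return {s for s, p, o in triples if p == "rdf:type" and o == "ex:Article"}


def owl_reading(triples):
    """Universal-restriction reading (Article subClassOf hasAuthor only Person):
    each author of an article is *inferred* to be a Person; nothing is rejected."""
    arts = articles(triples)
    return {
        (o, "rdf:type", "ex:Person")
        for s, p, o in triples
        if p == "ex:hasAuthor" and s in arts
    }


def shacl_reading(triples):
    """Class-constraint reading: each author of an article must already be
    typed as ex:Person in the data; otherwise a violation is reported."""
    arts = articles(triples)
    persons = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Person"}
    return sorted(
        (s, o)
        for s, p, o in triples
        if p == "ex:hasAuthor" and s in arts and o not in persons
    )


print("inferred:", owl_reading(TRIPLES))
print("violations:", shacl_reading(TRIPLES))
```

On the same data, the first reading adds a type for ex:alice while the second flags the pair (ex:paper1, ex:alice), which is why the restriction alone does not validate instance data.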

On the mapping of FECYT’s CVN to ROH: again, I do not see it as a means to validate ROH but rather as part of the knowledge extraction process to build the ontology.

On the vertical module “knowledge area”: I wonder what the relationships are between the scientific domains, the subject areas, the UNESCO codes and the FECYT referential. I guess some alignment work could be done.

Other local remarks and typos:
page 3, end of section 2, the positioning could be more precise
page 3 class.So
page 5 the first sentence is useless, and the reference to Section 4 is wrong since we are still in Section 4.
page 6 Table 1, first column, choose between a noun or a verb for each entry; line 7 last column reformulate without “return” like in the other entries
page 8-10 The choice of having a categorization in addition to a hierarchy is not obvious and should be explained
page 12 I suppose there is a confusion between “subclass of skos:Concept from the custom ontology” and SKOS concepts (instances of class skos:Concept) from the custom thesaurus.
page 13 and the following: I suggest to avoid the terms “entity” and “term” and rather use precise terms concept, class or property because entity or term can have a special meaning in a thesaurus.
page 15: represent -> s
page 17 has been -> have
page 19 focus in -> on
page 20: competency query -> questions
page 23: Listing 4 is useless
page 25: “This task …” These 2 paragraphs should be developed and better explained.

Review #2
By Laura Pandolfo submitted on 04/May/2022
Suggestion:
Minor Revision
Review Comment:

The goal of this article is to present the development of the Hercules Network of Ontologies (ROH). ROH is a set of ontologies that models the research and academic domain (e.g. projects, researchers, academic articles, universities, courses, organizations, research results and so on).

The abstract is well-written, since it clearly and concisely presents the purpose of the article. The main goal of the paper is to describe the methodology followed for the development of ROH as well as all the steps for its implementation and continuous validation.

The introduction very clearly presents the reasons why the authors have decided to develop ROH. It is the result of a larger project (Hercules - https://www.um.es/web/hercules/ontologias) which aims to create a new information management system for Spanish universities based on Semantic Web and Knowledge Graphs technologies. In this context, ROH represents the ontology infrastructure with the aim of describing with fidelity and fine granularity the research domain.

(1) Quality and relevance of the described ontology: I believe ROH is relevant and suitable for purpose, although I have some concerns about quality that I am going to discuss below.
The implementation of the ROH network of ontologies was carried out following an iterative and incremental methodology. Even though no ontology engineering methodology is referenced, the implementation process is described systematically, as are the design principles taken into account. The modular approach applied during the design and development of ROH allows third parties to reuse and extend the ontology easily (e.g. by adjusting it to different contexts). The three mechanisms used to validate the ontology (competency questions, the modeling of the CV, and the SHACL validation) are adequate and well described.

(2) Clarity of the paper: Good overall.
I think the paper is well written and all ideas are adequately presented with the support of explicit and comprehensible figures and tables. The developed ontology is properly described and the content of the paper is clear and readable.
The GitHub repository includes (A) a well organized README file that contains useful information to understand and assess the data; (B) the ontology module files in .ttl format, but no .owl file is available. It also includes other resources that appear to be complete for replication of experiments (e.g., validation data and validation questions). (C) GitHub is appropriate as a repository. (D) No other data artifacts are provided.

Review comments:
1. It was not clear to me from the paper, but looking at the ontology in Protégé I was quite surprised to see that entities reused in ROH have not kept their original IRI. For example, the reused class foaf:Agent has this IRI: http://w3id.org/roh/mirror/foaf#Agent rather than http://xmlns.com/foaf/spec/#term_Agent. In this regard, I think that the reuse of entities is not described in detail in Section 4.2. There are no insights about the external import used, e.g. whether the process is manually or automatically performed.
2. Information about the ontology language and the expressivity of the ontology should be introduced in the paper.
3. I expect that the .owl file will be added in the repository.
The link https://herculescrue.github.io/ROH/0%20-%20OntologyTutorial.pdf is broken.
4. I think the authors should extend the usage scenarios outside the Hercules project. It would be interesting to present further potential uses of ROH by giving concrete examples.

Review #3
By Raghava Mutharaju submitted on 09/May/2022
Suggestion:
Major Revision
Review Comment:

This manuscript discusses the ontology engineering aspects of an ontology named ROH. It captures the entities and relations in the academic and research domains. ROH reuses concepts and relations from existing vocabularies/ontologies. Competency questions and SHACL rules are used to validate ROH.

I appreciate the authors' effort to build a high-quality ontology.

The following are the strengths of this submission.

1) Several terms from existing vocabularies/ontologies have been reused in ROH.
2) A good number of competency questions (CQs) have been used to validate the ontology.
3) A continuous development and integration step has been included in the ontology engineering process where the competency questions are rerun, and the ontology documentation is regenerated when there is a change to the ontology.
4) Good documentation has been provided for the ontology.
5) Permanent URLs have been used to identify the ontology resource.
6) Code is open-sourced.
7) The labels and descriptions of classes and properties are available in English and Spanish.

I have the following questions/suggestions for the authors.

1) A sample/synthetic dataset is generated and used to validate ROH. Why isn't real-world data from a university used instead? Please replace the synthetic dataset with a real-world dataset to validate the ontology.
2) The use of ontology design patterns (ODPs; http://ontologydesignpatterns.org/wiki/Main_Page) would make the ontology more modular. I would encourage the authors to explore the ODP repository and pick a few relevant ODPs to use in ROH. AgentRole ODP (http://ontologydesignpatterns.org/wiki/Submissions:AgentRole) and ActivitySpecification ODP (http://ontologydesignpatterns.org/wiki/Submissions:AgentRole) are two possibilities.

Other comments/questions.

1) As mentioned in Section 5.3, the authors consider Funding as an action. Should there be a class for an action or would a property be more suitable? To me, the latter (property) seems more appropriate, especially after looking at Fig. 7, where several classes are connected to funding through the same relation (funds). We can use roh:funds to connect these classes with the classes that roh:Funding connects to (perhaps, for example, Project?). Properties such as hasFundingID can also be added to capture the other details. Also, on page 10, line 49 (and Eq. 1), funds to some Funding seems wrong.
2) From Section 5.3, having FundingAmount as a class is a little confusing. What are the potential instances of this class? Why can't it be a property?
3) In Table 2, is "Researcher Role" a single class? Is it meant to represent a Researcher? If so, a Research Fellowship is a type of fellowship and does not seem like a researcher or a role played by a researcher?
4) In Table 2, what is the difference between the Subject and Degree entities?
5) In Table 2, the subclasses of Internship are Predoc and PostDoc. I don't think this hierarchy is appropriate because interns are temporary whereas the other classes are full-time positions.
6) On page 10, it is mentioned that entities have categories instead of a hierarchy based on some criteria. What are these categories, and where are the criteria defined/discussed?
7) How are the rules (Eq. 1, 2, 3, ...) implemented?
8) On page 13, the range of inScheme can be either KnowledgeArea or ProjectClassification or HRClassification or FundingProgramClassification. Is it justified to use this property in four different contexts?
9) On page 17, why is hasPublicationVenue connected to a Collection? What information is captured as part of the publication venue?
10) Does Eq. (9) mean that there can only be two publication metrics for a Journal? The case when z is equal to t is not handled.
11) On page 18, hasMetric connects to Journal as well as a JournalArticle. What is the relation between a Journal and a JournalArticle? Unless one is a subclass of another, this doesn't seem right?
12) On page 20, generally, properties follow the lower camel case naming convention. For some properties, such as roh:ImpactFactorName, this is not followed.
13) Include a brief explanation of Listings 6, 7 and 8.
14) Section 6.4, why should Pellet be compiled when there are changes to ROH? Why can't the executable of Pellet be used directly?
15) In Section 7, it was mentioned that "machine learning techniques will be applied to continuously enhance the existing contents of universities’ knowledge graphs". This seems very vague. Either add a few more details or drop this line.
16) Perhaps the short Section 3 can be merged with the ontology description section.
17) Figure 1, how were the requirements gathered?
18) Table 1, how were the scenarios identified?
19) Page 9, line 44, vivo:Relationship seems to be a very general relation that can be used anywhere.
20) Please comment on the OWL 2 profile/description logic to which ROH belongs.
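
The naming remark in comment 12 lends itself to an automated check. The sketch below is one possible way to do it, assuming that a conforming property local name starts with a lowercase letter and contains only ASCII letters and digits. The property list is illustrative: roh:ImpactFactorName and roh:funds appear in the comments above, while the roh: prefix on hasMetric is an assumption.

```python
import re

# Assumed convention: lower camel case, i.e. a lowercase first letter
# followed by ASCII letters and digits only.
LOWER_CAMEL = re.compile(r"^[a-z][A-Za-z0-9]*$")


def local_name(curie):
    """Return the part after the prefix, e.g. 'roh:funds' -> 'funds'."""
    return curie.split(":", 1)[-1]


def non_conforming(properties):
    """List the properties whose local name is not lower camel case."""
    return [p for p in properties if not LOWER_CAMEL.match(local_name(p))]


# Illustrative input; the prefix on hasMetric is assumed.
props = ["roh:ImpactFactorName", "roh:funds", "roh:hasMetric"]
print(non_conforming(props))
```

A check of this kind could run alongside the competency-question tests whenever the ontology changes, so that naming deviations are caught before release.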

Typos/grammar issues

1) Page 1, Introduction, line 1, add "the" between presents and Hercules Network of Ontologies.
2) Page 1, line 42, it should be funding rather than founding.
3) Page 2, line 15, please rephrase the usage of the word "describing" here.
4) Page 2, line 20, "MA" should be "The main".
5) Page 4, line 9, the word "on" can be dropped.
6) Page 4, lines 32 and 33 should be rephrased.
7) Page 5, line 43, this line should be rephrased. "At this analysis" => "After this analysis"?
8) Page 6, line 27, it should be e.g., European.
9) Page 6, line 49, "which they have participated" => with whom they have collaborated.
10) Page 6, line 50, "which are" can be removed.
11) Page 7, line 33, extend => extent
12) Page 7, line 46, at => in
13) Page 9, line 32, it should be roh:PeerReviewedArticle and not PeeReviewedArticle.
14) Page 10, where and how is the Web Annotation Data Model used? Anotation is spelt wrong in Table 4.
15) Page 11, line 7, the phrase "sparsely but just in succint remarks" should be rephrased.
16) Page 13, line 21, at => in
17) Page 20, line 13, cite => citation
18) Page 21, line 7, "Last" can be removed
19) Fig. 13, competence => competency
20) Page 26, line 27, the phrase "particularities of" can be removed.

In general, please run a grammar checker on the entire document.