ROH: Towards a highly usable and flexible knowledge model for the academic and research domains

Tracking #: 3186-4400

Authors: 
Mikel Emaldi Manrique
Maite Puerta
David Buján
Diego López-de-Ipiña
Emilio Rubiera Azcona
Jose Emilio Labra-Gayo
Esteban Sota
Ricardo Alonso Maturana

Responsible editor: 
Karl Hammar

Submission type: 
Ontology Description
Abstract: 
This paper presents the work developed by the Hercules-ASIO project, putting special emphasis on the design and development of the ROH network of ontologies. ROH (Red de Ontologías Hércules, by its Spanish naming) aims to model thoroughly the main entities and relationships of the academic and research domain, e.g., projects, researchers, academic articles, universities, courses, organizations or research results. In this paper, the methodology followed for the development of ROH is detailed, paying special attention to the implementation and validation phases. Consequently, the most relevant entities are described, as well as their relationships, followed by a wide range of methods applied to continuously evaluate and enhance the ontology’s correctness and exhaustiveness.
Tags: 
Reviewed

Decision/Status: 
Reject (Two Strikes)

Solicited Reviews:
Review #1
Anonymous submitted on 29/Aug/2022
Suggestion:
Reject
Review Comment:

In this second submission, the paper is improved, following some of the reviewers’ recommendations. However, in my opinion, there are still some major issues that should be tackled.

For a use-case experiment, a knowledge graph has been constructed based on the ROH ontology, but it does not use the whole ontology and therefore some competency questions cannot be used. Those presented in the paper are quite simple and, if I understand correctly, the RDF dataset cannot answer all the queries involving knowledge areas (a precise indication of what it covers would be welcome).

An evaluation of the ontology is still missing: a discussion on the competency questions implemented, a user evaluation showing the easy adoption of the ontology and an application relying on it.

When browsing ROH, I noticed a problem in the dereferencing mechanism: e.g., http://w3id.org/roh/mirror/vivo#Presentation does not lead to the location of its description (https://herculescrue.github.io/ROH/roh/#/mirror/vivo#Presentation) but to the source code of the ontology as a whole.
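The observation is consistent with how hash URIs behave in general: an HTTP client removes the fragment before sending the request, so a server-side redirect only ever sees the document part of the URI. The following minimal Python sketch, using only the standard library, illustrates this; the helper name `requested_url` is hypothetical, and how the ROH redirect is actually configured is not known here.

```python
from urllib.parse import urldefrag


def requested_url(term_uri: str) -> tuple[str, str]:
    """Split a term URI into the part sent over HTTP and the client-side fragment."""
    document, fragment = urldefrag(term_uri)
    return document, fragment


document, fragment = requested_url("http://w3id.org/roh/mirror/vivo#Presentation")
print(document)  # the only part a server-side redirect rule can match on
print(fragment)  # resolved by the client inside whatever document is returned
```

Under this reading, every term in the vivo mirror namespace resolves to the same document, and only the returned document itself can take the reader to a specific term.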

Moreover, the way existing ontologies such as FOAF are reused is not correct: their URIs should not be changed.

On OWL and SHACL, I am even less convinced than before about the formalization.
The indication of OWL-DL as chosen profile raises questions on computation times.
P.3 It is not correct to call a unary or n-ary predicate a non-logical symbol. Moreover, you must choose between FOL and OWL when presenting: you cannot write that your chosen language is OWL and then try to write FOL formulas. This also holds for all the added formulas on P.4, P.12, P.15, P.18, P.20 and P.22. Moreover, it is unclear what these formulas are: they are neither OWL definitions nor SHACL constraints. E.g., the formulas on P.12 are useless if you are using an OWL engine, since the general rules implementing the semantics of the language will make it possible to infer what is expected.
On the rules on P.15: what are they? How are they implemented concretely? As for their meaning, I am fine with Funding Programs, but it seems to me that a Project should be associated with some classes/categories and not with classifications.
P.5 What does “restrictions and validation scripts” mean? What does “in languages like SHACL” mean? What did you do precisely, concretely?
P.6 on SHACL is much too vague.

On the mapping of FECYT’s CVN to ROH: I agree that it shows that ROH answers the same use cases as FECYT CVN does (but this should be developed further), but again, I do not see the mapping as a means to really validate ROH but rather as part of the knowledge extraction process.
On the vertical module “knowledge area”: I wonder what the relationships are between the scientific domains, the subject areas, the UNESCO codes and the FECYT referential. No alignment has been done, so what is the meaning of jointly using several of them? What are the use cases?

On Metrics, the modeling choice does not seem correct to me: my understanding of the added explanation is that the same info for some journal will be duplicated for each paper published in a given period.

P.29-30 In my opinion, long GitHub code is not appropriate in a scientific paper.

The mismatch between the concepts in Table 2 and those in the ontology should be fixed or discussed: if I understand the answers to reviewer 3 correctly, required concepts are introduced but different ones are defined in the ontology, which is confusing.

Review #2
By Laura Pandolfo submitted on 08/Sep/2022
Suggestion:
Accept
Review Comment:

The authors have successfully addressed my suggestions and comments. I recommend this work for publication since it represents a good ontology description paper.

Review #3
By Raghava Mutharaju submitted on 16/Oct/2022
Suggestion:
Minor Revision
Review Comment:

Thank you for providing an explanation and addressing my comments and suggestions. The following are my comments.

1) Is it OWL DL or OWL 2 DL? The current version of OWL is OWL 2.
2) Add the figure on Funding from the revision letter to the main manuscript.
3) In Fig. 7, it is given that vivo:FundingOrganization is an rdf:type of foaf:Organization. Does this mean that vivo:FundingOrganization is an instance? In the text description above this figure, it is given that vivo:FundingOrganization is a class. Which one is it?
4) In Table 2, what does a Subject mean? Please mention this in the manuscript.
5) In Table 2, using the term "Placement" is not appropriate. Here is the meaning of placement from a dictionary - "the act of placing. the state of being placed. the act of an employment office or employer in filling a position. location; arrangement: the placement of furniture". Making it the parent class of Predoc, Postdoc, Research and Education seems completely misplaced (Placement is an action whereas Predoc and Postdoc are positions). I am not sure of the relationship between Research, Education and Placement (it is certainly not a subclass-superclass hierarchy).
6) >> the range of inScheme can be either ...
Is it possible to say that the range of inScheme is a skos:ConceptScheme? In that case, any of its subclasses automatically fall under the range of inScheme.
7) The issue of hasPublicationVenue and Collection ... A bibo:Journal can have a collection of articles. But I don't see why it should be a subclass of a Collection. An article published in the bibo:Journal can have a publication date. But I don't think all the properties of bibo:Journal can be put into a Collection.
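Comment 6 can be illustrated with a small Python sketch of subclass reasoning over plain triples. It treats the range as a check purely for illustration (in RDFS a range is an inference rule, not a constraint), and the `ex:` names are hypothetical; only skos:ConceptScheme and inScheme come from the comment above.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical data: a scheme class declared as a subclass of skos:ConceptScheme.
triples = {
    ("ex:UnescoScheme", SUBCLASS_OF, "skos:ConceptScheme"),
    ("ex:unesco", RDF_TYPE, "ex:UnescoScheme"),
    ("ex:concept1", "skos:inScheme", "ex:unesco"),
}


def superclasses(cls, triples):
    """Return cls together with all of its transitive superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(o for s, p, o in triples if s == current and p == SUBCLASS_OF)
    return seen


def types_of(resource, triples):
    """Asserted types of a resource plus the types implied by rdfs:subClassOf."""
    inferred = set()
    for s, p, o in triples:
        if s == resource and p == RDF_TYPE:
            inferred |= superclasses(o, triples)
    return inferred


def satisfies_range(obj, range_class, triples):
    """True if obj is, directly or by subclassing, an instance of range_class."""
    return range_class in types_of(obj, triples)


print(satisfies_range("ex:unesco", "skos:ConceptScheme", triples))
```

The same toy data also shows the distinction raised in comment 3: rdf:type links an instance to a class, whereas rdfs:subClassOf links two classes.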

Review #4
By Andrea Giovanni Nuzzolese submitted on 10/Dec/2022
Suggestion:
Major Revision
Review Comment:

The paper presents ROH, which is an ontology network designed and developed in the context of the Hércules project. The ROH ontology network models the domain of research from multiple perspectives, i.e. administrative, financial, scientific, etc. ROH is detailed along with the methodology adopted by the authors for modelling the ontology. Such a methodology spans from the definition of the initial requirements to the final evaluation. The methodology the authors opt for is based on four main activities, namely: Requirement analysis, Selection of Ontologies (aimed at reusing concepts and properties from existing ontologies in the Semantic Web), Implementation, and Evaluation.

The paper is well written and structured in all its parts.

=== Strengths ===
The ROH ontology network models a relevant domain by providing a modular solution for integrating different sub-domains that compose the academic ecosystem.
The URIs are defined as persistent identifiers by relying on w3id for redirecting purposes.
The description of the methodology is fair, and the authors motivate its definition and adoption in the context of Hércules by including state-of-the-art methodologies (e.g. XD) in the discussion.
The description of the ROH ontology network is good and clarifies the design choices the authors opt for.
The evaluation tackles the relevant aspects for ontology validation and, generally speaking, can be considered fair.

=== Weaknesses ===
Nevertheless, the paper shows some weaknesses that, in my opinion, need to be addressed in order to improve the paper for publishing.
Those weaknesses are:

+++ Limited contextualisation into the state of the art +++
The authors list a number of state-of-the-art ontologies and models in the related work section (cf. Section 2).
The overview of existing ontologies is close to comprehensive, although the authors do not cite ScholarlyData [1], which is in my opinion worth mentioning. However, the related work section does not clarify how ROH is positioned with respect to each of the ontologies identified as state of the art. Hence, it is difficult for a reader to grasp the added value or the novelty introduced by ROH.

+++ Lack of working examples +++
Each module is described together with its representation in description logics. This is much appreciated and valuable. Nevertheless, the descriptions of those modules lack usage examples with data. Such examples are of the utmost importance for illustrating how the modules can be used and grounded in (real) RDF data.

+++ Ontology design patterns and anti-patterns +++
The authors adopt ontology design patterns for modelling the modules composing the ROH ontology network.
This improves the soundness of the resulting ontological artefacts according to some results presented in literature.
However, some of the modules show some well-known anti-patterns.
For example, the relation between Person and his/her Role has to be handled carefully if not time-indexed. The latter point seems to be the case of ROH.
Namely, how is a person whose role changes over time represented?
A similar issue can be found when associating a foaf:Organisation with a gn:Feature by means of the object property gn:locatedIn. What happens if an organisation moves from one site to another at a certain instant in time?
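To make the concern concrete, here is a minimal Python sketch of a time-indexed role, in which the person-role link is reified together with a validity interval. All names (`RoleHolding`, `roles_on`, the `ex:` identifiers and the dates) are hypothetical illustrations and do not reflect how ROH models roles.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RoleHolding:
    """A person holding a role during a half-open interval [start, end)."""
    person: str
    role: str
    start: date
    end: Optional[date] = None  # None means the role is still held

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day < self.end)


def roles_on(holdings, person, day):
    """Roles held by a person on a given day, sorted for stable output."""
    return sorted(h.role for h in holdings if h.person == person and h.active_on(day))


holdings = [
    RoleHolding("ex:alice", "ex:PhDStudent", date(2015, 9, 1), date(2019, 9, 1)),
    RoleHolding("ex:alice", "ex:Postdoc", date(2019, 9, 1)),
]
print(roles_on(holdings, "ex:alice", date(2017, 1, 1)))  # ['ex:PhDStudent']
print(roles_on(holdings, "ex:alice", date(2021, 1, 1)))  # ['ex:Postdoc']
```

With a direct person-role property and no interval, both roles would appear to hold at once; the same shape would apply to an organisation's location.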
The use of rdf:Seq for representing lists does not comply with the claim of adopting best ontology design practices and patterns. There is extensive literature about how to model lists in OWL. An example from the ontology design pattern community is the Sequence pattern [2].

+++ Ontology Reuse +++
There are no mandatory guidelines for ontology reuse. Nevertheless, some authors [3, 4] distinguish among direct, indirect and hybrid re-use strategies by identifying clear benefits and drawbacks for each of them.
The authors opt for a direct ontology re-use strategy, i.e. classes and properties such as foaf:Person, vivo:FundingOrganization are directly re-used in the ontology.
Those ontology entities are defined by external ontologies, and it is not clear what the impact of their direct re-use in ROH is.
First, external classes and properties are defined in their corresponding ontologies with their own semantics. Is the original semantics preserved in ROH?
Then, what happens if a class or property defined by an external ontology is modified, deprecated, or removed?
Those aspects need to be further discussed in the paper.

+++ Evaluation +++
No structural information about the ROH ontology is provided.

1. Nuzzolese, Andrea Giovanni, Anna Lisa Gentile, Valentina Presutti, and Aldo Gangemi. "Conference linked data: the scholarlydata project." In International Semantic Web Conference, pp. 150-158. Springer, Cham, 2016.
2. http://ontologydesignpatterns.org/wiki/Submissions:Sequence
3. Presutti, Valentina, Giorgia Lodi, Andrea Nuzzolese, Aldo Gangemi, Silvio Peroni, and Luigi Asprino. "The role of ontology design patterns in linked data projects." In International Conference on Conceptual Modeling, pp. 113-121. Springer, Cham, 2016.
4. https://dx.doi.org/10.3233/SSW200033