Applying the LOT methodology to a Public Bus Transport Ontology aligned with Transmodel: Challenges and Results

Tracking #: 2612-3826

Authors: 
Edna Ruckhaus
Adolfo Antón Braco
Mario Scrocca
Oscar Corcho

Responsible editor: 
Guest Editors Transportation Data 2020

Submission type: 
Ontology Description
Abstract: 
We present an ontology that describes the domain of Public Transport by bus, which is common in cities around the world. This ontology is aligned to Transmodel, a reference model which is available as a UML specification and which was developed to foster interoperability of data about transport systems across Europe. The alignment with such a complex non-ontological resource required the adaptation of the Linked Open Terms (LOT) methodology, which has been used by our team as the methodological framework for the development of many ontologies used for the publication of open city data. The ontology is structured into three main modules: (1) agencies, operators and the lines that they manage, (2) lines, routes, stops and journey patterns, and (3) planned vehicle journeys with their timetables and service calendars. Besides reusing Transmodel concepts, the ontology also reuses common ontology design patterns from GeoSPARQL and the SSN ontology. As part of the LOT data-driven validation stage, RDF data has been generated taking as input the GTFS feed provided by the Madrid public bus transport provider (EMT). Data transformation rules were expressed as RML mappings and materialised, and queries corresponding to competency questions were developed and tested. Currently, a generic and reusable REST API is being developed that can be adopted by other organizations to standardize the publication of open data in this domain.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Umutcan Simsek submitted on 05/Dec/2020
Suggestion:
Major Revision
Review Comment:

Summary:
The paper describes the development of an ontology to represent the data in the public transportation domain, particularly passenger transportation by buses. It covers three major aspects, namely (1) agencies, and operators, (2) lines, routes, and journeys, and (3) journey scheduling. The ontology is aligned with the Transmodel standard, an international standard for public transportation data exchange. The paper is highly relevant to the special issue, and the approach overall appears to be technically solid in most places. However, some issues (e.g., lack of a proper literature review, concrete use cases, and stakeholders) make it hard to judge in specific review dimensions. Therefore I suggest that the paper goes through a major revision.

Design principles, methodologies applied at creation

The LOT methodology is used. The methodology and its adaptation have been explained overall adequately. However, I still see a few relatively significant issues:

* Where are the restrictions? The authors claim that a previous attempt to engineer an ontology based on the Transmodel standard had its drawbacks. One of these drawbacks is the implementation of the restrictions. Does the proposed ontology implement these restrictions? How are they evaluated? Is any language other than OWL (e.g., SHACL, assuming the restrictions are seen more as constraints in Transmodel) used?

* The evaluation and requirement specification are only briefly covered (and no supplemental material was submitted with the paper). Who are the domain experts? Who are the stakeholders? There are external links to the competency questions and use cases; however, there are only four questions and three use cases, which are not linked together explicitly. Are these questions comprehensive enough? Where are the generated Knowledge Graphs that are used in the evaluation mentioned in the paper? Did the queries formalizing the competency questions run on a triple store with OWL DL reasoning activated? It is claimed that the OOPS! report showed only minor issues. I do not see any link to this report, so what "minor" means is not very clear. For example, when I run the HermiT reasoner with Protégé on the ontology, there are some unsatisfiable (equivalent to owl:Nothing) class definitions (e.g., Authority is a subclass of Organization, and these two classes are disjoint). Some properties are a sub-property of every other property in the ontology (e.g., viewedAs; I am not sure if this is the intended meaning). (Disclaimer: I downloaded the ontology from the linked website on the 4th of December 2020.) In general, it would be beneficial to expand the requirement specification and evaluation sections slightly and provide supplemental material (e.g., the OOPS! report) that clarifies the evaluation and requirement specification.

* The NOR transformation step is depicted as an extension to the LOT methodology, which is based on the NeOn methodology. According to the paper, the NOR re-engineering pattern is already a part of the NeOn methodology; how, then, is it an extension to LOT?

* This is more of a design question for my understanding: Why is the Incidence type a part of the ontology's Organization module? It appears to be more suitable for Part III as it affects the journey of a bus.
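
To make the unsatisfiable-class issue noted above concrete (Authority declared both a subclass of Organization and disjoint with it), the following minimal Python sketch shows a naive check for that one pattern. It is not how HermiT works and does not parse OWL; the function names and the dictionary-based encoding of axioms are assumptions made purely for illustration.

```python
# Illustrative sketch only: a naive check for one unsatisfiability pattern,
# not a DL reasoner. The axiom encoding below is hypothetical.

def superclasses(cls, subclass_of):
    """Return cls together with all of its transitive superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable_classes(subclass_of, disjoint_pairs):
    """Return classes whose superclass closure contains a disjoint pair."""
    result = set()
    for cls in subclass_of:
        ancestors = superclasses(cls, subclass_of)
        for a, b in disjoint_pairs:
            if a in ancestors and b in ancestors:
                result.add(cls)
    return result


# Hypothetical axioms mirroring the example reported in the review.
subclass_of = {"Authority": ["Organization"], "Organization": []}
disjoint_pairs = [("Authority", "Organization")]

print(unsatisfiable_classes(subclass_of, disjoint_pairs))
```

Under these assumed axioms only Authority is flagged, because every Authority instance would also have to be an Organization while the two classes are declared disjoint.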

Comparison with other ontologies on the same topic

There is no proper related work section. Only the ontology produced by the SNAP project has been analyzed. Given the long line of smart city projects at both the EU and national level, there must be many other ontologies (e.g., [1]) that can be discussed with this work.

Pointers to existing applications or use-case experiments

The use cases and applications are provided minimally. Only very generic use cases are mentioned in the introduction and a bit more in Section 5. There is an external link to the use cases in which the descriptions are not much more detailed than in the paper (Also in Spanish, so I can tell as much as Google Translate allows me). Given that the work is done in the scope of a national "Open data for Smart Cities" project, there should be some concrete use cases, foreseen applications, prototypes, etc.

Convincing evidence for the quality and the relevance of the described ontology

The ontology is aligned with an international standard and reuses several existing ontologies. This fact already indicates a potential for wide adoption. However, judging the paper in this dimension is particularly challenging because there is no proper related work section (only some reused resources are reviewed in detail). The issues mentioned above in "Explanation of Design principles, methodologies applied at creation" make it hard to judge the quality. The limited concrete use case descriptions and the missing alignment with other international efforts in the field prevent the full evaluation of the ontology's relevance.

Illustration, clarity, and readability of the paper

The paper is overall well-written and understandable. However, some sentences are too long. The first sentence in the introduction already takes ~10 lines!
There are some small typos. The paper can benefit from proof-reading. Examples:
* page 7 line 22 500 mt ratio -> 500 mt radius
* page 4 line 44 idea if -> idea of
* page 5 line 26 based of -> based on

The structure of Section 5 does not reflect the steps of the methodology. It would have been easier to follow if each step was a subsection.

The acronym GTFS is first used in the abstract but is only explained later in the paper.

The introduction should contain the answers to the What, How, and Why questions. However, How is too detailed and can be covered in the sections where the ontology development is explained in detail.

There are also some issues with the references and namespace prefixes:

* A citation to the RDF Mapping Language (RML) is missing.
* Some namespace prefixes are not explicitly connected with an ontology (e.g., estraf, sf). Overall the representation of reused ontologies and namespaces could be improved.
* "GeoSPARQL location design pattern" is mentioned several times, but I did not see any footnote referring to this specific design pattern.

The concepts of "incidence" and "incident" are mixed up. One means the frequency of occurrence; the other refers to something that occurred. Nevertheless, I assume this confusion comes from one of the reused ontologies. Maybe an explanation with a footnote is in order.

In Table 1, the last row has never been mentioned in the paper.

[1] Katsumi, M., & Fox, M. (2018). Ontologies for transportation research: A survey. Transportation Research Part C-emerging Technologies, 89, 53-82.

Review #2
By Anastasia Dimou submitted on 31/Dec/2020
Suggestion:
Major Revision
Review Comment:

This paper describes an ontology to model data regarding public buses beyond GTFS. 

However, is this the only contribution of the paper? An extension to the LOT methodology is claimed as well, and in the introduction it is mentioned that an ontology was developed for structuring how to publish open data about buses (page 2, line 20). But ontologies are meant for modeling; how to publish goes beyond developing an ontology and applying it to certain data, and the paper seems to cover more than the modeling of the ontology. I think the paper would benefit from a clear list of contributions.

In the introduction, it is mentioned that the LOT methodology is extended (later on, it is mentioned that the NeOn methodology is extended). Even though the LOT methodology builds on the NeOn methodology, it should become clear to the readers which aspects of the LOT methodology belong to the NeOn methodology, which aspects go beyond it, and why the LOT methodology was therefore preferred over the NeOn methodology or other methodologies. In the current version it feels like the methodology was chosen because it is the in-house technology.

The paper seems to be part of a broader effort related to semantically annotating transport data and this is very positive. I am not an expert on the public transportation domain, so I cannot judge the contribution considering the application domain. The paper does not clearly demonstrate the importance of the solution for whoever is not a domain expert, and that still needs to be improved in a follow-up revision. What is already addressed? What still needs improvement? How does this scientific contribution take the domain to another level?

The aforementioned occurs to a certain extent because, on the one hand, neither the problem to be solved nor why it is important to solve it is clear, and, on the other hand, the innovative aspects are neither highlighted nor positioned with respect to the state of the art. In fact, there is no related work mentioned at all, besides preliminaries. Related work is missing not only regarding ontology development methodologies but also regarding other relevant vocabularies and standards/models beyond the ones that were actually used.

The paper lacks technical depth: the ontology is presented as an end product, and the explanation of the decision making is completely missing. What were the challenges? What were the innovative parts of the solution? For instance, it is mentioned that the ontological conceptualization was split in three parts because of the complexity and variety of concepts, but it is not mentioned what these complexities were nor how it was decided which concept goes where. E.g., why are the journey patterns and planned journeys separated?

From the description and the figures, it is not clear which parts of the ontology are the result of reusing existing ontology terms nor which terms these are. I also have the feeling that, in principle, existing ontology terms are reused, which is great considering that ontology reuse is the vision for the Semantic Web, but does this constitute a scientific contribution by itself? What was innovative in this reuse scenario?

In the paper, it is mentioned that two types of information about public buses are considered: public and private. But how does this influence the modeling of the ontology?

In the implementation section, the challenges that were encountered are outlined but they do not seem to be very challenging. In the end most of them are addressed with a graphical ontological model.

Considering this paper falls in the ontology description category, the following should hold:

"The descriptions should be brief and pointed, indicating the design principles, methodologies applied at creation"
While the methodologies part is well covered, it seems that the design principles in the expected description go beyond methodologies, but in the paper no clear design principles are mentioned.

"comparison with other ontologies on the same topic"
There is no comparison to other ontologies at all. Not even the ontologies which are reused are described in detail, but only other models that describe the same domain.

"pointers to existing applications or use-case experiments"
Applications are mentioned but not detailed. Use-case experiments are relatively weak. The paper is positioned with respect to the domain but no concrete use-case experiments are analysed.

"It is strongly encouraged, that the described ontologies are free, open, and accessible on the Web.  If this is not possible, then the ontologies have to be made available to the reviewers."
This is well addressed.

"Submissions will be reviewed along the following dimensions: (1) Quality and relevance of the described ontology (convincing evidence must be provided)."
I think that the current version of the paper does not provide clear evidence of the ontology's quality, but the relevance of the ontology to the special issue is clear.

"(2) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology."
These aspects may still be improved. As mentioned earlier, the description of the ontology lacks clarity and the key aspects are not well highlighted. Which terms were reused, which were new? Which existing ontologies were considered and why? How was it decided which terms will be reused and from which ontology?

Review #3
By Christoph Lange submitted on 24/May/2021
Suggestion:
Minor Revision
Review Comment:

This manuscript presents the Public Bus Transport Ontology, an ontology based on the NeTEx and GTFS standard data formats and aligned with the Transmodel standard data model. The manuscript focuses on how the ontology was created, applying the LOT (Linked Open Terms) methodology to NeTEx and GTFS, being "non-ontological resources" (NORs).

Section 1 gives a comprehensive introduction to the application domain, taking a broad "Smart City" perspective, and existing data standards. Section 2 introduces the reused NORs GTFS and Transmodel in more detail, including Transmodel's UML specification. Section 3 introduces the relevant SNAP project, in which the authors have already used Transmodel, and the LOT methodology, in particular how it works with NORs.
Section 4 explains how the LOT methodology was applied in the given case. Section 5 presents the requirements and implementation of the Public Bus Transport Ontology in detail. While, during that implementation, terms from the Transmodel NOR were already reused, Section 6 focuses on the alignment of the Public Bus Transport Ontology with Transmodel, pointing out specific solutions for specific problems. Finally, Section 7 concludes.

The manuscript meets the specific review criteria as follows:
* The relevance of the ontology is demonstrated by explaining the national and European application context. The ontology has a high quality thanks to its systematic engineering process.
* The manuscript clearly covers the key aspects of the described ontology. A “Long-term stable URL for resources” is not explicitly provided, but a well-maintained GitHub repository is available. The main README, however, is in Spanish only. The provided resources appear to be complete for replication of experiments in that they comprise tests, documentation and examples.

These are the main issues with the manuscript, in decreasing order of priority. For full detail please see the annotated PDF at https://www.dropbox.com/s/uceixhv6370f1o0/swj2612.pdf?dl=0.
* The section structure is not consistent. Section 2 promises to introduce SNAP, which is actually introduced in Section 3, and the LOT methodology is introduced as a subsection of that. SNAP was probably marked up as a section rather than a subsection by accident. Similarly, it does not make sense that Section 5 "Public Bus Transport Ontology" has an immediate subsection with the same title as its only subsection. As the title of Section 5.1.2 already suggests a focus on the implementation of the Public Bus Transport Ontology, it does not make sense to include "Implementation" once more in the title of Section 6.
* Figure 11 lists some Spanish term names in the esautob namespace, even though in the text they are English.
* It is not exactly clear what system for geo coordinates you are using. Section 3 mentions WGS84, Figure 8 mentions ETRS89. How do you relate the different coordinate systems to each other?
* For the EU Public Sector Information Directive, a link to the European version in addition to just the Spanish one should be provided.
* Various minor linguistic issues (see PDF for details)