A Holistic View over Ontologies for Streaming Linked Data

Tracking #: 3227-4441

Authors: 
Pieter Bonte
Femke Ongenae
Riccardo Tommasini

Responsible editor: 
Frank van Harmelen

Submission type: 
Survey Article
Abstract: 
Applied research and prototypes constitute an important part of the initiative around Stream Reasoning (SR) research. From Social Media analytics to the monitoring of IoT streams, the SR community has worked hard on designing working prototypes, query languages, and benchmarks. Applied work that uses stream reasoners in practice often requires a data modeling effort. For this purpose, RDF Stream Processing (RSP) engines often rely on OWL 2 ontologies. Although the literature on Knowledge Representation (KR) of time-varying data is extensive, a survey investigating KR for Streaming Linked Data is still missing. In this paper, we present an overview of the most prominent ontologies used within SR applications and compare their data modeling and KR capabilities for Streaming Linked Data. We discuss these ontologies using three complementary KR views, i.e., a view of the streams as Web resources, a view on the structure of the stream, and a view on the modeling of the events in the streams themselves. For each view, we propose an analysis framework to facilitate fair comparison and in-depth analysis of the surveyed ontologies.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 05/Oct/2022
Suggestion:
Major Revision
Review Comment:

This survey paper focuses on reviewing different ontologies as KR formalism for streaming linked data (SLD). Three perspectives are considered: the suitability of a given ontology for SLD as web resources, assessed based on FAIR principles; ontologies used for describing stream structure, assessed based on metadata and reasoning capability of such ontologies with respect to Stream Reasoning tasks; and ontologies for the stream content, assessed based on the suitability of the representation to characterise event patterns.

The topic, as well as the need for the proposed survey, is suitably presented, apart from some minor imprecisions, and there is a clear indication of how the approaches selected for comparison have been identified.

The focus is on the ontological side of things and RDF Stream Processing, and the paper does not cover other expressivity levels of stream reasoning such as CEP or non-monotonic stream reasoning. This is mentioned to some extent at the beginning, but it should be made clear that this survey specifically targets approaches to RDF Stream Processing, a subset of Stream Reasoning research.

The more general definition of SR as an area of research, however, is provided in the Encyclopaedia of DB Systems [1] and should be referred to when talking about SR research.

[1] Mileo, A., Dao-Tran, M., Eiter, T., Fink, M. (2018). Stream Reasoning. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_80715

BALANCE ACROSS THE THREE PERSPECTIVES (streams as web resources, stream structure and stream content):

There are a few gaps in the analysis and discussion:
- R3 is not covered in table 3 and discussion.
- At the end of Section 4 there is an intuition on combining some ontologies to maximise coverage of FAIR based on Table 3. However, it is not clear what the points of contact/integration are, via which specific concepts the combined ontologies can overlap, or whether these can be used to guide the selection of which ontologies to integrate.
- SSN/SOSA L4 is missing from Table 4 but present in the text.
- The comments about the composition of events say that this is not discussed at the level of Section 6 because it can be defined at higher levels of abstraction. However, there is no discussion of the composition/reasoning capability of the ontologies used at the structure level (which is where I suspect the composition might occur?).

I think the discussion of the ability to reason over stream structure (Section 5) is lacking a more formal specification of the properties of each meta-structure level, as well as a discussion of the expressivity of the related ontologies. Shouldn’t the ability to do (stream) reasoning at this level be considered and compared as well? Independently of the type of concepts used for representing the meta-structure of streams, the semantics of those is greatly embedded in the ontology’s expressivity and reasoning capability.

The survey paper is presenting an interesting analysis of different ways for representing and manipulating Linked Stream Data in RDF, and can be valuable as a guide for the research community working in this area specifically. What is missing, however, is a set of best practices or use-case scenarios that would indicate the suitability (or not) of some specific ontologies across all three levels discussed in the paper.

I would suspect that in some cases ticking all the boxes at one level might be more relevant even though other levels are not entirely covered.
This would be a very valuable addition to the survey and could, in this instance, be achieved by characterising research papers that have used a specific type of ontology/ontologies to represent and process LSD.

Sharing code is not applicable to this survey-type article.

Comments on Clarity:

It is not clear whether the identification of the three perspectives happened before the paper selection process, or was determined by the result of the paper selection process. Please clarify.

Improvements could be made regarding readability and clarity, specifically with respect to some common imprecisions in the text overall. Details as follows:

You need to check for typos and grammatical errors throughout.

Some words are struck through to indicate that those aspects have not been considered in the paper. This is not the way to indicate that.

Page 2: “A number of worked emerged…” (missing references)

Page 2: Incomplete sentence: “Event though a number of … applications. ”

Ref [15] is incomplete

Review #2
Anonymous submitted on 07/Oct/2022
Suggestion:
Major Revision
Review Comment:

In this survey article the authors investigate ten ontologies for modeling Streaming Linked Data. They use three different analytical frameworks to compare the ontologies (resp. their applicable subset) with each other. The first framework is based on the FAIR principles, the second is based on a temporal meta-conceptualization developed by the authors, and the third one is based on the Common Event Model.

The authors give a proper introduction and motivation in Section 1. The survey methodology they introduce in Section 2 seems sound; however, it is not exactly clear how the ten ontologies have been extracted from 32 papers. Are all ontologies used in multiple papers? Have some ontologies been omitted? A depiction of which ontology is used in which (or at least how many) paper(s) would have been helpful.

The first analysis (FAIR principles) is a high-level analysis in the sense that it is investigated how ontologies can be used to describe streams as Web resources. The analytical framework used here (and also the ones used for the following two analyses) is in my view quite problematic. It is not motivated why the FAIR principles should be suited in any way to analyze ontologies for stream reasoning. Although the authors claim that they previously adapted the FAIR principles to streaming data in their cited work, these principles still apply to instance data and not to ontologies/conceptualizations. On top of this, after introducing the FAIR principles the authors use different categories (Identity, Metadata, etc.) to characterize the ontologies. These categories seem to be somehow derived from the FAIR principles but it is unclear how exactly. Not all FAIR principles are covered by the categories (F4, I2, I3, R3 are not covered; an explanation for this is only provided for I2 and I3) and when there is a mapping between categories and principles the categories investigate different criteria than the FAIR principles. E.g. the Identity category claims to map to F1 which demands that “Data should be assigned unique and persistent identifiers, e.g., DOI or URIs”. For Identity, however, it is only investigated whether there exists a class for Web streams in the ontology - which is a totally different thing (the class could be assigned to a blank node without a URI).

The second framework (temporal meta-conceptualization) is used to look deeper into the ontologies and is created directly from the investigated ontologies instead of building on top of an existing framework. Five different concepts are proposed in order to characterize the classes in the ontologies. Although there are some related definitions in the paper, the definitions are not very helpful for assigning classes to the concepts; it is not clear how the concepts can be differentiated sharply (especially time-varying and continuous). Also, Figure 5 is a bit confusing as it implies a structure among the concepts (continuous at the bottom, time agnostic at the top, and the rest in between) that is not further explained.

The third framework (Common Event Model) claims to be even more low-level than the second one which does not really make sense to me – it just takes a different perspective. The dimensions of the Common Event Model, however, seem very well-suited to describe stream reasoning ontologies. Only for the structural dimension and the related notions of ontology kernels and level it is not really clear to me how this could be helpful in evaluating ontologies. Another problem in this section but also in previous ones is that many tables and figures are quite isolated and not referenced from the text or further explained.

All in all, the authors give a well-written and comprehensive overview of common ontologies for stream reasoning. The foundation of the analysis, however, could be improved and better justified, as the selection of the analytical frameworks seems arbitrary.

Review #3
Anonymous submitted on 03/Nov/2022
Suggestion:
Minor Revision
Review Comment:

The ability to continuously process dynamically incoming data is an important prerequisite for various monitoring, analytics, or control applications. Such applications can be found in various environments including the Web, where different actors, like social networks, governmental institutions, or transportation companies, continuously stream new data, keeping their users up-to-date. Applications delivering these updates to the end-users must be able to efficiently find, access, and process this data. However, finding these data streams on the Web and understanding their structure is problematic. Various solutions were suggested over the last decade to provide specifications of data streams, but these initiatives are application-driven, thus focusing on some particular domain. This paper is the first work aiming to unify these efforts in one framework by introducing different levels of detail for data stream ontologies, surveying existing work, and providing their classification according to the suggested schema.

I find this work important for many research initiatives dealing with streaming data, like stream processing, complex event processing, or stream reasoning, since it facilitates the unification of stream modeling frameworks. The classification suggested by the authors is novel and reasonable, providing three levels of abstraction:
- description of streams as resources,
- modeling of internal data structures, and
- description of individual instances
The methodology used by the authors is sound and, to the best of my knowledge, provides an excellent overview of existing work. Although the work is mostly interesting to the researchers working with streaming data, I think that it represents an important part of the Semantic Web efforts allowing for tighter integration of streaming data in general initiatives like Linked Data.

My main concerns about the paper are its readability and suitability as an introductory paper to the topic.

Readability: The presentation can be significantly improved by removing duplicates, e.g., in the introduction, the paragraphs in lines 14 and 39 are vastly overlapping, or the first sentences of sections, which simply repeat the section name. Terms are used inconsistently, like "10k ft view" vs. "ten-thousand foot view", or "TimeVarying" vs. "time-varying". Given a more systematic presentation of the material, the authors might free some space for a detailed presentation of skipped but relevant material, such as improving the clarity of the paragraph describing Figure 5 and adding material about the continuous level (L5).
The purpose of some definitions, like Definition 3 and 4, is not clear. They are not related to previous definitions and are not really defining anything, which cannot be simply described in the text. Definition 1 defines a "Web data stream", but Definition 2 refers to a "Web stream".
Figures and tables are placed at random positions, which makes the paper very uncomfortable to read. For instance, Figure 4 is placed on page 7, but is discussed only on page 8.
References are mixed up, e.g., on page 2, the footnote referring to CES is placed near OWL-S.
Typos
- defines a collection of immutable objects that evolve -> evolves (?) since an immutable object cannot evolve
and arbitrary use of capitalization
- Semantic Web vs. semantic web.

Suitability: The authors could better represent the selected ontologies. Many of them use shared vocabularies or are extending other considered ontologies. Maybe a figure showing dependencies between the ontologies would help to simplify the presentation.
Conclusions repeat the results reported in the paper. It would be more interesting to get the opinion of the authors about the ontologies, like what is good, which things can be improved, etc. The survey, in my opinion, should not only review existing things but do it critically, thus guiding future research.