Semantically-enriched Pervasive Sensor-driven Systems

Juan Ye, Stamatia Dasiopoulou, Graeme Stevenson, Georgios Meditskos, Vasiliki Efstathiou, Simon Dobson, Ioannis Kompatsiaris
Pervasive and sensor-driven systems are by their nature open and extensible, both in terms of their inputs and the tasks they are required to perform. The data streams coming from sensors are inherently noisy, imprecise, and inaccurate, with differing sampling rates and complex correlations with each other: characteristics that challenge traditional approaches to storing, representing, exchanging, manipulating and programming with rich sensor data. Semantic Web technologies allow designers to capture these properties within a uniform framework. The powerful reasoning techniques that accompany such a representation facility have proven attractive in addressing issues such as context modelling and reasoning, service discovery, privacy and trust. In this paper we review the application of the Semantic Web to pervasive and sensor-driven systems. We analyse the strengths and weaknesses of current and projected approaches, and derive a roadmap for using the Semantic Web as a platform on which open, standards-based pervasive, adaptive, and sensor-driven systems can be constructed.
Submission type: 
Survey Article

Solicited review by Jean Paul Calbimonte:

Review--Semantically-enriched Pervasive Sensor-driven Systems

The paper presents a survey of applications and approaches that use Semantic Web technologies and tools for sensor-driven systems.

This work identifies 5 key requirements for this type of system: modeling, reasoning, uncertainty, discovery, and privacy/trust, and evaluates how well Semantic Technologies applied to sensor-driven applications meet them.
The paper also identifies some challenging issues in the area, namely temporal features, dynamism, provenance and programming.
This work identifies some research areas that are exploitable with respect to the requirements and the challenging issues, from which the reader may derive a sort of roadmap for the future.

The subject chosen by the authors is highly relevant for the community, and there is a vast number of works in the literature covering many of the aspects addressed. The article is well written in general terms, and the writing style is appropriate.

As a survey, this paper references a large number of previous works, but in several cases the text focuses on the oldest approaches and describes them in too much detail, whereas one would expect the recent ones to be covered in more detail. This is seen particularly in Section 4 and parts of Section 5.

Also, a significant number of very relevant works have not been considered in this survey, most notably those from the Semantic Sensor Web/Semantic Sensor Networks community. These works specifically touch on many of the issues and requirements of this paper, but are mostly ignored, even though they are quite recent and this community has been very active in the last 4 years. The authors are advised to look deeper at these, for the sake of completeness.
For this reason, some of the requirements and challenges identified are not well covered, especially reasoning, dynamicity and temporal features. More on this in the detailed comments.

The comparisons provided do not seem to be comprehensive enough. Only a handful of systems/ontologies are compared, leaving the impression that these are the only ones available, which may not be the case. If the authors do not intend to be comprehensive, this should be stated explicitly. Also, in some of the provided comparisons, systems and ontologies (which are different things) are compared with each other, which is very confusing.

Another issue is the structure of the paper. Some sections seem to be a bit out of place, or not entirely coherent. For example, some requirements are listed in one section, but not considered in evaluations/comparisons later on. More details of concrete examples of this are given in the detailed comments.

Detailed Comments

1. Introduction: Ok

2. Background: The motivational examples are representative. Section 2.1.4 should be renamed, as it is not a summary but a list of examples in other domains. The requirements in 2.2.1 are valid, but the requirement of Querying seems to be missing, and it is one of the key issues in sensor/pervasive applications (think of continuous queries, event querying, etc., using ontologies/semantic models). This subject is touched on throughout the paper (e.g. 5.2) but should be a requirement on its own.
The requirements are valid, but not easily derived from Sections 2.1 and 2.2. The authors could follow a clearer methodology for eliciting these requirements, so that the reader easily sees where they come from.
Some of the 'research themes' are not considered in the paper, namely Interaction design and essential infrastructure; these should be stated to be out of the scope of this work.

Figure 1 is very confusing. What is the meaning of the arrows? Information derivation? What is the box enclosing low-level events, knowledge and context? Are these layers of information/knowledge?
Some concepts in 2.3 are re-defined in 2.3.2 (e.g. situation, event). This is also very confusing for the reader; the authors could agree on a single definition or explain how these definitions relate to each other.

2.4 At the end of this section, it is said that context models need to express: uncertainty, temporal features, generalisation, composition and dependence. However, uncertainty was not listed and described before (while the other 5 were). Furthermore, later in Section 5, it is not clear whether these requirements are evaluated (maybe under other names? Is dependence related to 'causality'? Composition to 'mereology'?). The authors are advised to use a coherent terminology so that there is a natural flow in the paper.

3. SW technologies: Table 1 makes no sense unless it is stated somewhere what C, D, R, S are.
In general, details about DL basics could be omitted, as they are common knowledge for the readers of SWJ. Most of 3.1 and 3.2 could be shortened, as they do not add much new value, and be replaced with appropriate pointers to Baader et al. or another handbook. Similar comments apply to 3.3 and 3.4. Furthermore, the authors could better explain why rules are important for sensor systems in particular.

4. Ontology-Mod pervasive systems.
The title of this section is odd, it could be reformulated. There are two main issues in this section.
One concern in this section is that most referenced systems are 7-8 years old (cobra, gaia, asc, socam). While there are two other, more recent works, it is surprising that these are the only representative systems found in the literature. If the authors do not intend to be comprehensive, they should state it and explain why these are the most relevant or representative.

Other related works (these are only possible examples) include proposed architectures and uses of Semantic Web technologies for sensor data, observations, context, etc.:
Patkos et al.: A semantics-based framework for context-aware services: Lessons learned and challenges
Patni et al.: Linked Sensor Data
Broring et al.: Semantically enabled sensor plug and play for the sensor web
Janowicz et al.: A restful proxy and data model for linked sensor data
Barnaghi et al.: Publishing Linked sensor data
Gray et al.: A semantically enabled service architecture for mashups over streaming and stored data
Sheth et al.: Semantic sensor web
Henson et al.: SemSOS: Semantic sensor observation service
Calbimonte et al.: Semantic Sensor Data search in a large scale federated sensor network
Systems referenced in [17]

The other issue in this section is that the comparison in Table 3 lists systems ('Ontology-based systems') such as Cobra along with ontologies such as Ontonym. This is confusing: how is a system comparable with a model (ontology)? The criteria cannot be applied to both of them. For instance, how can we evaluate service discovery for 'top level ontologies'? Moreover, there is no column for 'Uncertainty' in the table; why is this? If the table was intended to compare ontologies, they obviously all allow some degree of reasoning, but then the comparison should indicate what type of reasoning is possible (the authors mentioned OWL profiles before).

5. Integrating SW with Pervasive systems
It is not clear why this is separated from the previous section. The previous section mentions some systems and already evaluates them with respect to the requirements. These two sections need to be better organized.

It would be expected that each subsection here corresponds to one of the 5 requirements identified in Section 2: Modeling, Reasoning, Discovery, Privacy and Uncertainty. This is only partially achieved; for instance, 5.1 seems to come from Modeling, but so does 5.2.
Section 5.1 focuses too much on syntax-centric models (SensorML, SDDL), while it could focus on semantic models instead.

5.2 is quite interesting and 5.3 covers very common and up-to-date event models used in linked data and applications.
Subsection 5.3.7 does not fit in 5.3 (this section talks about ontologies) because it describes event processing. This should be a separate section on its own. It is extremely relevant, but the works cited are perhaps not enough. The authors may consider including other approaches that aim at querying using SPARQL extensions, somewhat similarly to EP-SPARQL (thus enriching raw sensor data and exposing them using complex ontologies), such as (these are examples only):
C-SPARQL (Barbieri et al.: An Execution Environment for C-SPARQL Queries; Querying RDF Streams with C-SPARQL)
CQELS (Le-Phuoc et al.: A Native and Adaptive Approach for Unified Processing of Linked Streams and Linked Data)
SPARQLStream (Calbimonte et al.: Enabling ontology-based access to streaming data sources)

Table 4 compares some of the event ontologies according to some criteria. However, it is not known where these criteria come from (e.g. axiomatization; what is interpretation referring to?). The authors should clarify the methodology used for evaluating the ontologies.

Section 5.4 completely misses dynamicity in reasoning, which is one key feature of a sensor-based system. The field of stream reasoning has already gained prominence and a number of works are available. The authors should definitely expand this section to cover these advances in the state of the art.

6 Challenging Issues.
The authors may review other works about challenges in semantic-based sensor systems (e.g. Corcho et al.: Five challenges for the semantic sensor web).

6.1 Notable works are missing in this section, including those already mentioned that extend SPARQL with time/streaming features: StreamingSPARQL, C-SPARQL, CQELS, SPARQLStream, etc. Some of these works also include RDF models with time windows, timestamps, and RDF streams.

6.4 can also mention recent works involving Linked Data and its combination with REST APIs for easy programmability, mashups with sensor data, etc. There is a significant number of recent examples in the ISWC/ESWC conferences, challenges, and workshops such as SSN. The authors are encouraged to look into these to provide a more complete picture.

Solicited review by Vanessa Lopez:

This is a survey paper on the use of Semantic Web technologies in pervasive and sensor driven systems.

In particular it looks at how issues such as context and event modelling, reasoning, uncertainty, service discovery, privacy and trust are addressed in state of the art systems and models. The survey finishes with a set of open challenges.

The paper is generally well-written and well-presented. The topic covered is of importance to the broader Semantic Web community. The open challenges are worth mentioning and are analysed in detail. It gives an overview of many technologies and it is well-documented with references. As the authors say, pervasive systems have matured from being a research topic to a commercial reality. To make this point stronger, however, I miss the mention of commercial systems (based on InfoSphere) for processing very large volumes of stream data in real time such as [1], currently used in the transport domain (Section 2.1.2), and for which semantic (ontology-based) approaches have also been proposed to tackle heterogeneity in sensor networks [2].

In Section 2.2.1, as well as in the intro, the authors mention the challenges (open issues) of capturing the temporal semantics of data, reasoning under uncertainty and dynamicity, and provenance, but what about the challenge of scalability and performance? These issues appear briefly in the discussion, but I think it is worth mentioning them already here. The authors have included plenty of references for all the other issues but none on the topic of scalability and performance (see for example [3]).

Nonetheless, the rest of the paper attempts to answer the question of to what extent Semantic Web technologies address these previously identified challenges and what the deficiencies are, which is a topic worth investigating, and a survey like this one comes at a good time.

Some minor comments to improve readability:

In my opinion Section 2.2 does not add anything; it can be eliminated or merged with Section 2.2.1, which will make the paper more focused, which is important for such a long paper. Moreover, the challenges listed in Section 2.2.1 are in line with the ones mentioned in the introduction and followed through the rest of the paper (more or less), while the list in Section 2.2 is confusing and of little relevance in my opinion. A better option is to merge Section 2.2 with the Summary in 2.4, so that the research themes go at the end of all the background knowledge on applications, systems and types of data. Also, apart from scalability and performance, it may be worth briefly mentioning querying as one of the issues.

Section 3 starts with a very detailed description of description logics, reasoning services and OWL.
In Section 3.3 the authors state "Despite the rich primitives provided for expressing concepts, OWL DL has often proven insufficient to address the needs of practical applications". The rationale behind this statement is argued further, but a reference here to where this has been proven would be good.

Section 3.4, many references are very briefly listed at the end (last two paragraphs) - e.g., [130][190][108][193][194][78] and their relevance to the discussion is not clear. It needs some rephrasing, either to extend on this, or make a reference to the part of the paper later on wherever this is described, or remove some references if they are not crucial.

Section 4: the authors list semantic enrichment and support for developing knowledge-centric software as the two main functions of ontologies in pervasive systems; I would reformulate the latter as support for integration. The title of Section 4 is also confusing, as you are not just analysing systems but also ontology models. Table 3 is very helpful as a summary.

Section 5 introduction -> when you say "In the following we explore in detail .." it will help to have the subsection numbers here so the reader can go directly to the one she is interested in. I really like the analysis done in Section 5.2 in terms of time, location, person / agent and resource.

Section 5.3 is very long, maybe it will help to have the comparison table 4 at the beginning to aid following up the rationale and discussions behind the different subsections.

Section 5.4. -> I miss a summary table for this Section (like table 4 for Section 5.3). Most approaches provide domain specific solutions and it is worth emphasising the issue of how feasible is extending these approaches to much broader and dynamic heterogeneous sensor networks.

Section 5.7 -> trust models are not really analysed here (maybe change the title to Privacy and Provenance)

Table 5 is really good. To be consistent with the subsection names, rename "Modelling and representing context" to "Modelling and querying context". What about ranking on Service discovery? Scalability is mentioned here but rarely elsewhere in the paper, and it is (as I already mentioned) an important open issue, in particular for reasoning.

I do agree with the authors on the relevance of the discussion presented in Section 6. With respect to temporal features, I would also add the need to establish temporal relations to correlate information from different sensors for querying.

Section 6.1 contains too much state of the art; at this stage we want to concentrate on the challenging issues rather than the specifics of related works. Of course some references are necessary to make the discussion stronger and validate it, but in general this section should be reduced and more focused (like Section 6.2 and Section 6.3).

Section 6.3. -> I very much agree with the paragraph "Such techniques do not generalise well to semantically-enriched system with highly heterogeneous data sources …" but this applies to all dimensions, not just provenance.

[1] Alain Biem, Eric Bouillet, Hanhua Feng, Anand Ranganathan, Anton Riabov, Olivier Verscheure, Haris N. Koutsopoulos, Mahmood Rahmani, Baris Güç: Real-Time Traffic Information Management using Stream Computing. IEEE Data Eng. Bull. 33(2): 64-68 (2010).

[2] E. Bouillet, M. Feblowitz, Z. Liu, A. Ranganathan, A. Riabov, and F. Ye, A Semantics-Based Middleware for Utilizing Heterogeneous Sensor Networks. In Proceedings of DCOSS, 2007, 174-188.

[3] d'Aquin, M., Nikolov, A. and Motta, E. (2010) How much Semantic Data on Small Devices?, EKAW 2010 Conference - Knowledge Engineering and Knowledge Management by the Masses, Lisbon, Portugal.