Review Comment:
This survey paper presents an overview of the application of semantic technologies to sensor-driven systems. It analyses mainly 6 challenges and identifies Semantic Web technologies that address those challenges. Then it observes remaining open issues to be tackled.
This paper is ambitious in the sense that sensor systems may benefit from many SW technologies (from query processors to reasoners and models), and thus the authors cover an immense range of works. This is somewhat dangerous, as each of the 6 challenges is vast and each one would need an individual survey to be fully described (e.g. a whole survey on stream reasoning would be worth publishing). The reader is therefore left with the impression that this paper is too superficial in its analysis and comparison of the different described approaches. Moreover, some sections are more detailed than others (e.g. Event modeling), which means that the authors have not used the same methodology to evaluate or analyze each of the 6 different challenges.
Perhaps it would be necessary to focus on only some of these issues in order to provide a complete and detailed survey. Narrowing the scope does not mean making this work 'incomplete'. On the contrary, it would help the authors to provide a deeper analysis of this area.
Another issue is that one of the challenges (in fact the 7th one in Section 2) is not really analyzed (there is only a brief, unsubstantial mention in the final comments of 4.11). This is surprising, given that it is a key issue for sensor systems with Big Data problems, including very high velocity and (especially for semantic-aware systems) high variety.
Nonetheless, when the authors take time to perform a summarization and analysis, the results do represent a contribution, such as the summary in Table 2. In fact I would recommend the use of tables such as this throughout the paper to show how the different approaches match the 'added value' represented here, together with more insightful comments on how the authors reached these conclusions. For instance, 'scalability' is identified as one of the 'further research inquiries' in most cases, but this is not evident in the rest of the subsections of Section 4.
Apart from these important issues, in general the paper is well written and represents an improvement over the previous version, especially in terms of the surveyed works and, most notably, the coherence of the paper's terminology and structure.
Detailed comments below.
Abstract
======
After reading the abstract it is expected to see:
1 application of SW to sensor driven systems
2 strengths and weaknesses of approaches
3 propose a roadmap
After reading the paper, point 2 is not really clear. In many cases the paper provides only a description, but not a complete analysis and comparison of weaknesses and strengths.
Introduction
=========
What is the methodology used to perform this survey? What type of works are eligible? While there are many sensor-driven systems and approaches using semantic web technologies, it is not clear at this point which dimensions you will use to compare them.
From the introduction, it seems that only Section 4 is devoted to really provide a survey of the different approaches.
Applications, Information and Research Challenges
========================================
This section is too broad in scope. The research challenges or key issues to be analyzed in the paper should be clear even from the introduction, in order to guide the reader and define what this paper is going to analyze in each approach or system.
The application examples (2.1) are too detailed, and I don't think this material is very relevant for the rest of the discussion. For this type of survey paper it is more important to focus on the research challenges; these application descriptions are secondary and should be shortened.
The discussion of information in pervasive computing is missing a very important fact: raw sensor data differs substantially from other traditional types of data in that it is intrinsically dynamic and is (usually) represented as data streams. This fact gives rise to many of the challenges wrt sensor data, because it implies the need for continuous processing, management of data bursts, real-time evaluation, etc. (which is one of the challenges identified by the authors).
The research challenges identified are:
- Conceptual modeling
- Querying
- Reasoning
- Uncertainty
- Service discovery
- Privacy and Provenance
- Scalability and Performance
It remains unclear even after this point if this survey will analyze existing proposals taking into account all these dimensions. Therefore the methodology of this survey is not clear.
Semantic Web Technologies
=======================
For the audience of the SWJ this section is most likely not necessary, or at least not as a full section. The details on DL, OWL and OWL2 do not add much to the whole discussion, and they are not necessary to follow the rest of the paper. A 'background' section in this survey can certainly introduce SW technologies (briefly), but should also describe what sensor-driven systems provide without SW technologies. This would naturally lead the reader to ask: why do we need SW technologies in these systems?
Integrating Semantic Web with Pervasive Systems
=======================================
This long section (half paper) is the main contribution (the survey itself). However the different sections are very unbalanced in the way they present the different approaches. For instance in 4.3 there is a very complete description of event models and a comparison of them according to well defined criteria. In the rest of the sections the approaches, models and systems are described with much less detail and there is no systematic comparison according to well-defined criteria. For most of the subsections, we are left with mostly a brief description of an approach and little or no comparison among them (which would be expected in a high quality survey). For instance 4.4 doesn't even have an analysis subsection at all.
Also, we would expect the challenges presented in the previous section to be addressed here. It is surprising that scalability is nowhere to be found (except for a brief mention in 4.11, but it cites an evaluation using Jena and related tools, which is clearly not enough to seriously speak about performance in this context). The issues of scalability lie far beyond simple evaluations of OWL and RDF processing libraries. How about rapidly changing observation values coming from sensors? Continuous complex event processing? Parallel processing of continuous streams of sensor data? The scalability issues are unacceptably disregarded in this survey.
The reader is left with the impression that the different subsections of Section 4 have been written with very different methodologies, in some cases providing mostly brief descriptions and in other cases full comparison and deep analysis. This should be made uniform.
While the approaches in 4.1.1 are worth mentioning, they are syntactic representations (as the authors rightly mention), and therefore I don't see why there are so many details about them. For instance, the model in Fig. 1 is too simplistic compared to other relevant ontologies such as SSN (which is described later). A better balance is needed in the level of detail provided in this section.
The analysis in 4.1.3 lacks a discussion about reusability, which is a key point for sensor ontology modeling. Ontologies such as SSN can be combined (and must be combined) with other domain ontologies, temporal ontologies, etc. in order to provide a full model covering all aspects of a pervasive application. This is only possible if the models are designed to be extensible, which is crucial for these ontologies.
Section 4.2 describes different approaches for representing context (time, location, etc.), and correctly points out the need, in most cases, for integrating existing models. However, I am missing simple ontology models for geo-location (WGS84 OWL, GeoNames ontology, NeoGeo ontology), which are commonly used for sensor systems.
Section 4.3 is comprehensive and the different approaches are compared with well-defined dimensions.
Section 4.4 is just a description of approaches and lacks analysis and discussion.
This section could probably be merged with 4.5, as CEP is also closely related to querying, but at a higher level of abstraction.
4.5 correctly points out that most of the SPARQL-based query extensions focus more on the temporal aspects. I don't understand the final comment in 4.5.1, where it says there are no widely spread standard models. The models in Section 4.1, e.g. SSN, can be used here and are gaining adoption.
There is not even a broad comparison of the presented approaches. How do they compare to each other? What is missing in them? Are they useful for sensor-driven systems, and to what extent?
4.6 brings forward reasoning techniques including rule-based and hybrid approaches. However, I do not see why this section is not merged with 4.7, which is also about reasoning but focuses on the rapidly changing nature of sensor streaming data. It would be advisable to have a more structured view of reasoning for sensor systems that includes all these possibilities, and to explain the scope of each one and how they are combined in practice. Otherwise 4.7 is just an enumeration of some approaches with little relationship to the rest of the paper.
Sections 4.8 and 4.9 are again informative at best, but lack a deeper analysis and comparison. In the case of uncertainty this can be understandable to some extent, because the area seems to be not too developed according to the authors' account.
In 4.10 the term provenance seems to be misleading. Provenance usually refers to where the data comes from, who originated it, and how it was derived from previous data. In this sense, existing models such as the W3C PROV family of specifications (including PROV-O) are good examples, and are widely used by a growing community. The authors do not touch on this in this section, and they probably should, although they go back to it in 5.3. Some of the analysis of 5.3 would probably fit better here, in order to provide a better account of what is being done in terms of provenance.
5 Challenging Issues
=================
5.1 I agree on most of the issues about interval modeling and the need for standards for temporal modeling in RDF and SPARQL; this is definitely part of the current challenges.
The issues in 5.2 are, in most cases, tackled by complex event processors and stream processing engines and their semantic derivatives. It is true that stream reasoning is a topic that is still in its infancy.