Machine Learning in the Internet of Things: a Semantic-enhanced Approach

Tracking #: 1645-2857

Authors: 
Michele Ruta
Floriano Scioscia
Giuseppe Loseto
Agnese Pinto
Eugenio Di Sciascio

Responsible editor: 
Guest Editors IoT 2017

Submission type: 
Full Paper
Abstract: 
New Internet of Things (IoT) applications and services increasingly rely on an intelligent understanding of the environment from data gathered via heterogeneous sensors and micro-devices. Though increasingly effective, Machine Learning (ML) techniques generally do not go beyond classification of events with opaque labels, lacking meaningful representations and explanations of taxonomies. This paper proposes a framework for semantic-enhanced data mining on sensor streams, amenable to resource-constrained pervasive contexts. It merges an ontology-based characterization of data distributions with non-standard reasoning for fine-grained event detection, by treating the typical classification problem of ML as a resource discovery problem. Outputs of classification are endowed with machine-understandable descriptions in standard Semantic Web languages, while explanation of matchmaking outcomes motivates confidence in the results. A case study on road and traffic analysis made it possible to validate the proposal and to assess it against state-of-the-art ML algorithms.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Simon Mayer submitted on 04/Aug/2017
Suggestion:
Major Revision
Review Comment:

The authors address an important and timely problem: how to combine the strengths of ML and Semantics. In general, I very much like the authors' elegant approach of combining ML and semantics, and I support the idea that it can make it possible to arrive at explanations for ML outputs. In addition to their diagnosis that ML lacks meaningful machine-understandable characterization of output data, I would add that another major drawback that could be alleviated with the help of Semantics is that ML solutions are usually very much tailored ("trained"...) to a specific problem.

However, I see major issues with the evaluation of the proposed approach. The authors' core motivation for this work is to demonstrate that their approach can enhance the explainability of ML outputs; however, this central claim is not tested in the evaluation. Rather, the authors compare the precision/recall of their approach with classical ML algorithms. In my opinion, this misses the point of evaluating their work, and an evaluation of explainability is crucial to assess the core merit of this approach. Next, the precision/recall numbers seem to support the conclusion that the proposed system can compete with ML algorithms. However, the proposed approach should also be compared to current ANNs, as these suffer most from the issue of unexplainable output, whereas, e.g., KNNs are often intuitively understandable for both humans and machines. In addition to precision/recall, the authors should also report the runtimes of the different algorithms: are there significant differences in the classification phase (I would expect so, since the data in their approach is strongly enriched)? Finally, please elaborate on how much (manual) effort it takes to generate the OWL-based training set.

Regarding the quality of writing, I find portions of the paper very hard to understand/follow (e.g., paragraph 1 of Section 2). A native speaker should go over the entire paper. Content-wise, I liked Section 1, but I believe that Section 2 (in addition to being hard to read) and especially Section 3 are too long. In my opinion, there is no need, for instance, to discuss the basic principles of all branches of ML if, as the authors themselves state, the paper applies mostly to classification problems. In addition, for many of the papers mentioned in Section 3.3, it should be made clearer why and how they are related to the manuscript at hand.

I suggest a "Major Revision" since I believe that the authors' approach is very relevant for the research community in principle, but that an evaluation of the explainability of their system is crucial to verify their contribution. In addition, the language and style of the paper should be revised and some sections should be shortened.

Review #2
Anonymous submitted on 13/Oct/2017
Suggestion:
Major Revision
Review Comment:

Summary of the paper
The authors design an original framework called Mafalda (MAtchmaking Features for mAchine Learning Data Analysis) that runs ML techniques on IoT data streams, exploiting Semantic Web technologies to attach meaningful metadata instead of trivial classification labels. A road and traffic monitoring use case is provided to demonstrate the usefulness of the framework; it enhances navigation systems with real-time driver assistance. The Weka machine learning toolkit for Java has been used to test the framework on a real dataset collected for the experiments. The goal of the system is to detect the type of road (even, slightly uneven or uneven), the type of traffic (low, medium, high) and the driving style (aggressive, even pace). The dataset comprises features such as altitude change, speed, longitudinal and vertical acceleration, engine load and engine coolant temperature.
The evaluation measures the processing time for ontology loading, data mapping, etc. on various devices (smartphone, Raspberry Pi, etc.) to demonstrate that the proposed approach with Semantic Web technologies is faster than one based on machine learning alone.

Strengths of the paper:
• Original work, since it covers three domains: Semantic Web, IoT, machine learning
• A background section is provided for all three domains: Semantic Web, IoT, machine learning
• Prototype implemented
• Code with ontologies and dataset available on GitHub
o https://github.com/sisinflab-swot/mafalda
• Pervasive computing is mentioned. Indeed, in IoT research this earlier, closely related field is frequently neglected.
• Well structured and well-written.

Weaknesses of the paper:
• The ontology is online but could be improved with tools for automatic visualization, documentation, ontology validation, etc. (see [9] [10])
• Does the ontology reuse, or is it aligned with, IoT or transport ontologies? This is not explained in the paper. Looking at the code, it does seem to use the SSN ontology, among others.
o Check transportation ontologies in ontology catalogues (e.g., LOV4IoT [13], OpenSensingCity [14], Ready4SmartCities [15], LOV [16]). We encourage the alignment of common concepts and properties when possible.
• Ontology labels and comments are missing; they are really important, for instance, for automatic ontology matching and documentation.
• The related work section lacks important references; see the literature recommendations.
• The prototype code is available on the web but could be improved by following Linked Open Data and Linked Open Vocabularies trends.

Additional comments - Section Introduction:
• “ontology-driven resource discovery” -> not clear enough
• “distributed knowledge-based systems [28]” -> this concept is not clearly explained in the paper referenced.

Additional comments - Section Motivation:
“Semantic Web of Things (SWoT) [28]” -> the authors are citing themselves, but other important references should be included. For instance, the SPITFIRE project [3] was published earlier; see also Jara et al. [4], Gyrard et al. [12], Wu et al. [7], etc.
We encourage the authors to search the web for the most important references using the keyphrase “Semantic Web of Things (SWoT)”.
Additional comments - Background:
In Section 3.3 it would be nice to have a conclusion explaining the limitations of existing work and how the proposed work addresses some of them.
What is the difference with respect to Complex Event Processing (CEP)? Some work combining the Semantic Web and CEP is introduced in the background section, but more explanation is needed.
Additional comments - Section Case study:
“the system should detect the following classes” -> such events are relevant for other projects; following the linked data philosophy, how could such events be shared (e.g., see reference)?

Additional comments - Section experiments:
Explain more clearly the results of the ontology-based machine learning approach compared to other approaches. Highlight the sentence “this is a significant outcome because it suggests … multiple features”, perhaps in bold, or perhaps in a separate “conclusion” paragraph within this section.
The name Mafalda could be introduced at the beginning of the paper; it is first explained in Section 6 (Experiments). Also, typo: Analisys -> Analysis.

The main benefit of using Semantic Web technologies is to get meaningful information from data, but the main drawback is that they require more processing time. This paper demonstrates that this is not necessarily the case.

Literature recommendations:
• IoT + Semantic Web + Machine Learning:
o Moraru et al. [1] [2]
o Zhang et al. (pollution detection from vehicles and traffic pattern detection) [5]
o Henson et al., IntelligO [6]
o Wu et al. SWOTWCPS [7]
• Semantic Web + Data mining Survey [8]
[1] Master's thesis: Enrichment of sensor descriptions and measurements using semantic technologies [Moraru et al. June 2011]
[2] Using machine learning on sensor data [Moraru et al. 2010]
[3] SPITFIRE: Towards a Semantic Web of Things [Pfisterer et al. 2011]
[4] Semantic web of things: an analysis of the application semantics for the iot moving towards the iot convergence [Jara et al. 2014]
[5] Semantic framework of internet of things for smart cities: case studies [Zhang et al. 2016]
[6] PhD Thesis: A semantics-based approach to machine perception [Henson et al. 2013]
[7] Towards a Semantic Web of Things: A Hybrid Semantic Annotation, Extraction, and Reasoning Framework for Cyber-Physical System [Wu et al. 2017]
[8] Semantic Web in data mining and knowledge discovery: A comprehensive survey [Ristoski et al. 2016]
[9] http://perfectsemanticweb.appspot.com/?p=ontologyValidation
[10] Semantic Web Methodologies, Best Practices and Ontology Engineering Applied to Internet of Things [Gyrard et al. 2015]
[11] Sensor-based Linked Open Rules (S-LOR): An Automated Rule Discovery Approach for IoT Applications and its use in Smart Cities [Gyrard et al. 2017]
[12] Semantic Web of Things: http://sensormeasurement.appspot.com/
[13] LOV4IoT ontology catalogue: http://sensormeasurement.appspot.com/?p=ontologies
[14] OpenSensingCity ontology catalogue: http://ci.emse.fr/opensensingcity/ns/ontologies/
[15] Ready4SmartCities ontology catalogue: http://smartcity.linkeddata.es/
[16] LOV ontology catalogue: http://lov.okfn.org/dataset/lov/vocabs