Machine Learning in the Internet of Things: a Semantic-enhanced Approach

Tracking #: 1806-3019

Michele Ruta
Floriano Scioscia
Giuseppe Loseto
Agnese Pinto
Eugenio Di Sciascio

Responsible editor: 
Guest Editors IoT 2017

Submission type: 
Full Paper
Novel Internet of Things (IoT) applications and services increasingly rely on an intelligent understanding of the environment from data gathered via heterogeneous sensors and micro-devices. Though increasingly effective, Machine Learning (ML) techniques generally do not go beyond classification of events with opaque labels, lacking meaningful representation and explanation of taxonomies. This paper proposes a framework for semantic-enhanced data mining on sensor streams, amenable to resource-constrained pervasive contexts. It merges an ontology-based characterization of data distributions with non-standard reasoning for fine-grained event detection, treating the typical classification problem of ML as resource discovery. Outputs of classification are endowed with machine-understandable descriptions in standard Semantic Web languages, while explanation of matchmaking outcomes motivates confidence in the results. A case study on road and traffic analysis validates the proposal and provides an assessment with respect to state-of-the-art ML algorithms.
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 05/Feb/2018
Review Comment:

The W3C SSN ontology became a W3C Recommendation in October 2017; update page 5
CEP -> See ACEIS middleware, Intizar et al.
Page 6 typo: couldf -> could
Page 7 CAD tools?
Algorithm 1: clearly add input, output, parameters, check pseudo code guidelines?
Typo page 10: Contraction determines -> extra space? Same for Concept Contraction and Concept Abduction

Table 2: highlight in bold the numbers in the cells that are important to check

Review #2
By Simon Mayer submitted on 20/Feb/2018
Minor Revision
Review Comment:

I initially gave this paper a "Major Revision" decision, mainly due to the lack of information about the explainability of the system. The authors have addressed that issue, and added a very well-done Section 6.3 to the paper. The paper in general is better balanced now and readability has improved as well.

I have more issues, however (but increased my rating to "Minor Revision" because I believe that the contrasting of the ML approaches in the updated paper is a good contribution):

- The conclusions in 6.4 are too strong. In particular, I find the explainability of the decision tree, illustrated in Fig 9, better than that of the new system (at least for humans). At the same time, the decision tree is faster, and has higher precision and recall values. Why should I go for the new system in the discussed use case?

- Several parts of the paper raise readers' eyebrows. This should be smoothed out or improved:

--- "but their main weakness is in the lack of a structured and meaningful representation of detected events". I disagree.

--- "ii) their precision is increased if applied on very big data amounts, so making on-line analysis unfeasible." vs. "early research has shown state-of-the-art ML is effective in the domain of ubiquitous sensor networks [34]". Isn't this a contradiction?

--- "MAFALDA exhibits a very low training time, making the approach suitable for on the fly data stream processing, while evaluation time is higher due to semantic matchmaking". Why? Classification happens more frequently over time than training.

- Finally, while readability has improved, the article is not there yet in my opinion. There are obvious use-of-language flaws (e.g., "grow" vs. "grow up") that should be checked by an as-native-as-possible speaker.