Characterizing Web of Things Interactions with Existential Reasoning

Tracking #: 2075-3288

Authors: 
Victor Charpenay
Sebastian Käbisch
Harald Kosch

Responsible editor: 
Guest Editors Sensors Observations 2018

Submission type: 
Full Paper
Abstract: 
The Web of Things (WoT) is a collection of interlinked Web resources exposed by autonomous sensors and actuators that interact to perform complex automation tasks. This paper presents a method to characterize interactions on the Web of Things in terms of relations between these devices and the physical world entities that compose their environment. In particular, based on the recent standardization by the W3C of the Thing Description (TD) model and its alignment with other Web ontologies, devices expose logical assertions on themselves that form a knowledge graph from which a `graph of interactions' can be derived. The reasoning task of interest in WoT is query answering over ontologies that feature existential restrictions on the `things' WoT devices observe or act upon. Because no complete algorithm exists for this task, we present a tractable skolemization algorithm for the ELP fragment of Description Logics (DLs), at the intersection of EL++ and Datalog. We tested our approach on two use cases in different industry domains: Building Automation (BA) and Industrial Control Systems (ICS).
Tags: 
Reviewed

Decision/Status: 
Reject

Solicited Reviews:
Review #1
By Aidan Hogan submitted on 30/Jan/2019
Suggestion:
Reject
Review Comment:

The paper proposes formal reasoning methods to describe interactions between Web of Things (WoT) resources, with a particular emphasis on the presence of existentials representing physical entities. The main technical contributions of the paper are as follows: (i) the definition of the problem of "semantic discovery", which involves taking a knowledge base K and a query Q describing a form of interaction, and outputting a graph composed of pairs of resources that are entailed by K to interact according to Q; (ii) a tractable approach for semantic discovery under ELP knowledge bases and conjunctive queries Q, based on a form of existential chase that creates fresh individuals to satisfy/replace existential constraints up to a fixed length, over which standard query answering techniques can be applied; and (iii) an implementation of these ideas and experiments in real-world settings.

The paper is clearly on topic for the journal. There is quite a lot of ongoing work on the Web of Things (WoT), where the paper thus addresses a timely topic. What I particularly like about the paper is the mix of theory and practice; it not only roots the problem of semantic discovery in terms of an existing formal framework (ELP), but further implements and experiments with these ideas in practice.

On the other hand, unfortunately, I find quite a few key problems in the paper; many regard the presentation, but some further regard the research contribution itself and its novelty.

1) First and foremost, in terms of presentation, the authors do not clarify early on the concrete problem that they wish to tackle. While the second-to-last paragraph of the introduction gives a rough idea, it is too vague for the reader to understand basic questions such as: what are the types of interactions considered, why are existentials of particular importance to the problem of determining interactions, how will these WoT systems be specified, why is discovering these types of interactions useful/important/challenging, etc. Hence at the end of the introduction, the reader remains blind as to what the authors are aiming at for the next several pages. A problem statement is finally introduced on page 10, well past midway in the paper (and still, this problem statement does not clarify many of the questions mentioned previously). In summary, the reader is asked to continue reading the paper on the faith that it may lead somewhere interesting rather than being informed/motivated from the outset. A direct, concrete, motivating example at the end of the introduction, based on a concrete input and output that illustrate the technical challenge, would help immensely.

2) The paper presents some formal methods to characterise and address the problem of semantic discovery, but unfortunately, throughout the paper, there are confusing or imprecise claims, unclear or broken notation, etc. In the end, this becomes such a problem that (when combined with 1 and other problems discussed later) the paper unfortunately becomes quite frustrating to read. As some examples of these types of problematic claims, notation, etc.:

* "Reasoning is not a particular ... task but a tool to complete such tasks." What is meant here? There are reasoning tasks and there are reasoning tools. I don't think "reasoning" itself is either a task or a tool.

* "In computational logic, every reasoning task can be reduced to problem of satisfiability" This is not true for every logic, particularly when features like negation are not present. For example, the entailment task K |= K' can often be reduced to deciding satisfiability for K ^ ¬K', but this requires negation (¬). I think here the authors may be confusing satisfiability for entailment? (In any case, it's not clear that *every reasoning task* can be reduced to entailment, or what "every reasoning task" really means.)

* 'and all blank nodes in G are replaced by "fresh" IRIs' Here I think you want to say that this is how G_c is produced? Otherwise the definition remains incomplete (and to complete it, one already needs to understand what is being defined!).

* Example 1: _:r2 will not, by default, be removed by the colouring process. There are two forms of canonicalisation: one simply colours the blank nodes; the other first removes redundant blank nodes (leaning) and then colours them. The user can choose which to apply. Only in the second form is _:r2 removed.
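
A minimal sketch of the first form with rdflib (hypothetical data; to my knowledge, rdflib's to_canonical_graph relabels blank nodes deterministically but performs no leaning):

```python
from rdflib import Graph, BNode, URIRef
from rdflib.compare import to_canonical_graph

EX = "http://example.org/"
a, p, q = URIRef(EX + "a"), URIRef(EX + "p"), URIRef(EX + "q")
r1, r2 = BNode(), BNode()

g = Graph()
g.add((a, p, r1))
g.add((r1, q, a))
g.add((a, p, r2))   # redundant: simply entailed by the first triple

canon = to_canonical_graph(g)  # colours/relabels blank nodes only
print(len(canon))  # expected 3: without leaning, the redundant triple survives
```

Only after leaning (removing the redundant triple) and then colouring would _:r2 disappear.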

* "RDF graph canonicalization is a fragile reasoning framework" I'm not sure I would call it a reasoning framework at all (arguably for deciding simple equivalence but the goal in general is not to perform “reasoning” in the traditional sense).

* Definition 1 is introduced as: "can now formally define a DL knowledge base". However, there are many flavours of DL, and in fact Definition 1 omits core features of DLs such as disjunction and universals, and is actually composed of rules, which is not a standard presentation for DLs. The authors should rather clarify that they are defining an ELP knowledge base (the section also feels presented a bit backwards: first the chosen DL flavour is defined, and only then are other DL flavours discussed; it should be the other way around).

* Definition 1: the restriction that B be tree-shaped "if seen as an undirected graph" is vague since there's many ways to view it as an undirected graph, and likewise for the restriction that "there is no path from t t' to t in B", which again could be interpreted many ways.

* Definition 1: Adding C(x) -> \bot(x) "for all C in N^C and for all knowledge bases" means that all named classes must be empty in all knowledge bases.

* Definition 2 does not make much sense to me. In the first condition, sigma(t) in C^I, where did C and t come from? Similar questions arise from the second and third condition. More generally, the conditions never actually refer to the formula f.

* "In practice, it is common to answer a CQ by testing the entailment of a set of BCQs ..." I don't believe this to be true. Methods usually do something like forward chaining (e.g., chase) or backwards chaining (e.g., query rewriting). I have only ever seen this argument of being able to decide boolean queries for all possible solutions used in theory.

* "For fixed queries, it is called data complexity, ... it is called query complexity" What about taxonomic/ontological complexity?

* "since most queries are of much smaller size than databases" Though many papers make this claim, I don't see it as appropriate when such forms of reasoning are applied since in the case of DL-lite, for example, the query rewriting algorithm can create an exponentially-sized query; hence the notion of tractability in data complexity -- and in particular the idea that queries are often small in practice -- needs to be taken with a grain of salt once reasoning is applied.

* Definition 6: KB' is not defined (presumably a typo for K').

* "In the case of equality, it means that there is ..." I failed to understand how the equality relation arises; I didn't understand this part at all.

* I also failed to understand Table 2, either formally or intuitively. Intuitively, for example, I don't understand how min cardinality has "no" for equality when, for example, max cardinality 1 on the top class can be used to define a functional property. Furthermore, I don't understand how transitivity is related to equality, but inverses are not, for example. The table remains a puzzle to me.
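
For instance, as I understand it, functionality expressed via max-cardinality immediately generates equality entailments:

```latex
% Functionality stated via max-cardinality on the top class
% forces equality between role successors:
\[
  \top \sqsubseteq\; {\le} 1\, r.\top, \qquad
  r(a,b),\; r(a,c) \;\models\; b \approx c
\]
```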

* “From Table 2, the feature we used ... reflexive properties" Actually what is used is the Self construct, which is more general than reflexive properties, being reflexivity restricted to particular classes.

* "introducing an explicit symmetry relation between them would significantly increase the time complexity of query answering" This should be justified/explained in more detail.

* "In this particular case, A'_1 = A_2 and A'_2 = A_1" How so?

* Theorem 1: "emulates" should be formally defined.

* Theorem 1 proof: "We start by observing that for all model[s] I of K, all path[s] (starting with a named individual and continuing through existentials) is at most of length n." The argument continues that property paths have a fixed length. But what about rules of the form Person(x) -> exists parent.Person(y)? Combined with Person(Alice), does this K not have infinite models? If not, why is this not part of the proof of the first claim?
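
To spell the worry out (writing a for Alice and f for the skolem function), the chase of this K, as I understand it, produces an unbounded sequence of existential successors, so at least the canonical model is infinite:

```latex
% Chase of {Person(Alice)} with Person(x) -> exists y. parent(x,y) ^ Person(y):
\[
  \mathit{Person}(a),\;
  \mathit{parent}(a, f(a)),\; \mathit{Person}(f(a)),\;
  \mathit{parent}(f(a), f^{2}(a)),\; \mathit{Person}(f^{2}(a)),\; \dots
\]
```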

3) The examples provided are also, unfortunately, confusing. As someone who is not expert on IoT/WoT topics, the specific domain is a bit abstract for me. Aside from this however, there were various technical aspects of the examples I did not understand. This issue, combined with (2) in particular, made the paper even more difficult to follow.

* Example 3: "body of water" -> "body of air"?

* Example 4: The knowledge-base is trivially satisfiable. There is nothing in the definition of a knowledge base (Definition 1) to prevent it from having a model aside from \bot. Hence the purpose of the example is confusing.

* Example 5: "For example, the following conjunctive query is indeed entailed by K_{ex}" Assuming that K_{ex} comes from Example 3, this does not appear to be true. Without any facts (only rules), there is nothing to suggest that a space exists or a temperature or something that has a property; we can have a model I of K_{ex} that assigns each body to false.
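
Assuming K_{ex} contains only rules and no facts, the following interpretation, for example, is a model that entails no BCQ with a non-empty body:

```latex
% A model of a fact-free knowledge base: every class and role is empty,
% so every rule body is false and every rule holds vacuously.
\[
  \Delta^{\mathcal{I}} = \{\ast\}, \qquad
  C^{\mathcal{I}} = \emptyset \;\text{ for every class } C, \qquad
  r^{\mathcal{I}} = \emptyset \;\text{ for every role } r
\]
```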

* Example 7: The head of the rule actsOnProperty(x_i,y) makes no sense to me; x_i does not exist, and y is a fluid not a property. I'm left to guess, maybe it should be actsOnProperty(x,z_2)?

* Example 8: containsZone(y,z) should be containsZone(x,z)

* Example 9: to illustrate a more general issue with these examples, this rule does not *intuitively speaking* make much sense to me. My suitcase is a physical body. I can have a phone within my suitcase that is solid. My suitcase can contain a bottle of water, which is liquid. Why should this entail that my phone intersects with the water? (I understand of course that it's just an example, but a more intuitive example would aid not only to understand the paper better, but also potentially to motivate the work.)

* Example 10: the acronym HVAC is not explained.

* Example 11: The fourth and fifth rules are complex and, having spent the time to try to understand them, my best guess is that they are incorrect. Taking the fifth rule: within(x,y) ^ hasProperty(x,z) ^ Temperature(z) ^ exists observes.Temperature(y) -> observes(x,z). Now we can try the following substitution based on the data: within(s2,31.638) ^ hasProperty(s2,z) ^ Temperature(z) ^ exists observes.Temperature(31.638) -> observes(s2,z). But, as far as I can see, there is no substitution (neither named nor anonymous) for z here! Furthermore, if s1, the radiator, is implied to have a temperature (this is not the case, but intuitively it would make sense), then we could entail observes(s1,z), meaning that the radiator observes its own temperature? Furthermore, the existential makes little sense; my guess is that the idea is that if x acts on the temperature of a thing inside a room, it acts on the temperature of the room as well, but why this requires an existential, I don't understand. Combined with the fact that the other rules in the example are not true existentials, since x is covered by the body (i.e., they are Self constraints; actually I am not sure this is intended, since now x refers to, e.g., a radiator and its temperature?), this rule also doesn't motivate for me the need to deal with existentials. This example is crucial (it is also used in the evaluation), but for me, it confuses far more than it helps.

4) Algorithm 1 appears to be one of the main technical contributions but, though I am not an expert on existential rules, I don't understand how it differs from a standard chase procedure for such rules. The idea is essentially to create new individuals to satisfy existential formulae up to a fixed length, and then remove the existential formulae; this I understand to be standard practice (a sketch of my reading follows below). Looking in more detail, the authors claim that "there is no dedicated algorithm for conjunctive query answering with existential restrictions", but even though I am not an expert, I am aware that such work exists in abundance; a quick Google search for "query answering existentials" reveals:

- "Goal-Driven Query Answering for Existential Rules with Equality", Benedikt et al., ‎2017
- "Ontology Based Query Answering with Existential Rules", Thomazo, 2013 (who has written several related papers on conjunctive query answering with existentials)
- "An Introduction to Ontology-Based Query Answering with Existential Rules", Mugnier & Thomazo, 2014
- "Tractable Query Answering for Expressive Ontologies and Existential Rules", Carral et al., 2017
- ...

Any of these papers seems to invalidate the aforementioned claim made by the authors. Distinguishing the authors’ algorithm from such approaches thus seems crucial.
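
Here is the sketch mentioned above of the bounded skolem chase as I read Algorithm 1 (hypothetical encoding, not the authors' code); if this reading is accurate, the difference from the standard chase literature needs to be made explicit:

```python
# An existential rule B(x) -> exists y. r(x,y) ^ H(y) is encoded as the
# triple (B, r, H); fresh successors are deterministic skolem terms,
# and rule application stops after a fixed depth.

def skolem(rule_id, ind):
    return f"f{rule_id}({ind})"   # deterministic skolem term

def bounded_chase(facts, ex_rules, depth):
    for _ in range(depth):
        new = set()
        for fact in facts:
            if len(fact) == 2:                   # unary fact C(ind)
                cls, ind = fact
                for rid, (body, role, head) in enumerate(ex_rules):
                    if cls == body:
                        y = skolem(rid, ind)
                        new.add((role, ind, y))  # r(ind, y)
                        new.add((head, y))       # H(y)
        if new <= facts:                         # fixpoint before depth bound
            break
        facts = facts | new
    return facts

kb = bounded_chase({("Person", "alice")},
                   [("Person", "parent", "Person")], depth=2)
# alice gets parent f0(alice), who gets parent f0(f0(alice)); then we stop,
# and the existential rule can be dropped for standard query answering.
print(sorted(kb))
```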

5) The purpose of the experiments is not clear to me: it is not clear what research questions the experiments aim to address. Rather than addressing a concrete aspect of the proposal, such as performance, the authors apply their method to two real-world use-cases and then present the resulting interaction graph, arguing why it is interesting. But as a reader, I feel I have no way to judge whether these results are good or bad. The most concrete results are presented in Figure 8, but again, I find it hard to understand what we learn from these statistics.

In summary, I feel that the paper requires a lot more attention before it could be published: while perhaps some of the issues raised above could be addressed as part of a major revision, taking everything together, I think the paper would need to be substantially reworked before it would be acceptable for publication. Hence my verdict is a reject, and I hope the authors find these comments useful for improving their work.

## Minor comments (examples):
- Run a spell-check
- "the notion of +a+ blank node"
- ".e.g. precise" -> "e.g. be more precise"
- "foundations +of+ the"
- "epxressivity"
- "taken [for] the most part"
- "all _" -> "all _s" (various, e.g., "all model" -> "all models")
- "in +the+ introduction"
- "in the present" -> "in the presence"
- "an example of +a+ classification"
- "help express-ing- equality relations"
- "to +an+ RDF store"
- "scenarii"
- "self-organzing"
- "Each represent+s+"
- etc.
- Many references are missing important information, such as the journal/venue.

Review #2
Anonymous submitted on 12/Feb/2019
Suggestion:
Major Revision
Review Comment:

Summary
=======
The authors propose to use knowledge bases with existential reasoning to analyse Web of Things deployments. They motivate the need for existential reasoning by referring to (common-sense) background knowledge about the physical environment of a Web of Things deployment, which can come in the form of existential rules. Hence, the authors propose a rule-based approach. The authors define a non-standard description logic for their purposes, based on the description logic EL++ augmented with description logic rules. Next, the authors survey related work in description logics, query answering, and abduction (noting that there is no algorithm in the literature for their approach based on existential restrictions). The authors name query answering and abductive reasoning as required for analysing Web of Things deployments. Next, the authors provide a use case for query answering on the Web of Things: discovery. They survey ontologies relevant for their use case (sosa, ssn, om, bot, eclass, schema.org) and find that their description logic supports the features of the ontologies. Next, the authors present examples from an application scenario. Subsequently, the authors present a skolemisation-based algorithm for their approach based on existential restrictions, such that the output of their algorithm can be used for query answering in a triple store without reasoning capabilities. Next, the authors report on experiments in an Internet of Things scenario using a synthetic dataset and a simulation. In the former experiment, the authors show that the set-up in the synthetic dataset is quite resilient against node failures. In the latter experiment, they show that the simulation set-up has few nodes that, upon failure, would stop the whole set-up from functioning. Lastly, the authors conclude and point to future work.

Verdict
=======
I suggest "major revision" as a verdict.
In particular, the authors should work on the quality of the writing, especially regarding the structure of the paper and of their argument. I think the work is indeed original. When working on the argument, the authors should also elaborate more on the significance of their work outside of the Web of Things.

Recommendations
===============
The main contribution of the paper is the skolemisation algorithm for query answering over a knowledge base that requires existential reasoning. Everything else could be arranged around this main contribution and put into a more "standard" paper structure, which would make the paper much easier to follow: a problem statement in the introduction (instead of Section 3.2), one running example for the paper in its own dedicated section (instead of mixing the example into Sections 2 and 3), and one section on the basic DL definitions required for the approach. With the example and the DL definitions removed from Section 2, that section on related work would become more to the point.

Strong Points
=============
* I think the authors have a point that existentially quantified formulae are highly relevant on the Web of Things, as existential formulae could indeed be a way to encode background knowledge about the physical environment of a deployment
* The authors give a thorough theoretical treatment of their approach
* The authors gave insights into the value of their approach using real-world examples and datasets

Weak Points
===========
* The structure of the paper (see recommendations)
* The paper could use more standard DL terminology and syntax such that an introductory DL lecture would suffice to understand most of the main points, e.g., introduce CBox vs. TBox and talk about concept inclusion vs. rules. The work on DL rules could be mentioned more prominently, as it seems foundational for the paper.
* While the authors add expressivity very sparingly in the beginning, in the evaluation the workloads are so small that the complexity introduced by too much expressivity does not really hurt.
* The term "to identify" is central to sections 1 to 3 and could deserve a definition. Is it identify as in URI? After page 8, the term does not reappear with the same meaning.
* The authors propose uRDF stores on the devices for maintaining Thing Descriptions. I am unsure if an RDF store is required in the deployment scenario where each device sends its Thing Description to the central Thing Directory, as the ability to send an ASCII string in an HTTP message would be sufficient in that case.
* The evaluation cases are less from a Thing's perspective than from a deployment analyst's perspective. If the analyst just collects the Thing Descriptions from the devices (i.e., acts as an HTTP client), he/she can do the deployment analysis on arbitrarily powerful infrastructure, so the limited power of the devices and the sparingly added expressivity do not matter much here.

Minor Points
============
* For notation in the examples, I would recommend using prefixed URIs for the DL formulae, e.g., "ssn:hasProperty" instead of "hasProperty"
* Indentation of text within examples often seems unnecessary
* What is achieved by the definition on p. 4, left column, lines 35-37?
* "The RDF simple semantics" - maybe use a term from the RDF 1.1 MT document
* "Incremental updates of ABoxes" is beneficial as "servients enter a WoT system at different times". What about leaving, which is what happens in the evaluation?
* References in the bibliography section are given very heterogeneously

Typos
=====
* ".e.g." (p.3)
* "Krötsch" (p.4)
* "relation algebra" (p.6)
* "red area" (p.15) -> purple?
* "knowledge base intelligent systems" -> knowledge-based?

Review #3
Anonymous submitted on 09/Mar/2019
Suggestion:
Minor Revision
Review Comment:

Review of Manuscript SWJ 2075

This article describes a tractable method to perform existential reasoning to semantically discover implicit interactions between connected objects on the Web of Things.
The proposed method relies on the ELP description logic, on an algorithm for Skolemization of a knowledge base (an original contribution of the article), on the deductive services offered by an off-the-shelf reasoner or deductive database, and on the query-answering services of a SPARQL engine.
Two experimental case studies demonstrate the viability of the proposed approach.

Overall, the paper is well-written, it presents original results, and the results appear to be significant, as demonstrated by the two case studies. However, its main weakness is the formalization, which should definitely be improved to meet the standards required for a journal publication.

Definition 1 is not a general definition of a DL expression, as claimed, but a specific definition of an expression in one particular DL, namely ELP, I think, even though the authors do not clearly indicate that.

Furthermore, on page 4, left column, line 36, the rule C(x) -> \bottom(x) is surprising: if, as the authors rightly state, \bottom is a sub-class of every class, then it should rather be \bottom(x) -> C(x), which is also consistent with the well-known result that from an inconsistency any formula can be deduced.

Definition 2 is obscure and, as far as I understand, not rigorous enough.
The first constraint for \sigma is ambiguous: which class does C indicate? How is class C related with the variable or individual name symbolized by t? What if t is in fact a variable x? How can one constrain a variable to be interpreted (or should I say substituted, since this appears to be the intended meaning of \sigma?) with an element of an arbitrary class?
The second constraint is particularly hard to digest. I will try to paraphrase it: it appears to say that the substitution of any variable or individual has k role slots, which are filled by at least one object in the interpretation, and other n - k role slots, which are filled by the substitution of individuals or variables. What are k and n? Where do they come from? What does this constraint say that is not trivial? If all it says is that any individual/variable binding has a number of anonymous (= blank node) role fillers and a number of named role fillers, then I don't see why it should be stated. Properly speaking, it would not even be a "constraint" at all!

In Definition 6, the three conditions should be numbered, to allow referring to them in the following text. Furthermore, it is not clear what \mathcal{KB}' stands for. Is it a typo for \mathcal{K}'? That's confusing.

In any case, I think Section 2.3.2 on abductive reasoning could be safely dropped from the article without any negative consequence. On the contrary, since the authors chose to formulate their problem only in terms of query answering, I don't see the point of discussing abductive reasoning and the article would end up being more focused if that discussion was omitted.

The formalization of existence and equality, in page 8, left column, lines 12-19, is not convincing.

The authors claim that class complements may indirectly express equality, by negation, but they do not explain how this can be done. If one has class complement, one could express disjointness axioms of the form C(x) -> complement of D(x) and I would be able to deduce *inequality* of a and b if I know C(a) and D(b). However, I don't see how I could use complement to deduce *equality*!
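
In symbols, the only derivation I can see is the following:

```latex
% Disjointness yields INequality, not equality:
\[
  C \sqsubseteq \neg D, \quad C(a), \quad D(b) \;\models\; a \not\approx b
\]
```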

In Section 3.3, it would be a good idea to formally define what "K' emulates K" means. My impression is that what the authors mean is just that K |= Q iff K' |= Q.

The first paragraph of the proof of Theorem 1 should be stated as a Lemma and proved separately. I think that would make the presentation tighter.

Bibliography: it is good to provide DOIs or URLs for all publications, but this should not be an excuse for omitting important details, like the name of the journal or the year of publication.

Typos:
"Formally." -> "Formally," (p. 4, line 44)
"an model" -> "a model" (p. 6, line 29)
"x_1" -> "x" (p. 9, line 51)
"We the proceed" -> "We then proceed" (p. 13, line 1)
"insofar that we assumes" -> "insofar as we assume" (p. 15, line 46)