Review Comment:
The paper proposes formal reasoning methods to describe interactions between Web of Things (WoT) resources, with a particular emphasis on the presence of existentials representing physical entities. The main technical contributions of the paper are as follows: (i) the definition of the problem of "semantic discovery", which involves taking a knowledge base K and a query Q describing a form of interaction, and outputting a graph composed of pairs of resources that are entailed by K to interact according to Q; (ii) a tractable approach for semantic discovery under ELP knowledge bases and conjunctive queries Q, based on a form of existential chase that creates fresh individuals, up to a fixed depth, to satisfy existential constraints, after which the constraints can be removed and standard query answering techniques applied; (iii) an implementation of these ideas and experiments in real-world settings.
The paper is clearly on topic for the journal. There is quite a lot of ongoing work on the Web of Things (WoT), so the paper addresses a timely topic. What I particularly like about the paper is the mix of theory and practice; it not only roots the problem of semantic discovery in an existing formal framework (ELP), but further implements and experiments with these ideas in practice.
On the other hand, unfortunately, I find quite a few key problems in the paper; many concern the presentation, but some further concern the research contribution itself and its novelty.
1) First and foremost, in terms of presentation, the authors do not clarify early on the concrete problem that they wish to tackle. While the second-to-last paragraph of the introduction gives a rough idea, it is too vague for the reader to understand basic questions such as: what are the types of interactions considered, why are existentials of particular importance to the problem of determining interactions, how will these WoT systems be specified, why is discovering these types of interactions useful/important/challenging, etc. Hence at the end of the introduction, the reader remains blind as to what the authors are aiming at for the next several pages. A problem statement is finally introduced on page 10, well past midway in the paper (and still, this problem statement does not clarify many of the questions mentioned previously). In summary, the reader is asked to continue reading the paper on faith that it may lead somewhere interesting rather than being informed/motivated from the outset. A direct, concrete, motivating example at the end of the introduction, based on a concrete input and output that illustrate the technical challenge, would help immensely.
2) The paper presents some formal methods to characterise and address the problem of semantic discovery, but unfortunately, throughout the paper, there are confusing or imprecise claims, unclear or broken notation, etc. In the end, this becomes such a problem that (when combined with 1 and other problems discussed later) the paper unfortunately becomes quite frustrating to read. As some examples of these types of problematic claims, notation, etc.:
* "Reasoning is not a particular ... task but a tool to complete such tasks." What is meant here? There are reasoning tasks and there are reasoning tools. I don't think "reasoning" itself is either a task or a tool.
* "In computational logic, every reasoning task can be reduced to problem of satisfiability" This is not true for every logic, particularly when features like negation are not present. For example, the entailment task K |= K' can often be reduced to deciding satisfiability for K ^ ¬K', but this requires negation (¬). I think here the authors may be confusing satisfiability for entailment? (In any case, it's not clear that *every reasoning task* can be reduced to entailment, or what "every reasoning task" really means.)
* 'and all blank nodes in G are replaced by "fresh" IRIs' Here I think you want to say that this is how G_c is produced? Otherwise the definition remains incomplete (and to complete it, one already needs to understand what is being defined!).
* Example 1: _:r2 will not, by default, be removed by the colouring process. There are two forms of canonicalisation: one simply colours the blank nodes; the other first removes redundant blank nodes (leaning) and then colours them. The user can choose which to apply. Only in the second form is _:r2 removed.
* "RDF graph canonicalization is a fragile reasoning framework" I'm not sure I would call it a reasoning framework at all (arguably for deciding simple equivalence but the goal in general is not to perform “reasoning” in the traditional sense).
* Definition 1 is introduced as: "can now formally define a DL knowledge base". However, there are many flavours of DL, and in fact Definition 1 omits core features of DLs such as disjunction and universals, and is actually composed of rules, which is not a standard presentation for DLs. The authors should rather clarify that they are defining an ELP knowledge base (the section also feels presented backwards: first the chosen DL flavour is defined, and only then are other DL flavours discussed; it should be the other way around).
* Definition 1: the restriction that B be tree-shaped "if seen as an undirected graph" is vague since there are many ways to view it as an undirected graph, and likewise for the restriction that "there is no path from t t' to t in B", which again could be interpreted in many ways.
* Definition 1: Adding C(x) -> \bot(x) "for all C in N^C and for all knowledge bases" would force every named class to be empty in every knowledge base, which is presumably not what is intended.
* Definition 2 does not make much sense to me. In the first condition, sigma(t) in C^I: where do C and t come from? Similar questions arise for the second and third conditions. More generally, the conditions never actually refer to the formula f.
* "In practice, it is common to answer a CQ by testing the entailment of a set of BCQs ..." I don't believe this to be true. Methods usually do something like forward chaining (e.g., chase) or backwards chaining (e.g., query rewriting). I have only ever seen this argument of being able to decide boolean queries for all possible solutions used in theory.
* "For fixed queries, it is called data complexity, ... it is called query complexity" What about taxonomic/ontological complexity?
* "since most queries are of much smaller size than databases" Though many papers make this claim, I don't see it as appropriate when such forms of reasoning are applied since in the case of DL-lite, for example, the query rewriting algorithm can create an exponentially-sized query; hence the notion of tractability in data complexity -- and in particular the idea that queries are often small in practice -- needs to be taken with a grain of salt once reasoning is applied.
* Definition 6: KB' is not defined (should this be K'?).
* "In the case of equality, it means that there is ..." I failed to understand how the equality relation arises; I didn't understand this part at all.
* I also failed to understand Table 2, either formally or intuitively. Intuitively, for example, I don't understand how min cardinality has "no" for equality when, say, a max-cardinality of 1 on the top class can be used to define a functional property. Furthermore, I don't understand why transitivity is related to equality but inverses, for example, are not. The table remains a puzzle to me.
* "From Table 2, the feature we used ... reflexive properties" Actually what is used is the Self construct, which is more general than reflexive properties: it expresses reflexivity restricted to particular classes.
* "introducing an explicit symmetry relation between them would significantly increase the time complexity of query answering" This should be justified/explained in more detail.
* "In this particular case, A'_1 = A_2 and A'_2 = A_1" How so?
* Theorem 1: "emulates" should be formally defined.
* Theorem 1 proof: "We start by observing that for all model[s] I of K, all path[s] (starting with a named individual and continuing through existentials) is at most of length n." The argument continues that property paths have a fixed length. But what about rules of the form Person(x) -> exists parent.Person(y)? Combined with Person(Alice), does this K not have infinite models? If not, why is this not part of the proof of the first claim?
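Returning to the canonicalisation bullet above (Example 1), the distinction between the two forms can be made concrete. The following is a minimal Python sketch; the triple representation and function names (`colours`, `lean`, `substitute`) are hypothetical illustrations, not the API of any particular RDF library:

```python
def colours(g):
    """One-round colouring: label each blank node by the predicates on its
    outgoing and incoming edges. Colouring alone never removes a node."""
    bnodes = {t for s, _, o in g for t in (s, o) if t.startswith("_:")}
    return {b: (frozenset(p for s, p, _ in g if s == b),
                frozenset(p for _, p, o in g if o == b)) for b in bnodes}

def substitute(h, g):
    """Apply a term mapping h to all subjects and objects of g."""
    return {(h.get(s, s), p, h.get(o, o)) for s, p, o in g}

def lean(g):
    """Brute-force leaning: collapse a blank node onto another term
    whenever doing so maps the graph into a proper subgraph."""
    while True:
        bnodes = {t for s, _, o in g for t in (s, o) if t.startswith("_:")}
        terms = {t for s, _, o in g for t in (s, o)}
        collapsed = False
        for b in bnodes:
            for t in terms - {b}:
                image = substitute({b: t}, g)
                if image < g:      # proper subset: b was redundant
                    g = image
                    collapsed = True
                    break
            if collapsed:
                break
        if not collapsed:
            return g

# :a has two :p-successors; _:r2 is redundant (it maps into _:r1)
G = {(":a", ":p", "_:r1"), (":a", ":p", "_:r2"), ("_:r1", ":q", ":b")}
```

Colouring alone keeps both blank nodes (they simply receive distinct colours); only the lean-then-colour pipeline removes the redundant _:r2, which is the behaviour Example 1 seems to assume.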
3) The examples provided are also, unfortunately, confusing. As someone who is not an expert on IoT/WoT topics, I find the specific domain a bit abstract. Aside from this however, there were various technical aspects of the examples I did not understand. This issue, combined with (2) in particular, made the paper even more difficult to follow.
* Example 3: "body of water" -> "body of air"?
* Example 4: The knowledge base is trivially satisfiable: aside from \bot, nothing in the definition of a knowledge base (Definition 1) can prevent it from having a model. Hence the purpose of the example is confusing.
* Example 5: "For example, the following conjunctive query is indeed entailed by K_{ex}" Assuming that K_{ex} comes from Example 3, this does not appear to be true. Without any facts (only rules), there is nothing to suggest that a space exists, or a temperature, or something that has a property; we can have a model I of K_{ex} in which every rule body is false.
* Example 7: The head of the rule actsOnProperty(x_i,y) makes no sense to me; x_i does not exist, and y is a fluid not a property. I'm left to guess, maybe it should be actsOnProperty(x,z_2)?
* Example 8: containsZone(y,z) should be containsZone(x,z)
* Example 9: to illustrate a more general issue with these examples, this rule does not *intuitively speaking* make much sense to me. My suitcase is a physical body. I can have a phone within my suitcase that is solid. My suitcase can contain a bottle of water, which is liquid. Why should this entail that my phone intersects with the water? (I understand of course that it's just an example, but a more intuitive example would aid not only to understand the paper better, but also potentially to motivate the work.)
* Example 10: HVAC is not explained (presumably heating, ventilation, and air conditioning).
* Example 11: The fourth and fifth rules are complex and, having spent the time to try to understand them, my best guess is that they are incorrect. Taking the fifth rule: within(x,y) ^ hasProperty(x,z) ^ Temperature(z) ^ exists observes.Temperature(y) -> observes(x,z). Now we can try the following substitution based on the data: within(s2,31.638) ^ hasProperty(s2,z) ^ Temperature(z) ^ exists observes.Temperature(31.638) -> observes(s2,z). But, as far as I can see, there is no substitution (neither named nor anonymous) for z here! Furthermore, if s1, the radiator, were implied to have a temperature (this is not the case, but intuitively it would make sense), then we could entail observes(s1,z), meaning that the radiator observes its own temperature? Moreover, the existential makes little sense; my guess is that the idea is that if x acts on the temperature of a thing inside a room, it acts on the temperature of the room as well, but I don't understand why this requires an existential. Combined with the fact that the other rules in the example are not true existentials, since x is covered by the body (i.e., they are Self constraints; actually I am not sure this is intended, since x then refers to, e.g., both a radiator and its temperature?), this rule also does not motivate for me the need to deal with existentials. This example is crucial (it is also used in the evaluation), but for me it confuses far more than it helps.
4) Algorithm 1 appears to be one of the main technical contributions but, though I am not an expert on existential rules, I don't understand how it differs from a standard chase procedure for such rules. The idea is essentially to create new individuals to satisfy existential formulae up to a fixed length, and then remove the existential formulae; this I understand to be standard practice. Looking in more detail, the authors claim that "there is no dedicated algorithm for conjunctive query answering with existential restrictions", but even though I am not an expert, I am aware that such work exists in abundance; a quick Google search for "query answering existentials" reveals:
- "Goal-Driven Query Answering for Existential Rules with Equality", Benedikt et al., 2017
- "Ontology Based Query Answering with Existential Rules", Thomazo, 2013 (who has written several related papers on conjunctive query answering with existentials)
- "An Introduction to Ontology-Based Query Answering with Existential Rules", Mugnier & Thomazo, 2014
- "Tractable Query Answering for Expressive Ontologies and Existential Rules", Carral et al., 2017
- ...
Any of these papers seems to invalidate the aforementioned claim made by the authors. Distinguishing the authors’ algorithm from such approaches thus seems crucial.
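For concreteness, the standard construction I have in mind can be sketched as follows. This is a minimal Python sketch with hypothetical names, not the authors' Algorithm 1, and it reuses the rule Person(x) -> exists parent.Person(y) from my comment on the proof of Theorem 1:

```python
from itertools import count

def bounded_chase(persons, depth):
    """Depth-bounded chase for the single rule
       Person(x) -> exists y. parent(x, y) ^ Person(y).
    Fresh labelled nulls witness the existential; without the depth
    bound the rule would fire on every fresh null, forever."""
    fresh = count()
    facts = {("Person", p) for p in persons}
    frontier = set(persons)
    for _ in range(depth):
        successors = set()
        for x in frontier:
            y = f"_:n{next(fresh)}"        # fresh labelled null
            facts |= {("parent", x, y), ("Person", y)}
            successors.add(y)
        frontier = successors
    return facts

facts = bounded_chase({"Alice"}, depth=3)
```

On {"Alice"} with depth 3 this yields a parent-chain through three fresh nulls. The point is that the bound, not the rule set, guarantees termination, which is precisely the kind of bounded chase behaviour studied in the works listed above; the onus is on the authors to explain how their algorithm differs.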
5) The purpose of the experiments is unclear to me: what research questions do they aim to address? Rather than addressing a concrete aspect of the proposal, such as performance, the authors apply their method to two real-world use cases and then present the resulting interaction graph, arguing why it is interesting. But as a reader, I feel I have no way to judge whether these results are good or bad. The most concrete results are presented in Figure 8, but again, I find it hard to understand what we learn from these statistics.
In summary, I feel that the paper requires a lot more attention before it could be published: while perhaps some of the issues raised above could be addressed as part of a major revision, taking everything together, I think the paper would need to be substantially reworked before it would be acceptable for publication. Hence my verdict is a reject; I hope the authors find these comments useful for improving their work.
## Minor comments (examples):
- Run a spell-check
- "the notion of +a+ blank node"
- ".e.g. precise" -> "e.g. be more precise"
- "foundations +of+ the"
- "epxressivity"
- "taken [for] the most part"
- "all _" -> "all _s" (various, e.g., "all model" -> "all models")
- "in +the+ introduction"
- "in the present" -> "in the presence"
- "an example of +a+ classification"
- "help express-ing- equality relations"
- "to +an+ RDF store"
- "scenarii"
- "self-organzing"
- "Each represent+s+"
- etc.
- Many references are missing important information, such as the journal/venue.