Review Comment:
This paper presents a new vision for so-called cascading stream reasoning. The main idea of the architecture is to use different tools at different architectural layers so that different types of data can be handled with formalisms of appropriate expressivity. The authors also present an instantiation of their vision by extending the MASSIF stream reasoning system with two new components: a selection module, and an event processing module.
While the topic of the paper is highly relevant, I have numerous problems with the material presented in the paper. In my opinion, all of these problems combined make the paper not suitable for publication in its present form. The problems can be summarised as follows.
* I fail to see the value of the abstract vision of cascading reasoning. In fact, I found the surrounding discussion very confusing and without proper motivation. Also, I do not see how this benefits other practitioners in the field.
* The motivation for such a complex vision is unclear. In particular, instead of a layered architecture incorporating five different formalisms, I wonder whether the motivating example (and similar problems) could be solved using just one system and one stream processing language that can support querying, reasoning, and event detection.
* The technical material seems rather lightweight and has been presented poorly, and I question some of the technical choices made. The role of, and the motivation behind, the various attempts at formalising different languages is unclear. As a consequence, the results are not reproducible (i.e., the paper does not contain the information necessary to actually build the different components of this system). Therefore, I do not see how the presented technical concepts could be of interest to other practitioners in the field.
* The motivation behind the evaluation is unclear, which in fact exposes an important weakness: the focus and contribution of the presented work are themselves unclear.
* The quality of English is rather poor in places.
I discuss these issues in detail in the rest of my review.
1. The value of a cascading stream reasoning vision
---------------------------------------------------
I do not see what readers of this paper stand to gain from reading about the cascading stream reasoning vision: it is essentially just a picture that, in my opinion, provides very little guidance to implementors of stream reasoning systems. The rationale based on which the framework has been derived was presented in Section 4, but I could not follow it. For example, one "contribution" of the revised vision is replacing the DL and DLP layers from the original vision with a new Inference layer, but this is hardly an important insight: both DL and DLP are inference formalisms, and not a big leap of imagination is needed to support other formalisms as well. I also do not see how Figure 2 presents "the trade-off between expressiveness and rate of changes in the data": the fact that a logic is expressive and of high complexity does not necessarily mean that it cannot successfully handle high change rates. Such a trade-off should be demonstrated either theoretically or empirically.
The discussion in Section 4 is also very high-level. Moreover, the authors use terminology imprecisely: terms such as "descriptive analytics aspects" or "common semantic space" have not been defined. What does "populate a conceptual model" mean? Note that it could equally mean "add classes and/or properties to the model" or "add data to the elements already in the model".
Finally, no clear rationale for the vision design has been presented. I would expect the design to be driven by clear qualitative or perhaps even quantitative requirements. Instead, this section seems to just present the authors' opinion, which has not been substantiated.
I believe that the paper would be better if it focussed on the description of the design of the extension of MASSIF, without any high-level philosophical discussions. This might make the paper more focused and more technical.
2. Complexity of the solution
-----------------------------
As far as I can see, the proposed solution requires an integration of at least five languages: RSP-QL for querying streams, DatalogMTL, CEP, DLs, and DSL. This leaves me wondering whether such complexity is actually needed: I just do not see what each of these languages contributes to the entire picture. In fact, the entire vision seems very complex and difficult to understand.
I can imagine that one system supporting just one language capable of performing all of these tasks would be much easier to use. This language should clearly have the expressivity needed to support all different tasks, and I wonder whether temporal datalog could be used for this purpose. It should be possible to express the time-based windows in temporal datalog, and the authors also show that CEP operators can be expressed as well. This leaves DL reasoning, but I also wonder whether expressive DL constructs, such as existential quantifiers, are really needed: for example, for the purposes of data analysis, in the definition of HighTraficMainToadNearFlexibleOffice, the <= direction of the definition seems to be what is needed, and that direction can be expressed in datalog without problems.
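To make the last point concrete, here is a minimal sketch with placeholder names: C, A, B, and R are hypothetical and are not taken from the paper. Assuming a class is defined by an equivalence with an existential restriction on the right-hand side, the <= direction is an ordinary datalog rule, whereas only the => direction needs an existential quantifier in the rule head.

```latex
% Sketch only: C, A, B and R are placeholder names, not the paper's vocabulary.
\begin{align*}
  &\text{DL definition:}  & C &\equiv A \sqcap \exists R.B \\
  &\text{$\Leftarrow$ direction (plain datalog):} & C(x) &\leftarrow A(x) \wedge R(x,y) \wedge B(y) \\
  &\text{$\Rightarrow$ direction (needs an existential head):} & C(x) &\rightarrow A(x) \wedge \exists y\,\bigl(R(x,y) \wedge B(y)\bigr)
\end{align*}
```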
This would allow all processing to be done in a single framework, which would clearly be much simpler for users: they would just write the specification of their analysis in this one language and would not have to keep switching between various formalisms. Also, system implementation might be easier as one would need just one engine.
To address any scalability concerns, one could implement special processors for different fragments of the language. For example, if a part of a datalog program essentially implements the same functionality as RSP-QL, this program could surely be evaluated using similar techniques to what is found in an RSP-QL engine.
As a side-note, I wonder why DLs are needed in this system at all. The quantitative analysis of the form used in the examples is not really a strong point of description logics.
3. Presentation issues
----------------------
My key criticism concerns the very poor presentation: many definitions are incomplete and unclear, the material is disorganised, and the level of detail is insufficient for readers to reproduce the presented results. As a consequence, I wonder what the take-home message of this paper really is. I will next point out many such specific problems.
An overview of the CEP language seems interesting enough, but the semantics has not been explained in sufficient detail. First, the authors do not say what the indexes of A and B in the figure mean. Moreover, the authors say that A AND B matches in both streams at t2, but they never explain whether the operator ever stops matching. I have analogous questions for the remaining operators. As an aside, it would be good to explain more clearly why high traffic cannot be defined in CEP: is this because there are no primitives for the quantitative manipulation of data?
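To illustrate why the question about matching matters, the following minimal Python sketch contrasts two possible readings of A AND B. The event encoding, the function name `and_matches`, and the `consume` flag are all hypothetical; neither reading is claimed to be the one the authors intend.

```python
from typing import List, Tuple

# Hypothetical encoding of a stream: (timestamp, event type).
Event = Tuple[int, str]


def and_matches(events: List[Event], consume: bool) -> List[int]:
    """Return the timestamps at which 'A AND B' is reported as matching.

    consume=True:  a matched A/B pair is used once, then the operator resets.
    consume=False: once both types have been seen, every later A or B
                   event produces a match again.
    """
    seen = set()
    matches = []
    for timestamp, kind in events:
        if kind not in ("A", "B"):
            continue  # ignore event types the operator does not mention
        seen.add(kind)
        if seen == {"A", "B"}:
            matches.append(timestamp)
            if consume:
                seen.clear()  # reset so the next match needs a fresh pair
    return matches


stream = [(1, "A"), (2, "B"), (3, "A"), (4, "B")]
print(and_matches(stream, consume=True))   # [2, 4]
print(and_matches(stream, consume=False))  # [2, 3, 4]
```

Both readings agree on the first match at timestamp 2 but diverge afterwards, which is exactly the behaviour the paper leaves unspecified.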
Definitions 3.1 -- 3.6 introduce a bunch of concepts that, as far as I can see, serve only to describe the syntax of RSP-QL. They are never used in the rest of the paper, and also the semantics of RSP-QL has not been specified. The notion of "RSP-QL algebraic expressions" has not been defined. Because of all that, I really do not see what is to be learned from all these definitions.
Similar comments apply to the definition of DatalogMTL: page 6 contains a bunch of formulas, but with just a very high-level description. No formal semantics has been presented. I am not saying that a formal semantics should have been included; my main point is that presenting these definitions seems quite arbitrary to me.
Definitions in Section 5 are insufficiently precise to be called "definitions" in a formal sense. For example, Definition 5.1 defines the notion of a "physical event" in terms of an "event", but an "event" has never been formally defined. I was also confused by the relationship between E_phy and e_i in Definition 5.2: as far as I can see, e_i should be ontology individuals, and Definition 5.1 says that E_phy contains events; but then, this means that events are actually ontology individuals, which has never been explained. I was lost in the notation of Definition 5.2. What is a "complex event type" in Definition 5.4?
Section 5.2 contains a bunch of formulas that completely confused me. What exactly is a stream formally? (This has never been defined.)
I was also confused by Figure 6.a: apart from the fact that the text is typeset very poorly (there are strange spaces between various names), I really do not understand what I am supposed to learn from it.
I was also confused by the role of DatalogMTL in the paper, for two reasons. First, from the text I got the impression that the events are to be described using DatalogMTL, and that these event specifications are then translated into CEP according to page 10. Note, however, that the table on page 10 presents the opposite translation, from CEP into DatalogMTL. Hence, this translation could be used if we had a DatalogMTL engine and wanted to use it to evaluate CEP operators; however, from the paper I understood that the goal was exactly the opposite, which completely confused me. Second, if there is a close translation between CEP and DatalogMTL, why bother with using both languages in this approach? Would it not be simpler if everything were defined just in CEP, so that we could forget about DatalogMTL?
4. Problems with evaluation
---------------------------
The goal of the evaluation was not clear to me, and I believe this to be indicative of the general lack of focus in this paper. Was the goal mainly to conduct a feasibility study? Or was the goal to attain certain scalability criteria? None of this has been stated in the paper explicitly, and therefore the meaning of the performance graphs is unclear to me. In fact, I do not even know how to interpret these numbers: for example, is processing 600 events in a window of 100 s (top-left part of Figure 8) good or bad performance?
If the objective of the paper was to make a performance claim, then a much more extensive comparison with existing approaches would be needed. Unfortunately, the only such comparison consists of one paragraph in Section 7.3.
5. Quality of English
---------------------
The paper contains many grammatical and stylistic errors, of which I will next list just a few examples.
- "perform the required reasoning expressivity" makes no sense grammatically: there is no such thing as "performing expressivity".
- "allow to extract" and similar phrases are ungrammatical and should be "allow us to extract" or something similar. Similarly, "enables to use" should be "enables the use of".
- What does it mean for "a high traffic street to have many interpretations"? (I.e., how does a street have interpretations?)
- "the all the" is a typo.
- "Lets" should be written "Let's", but in fact contractions should not be used in formal writing, so one should write "Let us".
- "a renovated and more general vision" is not proper English: one can renovate a house, but not a vision.
- "and can consumes" is a typo.