Stream Reasoning and Complex Event Processing in ETALIS
This paper has now been accepted for publication in the third round of reviews. Both the second round resubmission and the original submission received a "reject and resubmit" decision. The reviews below are in reverse chronological order.
Reviews for the third round:
Solicited review by anonymous reviewer:
The revisions went some way to addressing my concerns on the methodology and experimental results. The substance of my criticisms still stands in many cases: there are no statistical tests, so there is no evidence that the results did not occur by chance and no objective support for the conclusions. But, if I squint, the results now do look more like what I'm guessing they are supposed to be: indicative results to illustrate the usefulness of the tool, but not scientific results to show objectively the characteristics of the approach. Assuming this is still a "tools and systems" paper, I guess I am satisfied.
Reviews for the second round:
Review 1 by Carsten Keßler
This resubmission addresses most of my comments on the initial version and has improved significantly. The writing style has also improved, the paper is very readable and easy to understand now. If the following minor problems are addressed, the paper is ready for publication:
- throughout the paper, use either "the ETALIS tool" or just "ETALIS"; also, use "background knowledge", not "a background knowledge"
- on p.2, the sentence "For example, how likely…" is cumbersome and hard to understand; please rephrase.
- Figure 3 is cut off
- The last paragraph on p.6 appears twice.
- Remove Table 1. Most namespaces are familiar to SWJ readers, and the tr namespace is introduced in the code listing below (also remove the reference to the table there).
- In section 3, you mention that the RDF code is accompanied by timestamps. How is that implemented? Providing such provenance data is a known problem with RDF, and approaches such as named graphs try to solve it. Also, you mention in the footnote that timestamps can be omitted. How do you do the temporal reasoning on the events then?
- The subclass relationships listed on p. 7 are conceptually wrong. "Accident" is not a subClass of "slowTraffic" (which would mean that every accident is also an instance of slowTraffic). An accident can be the CAUSE for slow traffic, which is a different thing, so please do not use rdfs:subClassOf here.
- section 3.2: "A user writes EP-SPARQL queries and deploy them into the engine." -> deploys
- "Test 2: an example applications." -> application
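The objection to rdfs:subClassOf above can be illustrated with a minimal sketch. It assumes only the standard RDFS reading of subclassing (every instance of the subclass is also an instance of the superclass); the instance identifiers and the property name ex:mayCause are hypothetical and do not come from the paper.

```python
def entailed_types(asserted_types, subclass_of):
    """Close a set of asserted classes under rdfs:subClassOf (transitively)."""
    result = set(asserted_types)
    frontier = list(asserted_types)
    while frontier:
        cls = frontier.pop()
        for parent in subclass_of.get(cls, ()):
            if parent not in result:
                result.add(parent)
                frontier.append(parent)
    return result


# Modelling criticised in the review: Accident rdfs:subClassOf slowTraffic.
as_subclass = {"tr:Accident": {"tr:slowTraffic"}}

# Alternative: no subclass link; the relation is stated with a separate
# property. "ex:mayCause" is a hypothetical name used for illustration only.
as_cause_hierarchy = {}
as_cause_triples = {("tr:Accident", "ex:mayCause", "tr:slowTraffic")}

# An accident that happens on an empty road at night.
asserted = {"tr:Accident"}

print(entailed_types(asserted, as_subclass))         # also typed as slowTraffic
print(entailed_types(asserted, as_cause_hierarchy))  # stays an Accident only
```

With the subclass modelling, every accident is classified as slow traffic whether or not traffic actually slowed down. With a causal property, the event keeps its own type and a rule can decide when slow traffic follows.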
Review 2 by Florian Probst
The introduction has improved. The sensor example helps a lot in understanding the potential application areas and the inner workings of ETALIS. That is a good improvement.
Some comments:
What is ontological knowledge? Knowledge that is formalized via a formal structure called ontology is not ontological knowledge…
In the introduction you stress the need for ontologies to capture the knowledge of the domain. Then you introduce semantic(s)-based complex event processing (and call it in the next heading just "Semantic Complex Event Processing"). Why not call it ontology-based complex event processing?
Your approach has not much to do with the meaning of symbols but with a formalized domain knowledge.
--> ontology ("how things in the world are related") is in the focus not semantics!
"To match two events in a pattern, we often need to prove semantic relations between them;"
--> What is a semantic relation in contrast to an ontological relation? I guess you want to stress the ontological relation between the events (real world things). Two real world things do not have semantic relations; this term should be reserved for the symbols or words we use to talk about them…
The naming of the patterns is not optimal. Take "P1 starts P3" and "P1 equals P3". The second pattern can be read as a "normal" sentence, while this is not correct for the first pattern, which would lead to the interpretation that P1 is starting the complex event P3. Why do you introduce such pitfalls for misinterpretation? Why not "P1 starts with P3"?
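To make the two readings concrete, here is a small sketch that assumes the operators follow Allen's interval algebra. That is an assumption of this note; the authoritative definition is the formal semantics the paper cites as [4]. The type Interval and both function names are hypothetical.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Occurrence interval of an event, as (start, end) timestamps."""
    start: float
    end: float


def starts(p1: Interval, p3: Interval) -> bool:
    """Allen's 'starts': both begin together and p1 ends strictly earlier."""
    return p1.start == p3.start and p1.end < p3.end


def equals(p1: Interval, p3: Interval) -> bool:
    """Allen's 'equals': both intervals coincide exactly."""
    return p1.start == p3.start and p1.end == p3.end


p1 = Interval(0.0, 2.0)
p3 = Interval(0.0, 5.0)
print(starts(p1, p3))  # True: p1 is an initial segment of p3
print(equals(p1, p3))  # False
```

Under this reading, "P1 starts P3" is a statement about interval endpoints, not a claim that P1 triggers P3, which is the misreading the name invites.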
The distinction between the "real world event" and the "system event" is still not clear. In 2.1. you write "an event represents something that occurs". This indicates that an event is some information object in the system. Later you argue that a deductive system is not optimal for detecting a complex event. Here the event seems to be a real world occurrence itself (slow traffic) and not its representation (a symbol representing an area where slow traffic happens)! This is not precise. Please clarify.
Grammar & Style
Please revise:
Abstract: "… combine them with background knowledge…" --> skip "a"
"For example, how likely is that complex events of type: event a is followed by event b in the last 10 seconds, can be used to trigger critical business decisions? "
" What is a "slow" traffic, and what is a "one" area (for different events, roads, and road subsections) is specified as a background (domain) knowledge (for that particular application). "
Please revise: … a "one" area ….
"Moreover, the figure informally presents -- the -- semantics of the language (a formal semantics can be found in [4])."
Parts of Figure 3 are not visible.
Section 5:
Either: "The ETALIS tool …" or "ETALIS is implemented… "
"the goal of the test was to show – the – usefulness of our …"
Review 3 by anonymous reviewer
I thought this paper before was reasonable as a "tools and system" submission (i.e., limited theoretical content, and limited evaluation). And I'm still broadly supportive of the manuscript, assuming that this is still a "tools and system" submission (but this time I can't find this information listed anywhere on the manuscript or review forms, so I am a little unsure if this has changed).
The revisions do attempt to address one of the main comments in the previous revisions---the evaluation. However, I think these changes introduce some significant technical errors that need to be fixed.
1. Test 1 presents a throughput comparison. But this is not complete or convincing. 3 of the tests have ETALIS as the winner, 1 has Esper. But could this result occur by chance? Is the result statistically significant? The authors need to run not just one test for each operator, but a population of tests, generate some summary statistics for those populations, and ideally run a hypothesis test to provide some quantification of how (im)probable it is that these two systems are actually performing at the *same* level. And then we need some discussion of these results, some insight as to what they show---apparently the AND operator is less efficient than in Esper; why is this?
2. Test 2 falsely claims that the time and space complexity scales linearly with input size, based on only two input sizes (1 and 10 people). This is erroneous; one requires a *minimum* of three, and ideally at least 5 or 6 different input sizes to support assertions about the scalability (e.g., in the current data, it is entirely possible that it scales O(n^2) or O(log n), or indeed one can fit *any* regression curve to just two data points). If you are going to investigate scalability, you need a proper regression analysis, e.g., generate R^2 values to indicate the goodness of fit of the regression relationship.
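The two requests above can be sketched as follows. This is only an illustration of the kind of analysis meant, not the authors' procedure. The permutation test is one possible hypothesis test among several, the log-log fit is one possible regression model, and all function names and numbers in the usage note are hypothetical.

```python
import math
import random
from statistics import mean


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of mean throughput.

    Returns an estimated p-value for the null hypothesis that both
    samples come from systems performing at the same level.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_resamples + 1)


def fit_power_law(sizes, costs):
    """Least-squares fit of log(cost) = k * log(size) + c.

    Returns (k, r_squared); k near 1 suggests linear scaling,
    k near 2 suggests quadratic scaling.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(c) for c in costs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    k = sxy / sxx
    c = y_bar - k * x_bar
    ss_res = sum((y - (k * x + c)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return k, 1.0 - ss_res / ss_tot
```

A call such as `permutation_test(runs_system_a, runs_system_b)` on repeated throughput measurements per operator gives the quantification asked for in point 1. For point 2, note that `fit_power_law([1, 10], [t1, t10])` reports a perfect fit for any two distinct measurements, which is exactly why two input sizes cannot support a claim of linear scaling.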
In short, I think this paper is a long way short of a full research article, even though I am inclined to think it is a reasonable tools and systems paper. I am a bit concerned that the revisions are taking this paper into the realm of being a seriously flawed research article, as opposed to what seems to me a reasonable tools and systems paper.
The reviews below are for the original submitted version.
Review 1 by Carsten Keßler
This paper introduces ETALIS, a stream reasoning and complex event processing (CEP) system that detects events in data streams. The distinctive feature of ETALIS is the use of ontologies, so that temporal reasoning approaches commonly used in CEP can be combined with semantic reasoning. As such, the topic covered in this paper is relevant for SWJ.
Unfortunately, the paper suffers from numerous grammatical errors and typos - too many to mention them all here. The authors should ask a native English speaker to proofread it and check for odd phrasings. Moreover, the authors should check the references. Some of them seem to be incomplete, others lack proper capitalization. Unfortunately, the paper is not ready to be published in its current form.
That said, the content of the paper is interesting and innovative. While I am not an expert in CEP, the ETALIS system is well motivated and its benefits are well documented. Both the ELE language and the workflow for processing ELE are easily comprehensible. However, there is still room for improvement:
- The introduction could use some examples. It only motivates the system on a very high level and the reader is left a bit puzzled about what the authors mean by "background knowledge". It is only in 2.3 where the reader is finally provided with a concrete example. It would help a lot if this was already given in the introduction. Moreover, "on-line advertisement" might not be the best motivation for a research paper, especially as very appealing use-cases are given later in the paper.
- Figure 1 is also quite high-level and only gives a very abstract idea of the system. There are events coming in, which seem to be combined with event patterns and domain knowledge (from a maze?), and then some magic happens that transforms that into complex events. I acknowledge that this is only a conceptual view on the system, yet some more detail on how this works (i.e., what happens inside the gear-wheel) would be useful.
- The last paragraph of 2.1 should also be more detailed. In which cases does the system consult background knowledge, and how?
- In section 5, it is very useful that you provide a comparison with other systems, reconstructing experiments from the literature. It would help if you could also add their results in Figure 6 for comparison. Moreover, you should discuss *why* your system is so much faster, as this is of major interest to most readers.
- It would be helpful if the authors could provide some context for this paper in terms of relevant related work on CEP and stream reasoning, especially for non-experts such as myself.
Again, the main problem with this paper is the poor presentation. I therefore strongly encourage the authors to work on the presentation as indicated above and resubmit the paper.
Review 2 by Florian Probst
The paper presents an approach for combining stream reasoning and complex event processing. This combination appears highly promising. The authors follow the assumption that the number of semantically annotated information sources will increase significantly in the future. This will call for methods for reasoning about this streaming data directly as it is produced. Such methods would be further improved if complex event processing could be used to draw further conclusions from the results of the stream reasoning methods.
The paper presents a tool called the ETALIS system. This system is supposed to detect complex events. Here it remains somewhat unclear what a complex event is: either an event that happens in the "real world" or a representation of such an event in an IT system. Later in the article an example related to stock prices is given. It remains unclear to what extent the currently available information on stock prices is semantically annotated in a sufficient way.
The explanation of the approach seems conceptually plausible. However, it remains difficult to judge its practical applicability in a real world setting. Especially the definition of complex event patterns seems to be a cumbersome task that would need to be done for lots of situations. Here a more elaborate example would be helpful.
In the introduction, the usefulness of reasoning is highlighted as in many other papers, yet an example of what kind of non-trivial reasoning results can be obtained is missing. The promise of combining complex event processing with background knowledge provided by an ontology is explained with the help of the wine ontology (which does not correlate well with the stock exchange example). The wine ontology is not a particularly good example of a rigorous ontology. Here the paper would benefit from a more elaborate example. Hence it remains unclear what the real benefit is of having an ontology in the background. The patterns seem entirely hard-wired. An example where the ontology helps in finding conclusions that were not known beforehand seems difficult.
Just a thought, semantically annotated sensor observation services, providing streaming observation information in combination with event patterns that indicate dangerous climatic conditions could provide further real world examples where the streaming sources are exchangeable (failing services). E.g. ozone concentration, temperature, wind direction and traffic concentration in a densely populated area.
The paper has some minor grammatical issues.
Review 3 by Anonymous Reviewer
This paper is nicely written---very clear, detailed, and easily understandable. To the best of my knowledge it seems to describe a useful and practical tool. My main criticisms of this paper are that 1. it seems to provide very limited theoretical content (doesn't really advance the concepts much); and 2. the evaluation is intrinsic to the system (i.e., efficiency) but does not enable comparison with other approaches/alternatives. However, given that this is a tools and systems submission, I think 1. is presumably to be expected. And in fact, while the paper would surely be improved by such a comparison, I think in the context of a tools and systems paper, such a comparison probably is beyond the scope of the paper. So, it seems to me that this paper does fulfill the review criteria for a tools and systems paper.

