A Stream Reasoning System for Maritime Monitoring

Tracking #: 1760-2972

Georgios M. Santipantakis
Akrivi Vlachou
Christos Doulkeridis
Alexander Artikis
Ioannis Kontopoulos
George Vouros

Responsible editor: 
Guest Editors Stream Reasoning 2017

Submission type: 
Application Report
We present a stream reasoning system for monitoring vessel activity in large geographical areas. The system ingests a compressed vessel position stream, and performs, in real-time, spatio-temporal link discovery to calculate proximity relations between vessels and topological relations between vessels and static areas. Capitalizing on the discovered relations, a complex activity recognition engine, based on the Event Calculus, performs continuous pattern matching to detect various types of dangerous, suspicious and potentially illegal vessel activity. We evaluate the performance of the system by means of real datasets including kinematic messages from vessels, and demonstrate the effects of the highly efficient spatio-temporal link discovery on performance.
Solicited Reviews:
Review #1
By Jean Paul Calbimonte submitted on 17/Dec/2017
Major Revision
Review Comment:

This manuscript was submitted as 'Application Report' and should be reviewed along the following dimensions: (1) Quality, importance, and impact of the described application (convincing evidence must be provided). (2) Clarity and readability of the describing paper, which shall convey to the reader the key ideas regarding the application of Semantic Web technologies in the application.

This paper presents an application that aims to discover spatio-temporal relationships among ships (vessels), as well as geo-located areas, from a stream of incoming data.

While the system has an interesting use case with real data, there are several issues that raise important questions. First, it is not clear why using semantic technologies (e.g. RDF for representing data) is appropriate for this use case. Streams of RDF data are quite verbose, and while they might be interesting in some cases, they have been shown to incur significant overhead for querying, processing, transmission, windowing, etc. For spatial-heavy processing, it is not clear why this approach would be advantageous compared to non-Semantic-Web approaches. Second, there seems to be a different understanding of what stream reasoning refers to, compared to the type of system that is presented. I elaborate more on this later. Third, the system described includes a set of experiments, but there is a general lack of comparison with alternative solutions. Therefore, it is hard to assess whether the numbers provided constitute significant progress, or whether they are comparable with non-SW systems. Fourth, the evaluation does not provide details on how the system behaves with different types and numbers of relationships among ships and points of interest. That is, it lacks an evaluation of how the system's performance evolves as the number and complexity of spatial relationships change. Perhaps a synthetic dataset could be used to simulate these fluctuations in the data. Otherwise, the numbers provided offer little insight in this respect.

With these considerations in mind, the paper would require additional work to show clearer motivations to justify its technical choices. Furthermore, the evaluation would need to show more convincing results, including, where possible, the necessary comparisons with other alternatives, even beyond the scope of SW.

Some additional comments.
The authors claim to provide 'real time' spatio-temporal data analytics. However, the authors may want to revise this type of statement, given that 'real time' processing is a well-defined area in computer science, where time constraints and deadlines are very strictly defined. The paper does not seem to tackle real time computing in the proper sense of the term. Rather, it seems to provide 'online processing', with best-effort or non-strict guarantees.

The paper describes the system as a stream reasoning platform. It would be important to clarify what type of reasoning is really supported by the system. The techniques described seem to be more related to spatio-temporal indexing than to stream reasoning. Otherwise I see no details concerning the expressiveness of the reasoning task, i.e. in terms of the ontology complexity. It might be the case that this is a terminology problem, and that the authors use the term stream reasoning for this type of spatio-temporal detection technique.

The application presented in the paper is a prototype rather than a deployed application. If it is only a prototype and not an application deployed for the vessels, it is questionable whether the paper fits as an application report. Otherwise, the paper should provide more details about its deployment in this real use-case scenario.

Review #2
Anonymous submitted on 05/Feb/2018
Major Revision
Review Comment:

General Comments:

This application focuses on the problem of recognising activities of vessels in a scalable way and in a streaming setting. The problem is reduced to enabling spatio-temporal link discovery and making it doable in (near) real time for vessels in the marine scenario.

The paper is reasonably well organised and written, and it provides reasonable evidence of the impact of the suggested solution in the marine domain.

As an application report, however, the paper seems to describe a prototype implementation as opposed to a truly deployed application that can work in uncontrolled environments. If this is the case, then it would not be advanced enough to be published as an application report. The importance and impact are somewhat clear, but the paper also fails to provide sufficient details on the role and impact of Semantic Web Technologies in the application (see my detailed comments below).

I would have liked a stronger case in terms of how this work is positioned in relation to the challenges in [14] and what new avenues it would open. In fact, [14] encourages the use of semantics for data integration only, and ML and Data Mining for pattern discovery and stream processing, while you rely on a more knowledge-informed approach captured with RTEC. You do not provide sufficient details on how semantic technologies have been used for data integration (beyond a vague reference to the ontology), which I think is the major fault of this paper.

For example, continuous query processing is mentioned for the activity recognition, but few details are provided on what tools are used for the evaluation of CQs. Is it a language and semantics internal to RTEC? Or is it something like CQL? Or is it something like RDF Stream Processing? This could be important if it relates to the use of semantic technologies for query processing and data integration.

Specific comments to be addressed:

It would be good to support and motivate the choice, and to briefly indicate the advantages of this solution, the challenges addressed, and where the Semantic Web plays a role.

Also, considering the substantial amount of knowledge to be created (e.g. the rules) for detecting activities, some consideration should be presented about the maintenance of such a system where there is no learning of rules or automatic knowledge gathering.

The work builds upon earlier work [45] which focused on detecting changes in trajectories, and it adds a component that focuses instead on efficient link discovery.
Although this is intuitively clear in terms of the added contribution, I would encourage a clearer motivation of what this component adds to [45] and why this new perspective/representation is necessary. Is it just a way to optimise the computation, or does it add to the type of knowledge available that could be used in different ways?

It seems to me that the links discovered are naturally dynamic, therefore they can be characterised as events, as you do in [45]. This might be just a matter of terminology, but in order to understand the enhancement to [45]: is (suspicious) activity recognition the same as Complex Event Recognition?
It seems that topological relations (links?) between vessels and static areas were also discovered in [45] (there they are called complex maritime events). You need to be clearer here on the extra work done around link discovery versus CE detection through trajectory analysis (done in [45]).
Page 10 indicates that some aspects could not be computed (e.g. intervals) due to the absence of nearbyArea detected as an event. Although I have no serious concerns about the contribution of this paper, a comprehensive list of all the aspects that are additional to what was published in [45] would be indicative of the additional material and should be provided early on. This only appears in the first paragraph of Section 6.

Some of the generated events are related to speed, and I suppose they are somehow used when summarising trajectories or annotating potentially dodgy trajectories. I wonder whether any of this speed information is used for link detection, e.g. to prune the search space or to inform when a cleaning of the grid should be done.
Would this be an informative piece of data to use?


In your experiment you focus on vessels around Brittany in what seems to be a proof-of-concept prototype. The spatial datasets cover a much broader area, and you do not seem to have deployed an application that can seamlessly work across those big datasets. It would be interesting to see how your solution scales with multiple fleets and a larger number of vessels with different density distributions in different areas. As a curiosity, do you plan to cover this as part of the project or beyond, and to see how this compares to having multiple parallel executions? Is this part of a future deployment, while the paper describes the initial prototype?

Review #3
By Danh Le Phuoc submitted on 23/Mar/2018
Major Revision
Review Comment:

The manuscript presents an application report on building a stream reasoning system for monitoring several activities in large geographical areas. The work extends the authors' previous paper [45], which uses their core technology from [6]. The authors report their experiences in overcoming the performance issues of a complicated processing pipeline involving various data operations for which it is not trivial to build efficient indexes or materialised views.

The content has a good structure with an easy-to-follow storyline that helps the reader to understand the technical changes and why the authors had to go through all the hurdles to build their system. However, judged as an application report, the manuscript must describe a deployed application rather than just a simulated lab test. Therefore, the authors must provide a detailed real setup for such an application; otherwise the paper should go to the "full paper" track with longer content. The following comments should be addressed in the next version.

1. Definition 1: “if p in enclosed in A”, the meaning of “enclosed” here needs to be more precisely defined.

2. Definition 3: the nearby relation can be defined based on vessels V1 and V2; there is no need to put p and p’ in the notation nearby(V1.p, V2.p',…). If we consider p to be a property of a vessel V, then we can have nearby(V1,V2,…), which is consistent with Definitions 1 and 2. Then consider consistently using V.p, and then p.x, p.y, p.t, in this section.

3. The process of cleaning expired points, described in the third paragraph of section 3.3, needs to be discussed with respect to the locking implications caused by concurrency in the multi-core processing context evaluated in the paper. Moreover, this approach relies on the assumption that the incoming data arrives strictly in order, so this assumption needs to be explicitly stated.

4. Section 1, first paragraph, mentions the term “windowing”, but there is no further discussion of it until the window parameter is introduced in the evaluation in Section 5.3. I think some definition or description of “window” should be introduced here.

5. Section 5.1: the paper does not compare the system to a baseline but only to itself; please explain why.

6. Section 5.2.2: it is not clear to me what the temporal threshold of 30s is used for, or how it is related to this experimental setting.

7. Section 5.3: the experiments use a slide step of 1 hour and a window of 8 hours (with 31k input events), while the processing time is less than 3.5 seconds, and the reported throughputs range from 5k to 25k events per second. These figures paint a rather inconsistent picture of the workload to me. I think the authors need 1-2 paragraphs to explain the correlation between input throughput and the number of items in the window when an execution step is triggered, and how the slide step parameter affects the overall processing workload.

8. I would suggest using the term “near real-time” or “online” instead of “realtime”, which the authors use here and there.

9. The paper has several language issues; please invest considerable effort in them in the next revision. Here are some that I came across:
- Third paragraph of section 1, typo: “illegal vessel behavior” -> “illegal vessel behaviors”
- First paragraph of section 2: “e.g., a vessel is located within and area” -> “an” instead of “and”?; “various types of suspicious, dangerous or illegeal vessel activity.” -> “illegal” and “activities”?
- Second paragraph of section 3.3: “If no cleaning were performed, then too many (old) vessel positions would be retrieved that satisfy the spatial constraint, but would be eliminated due to the temporal constraint, leading to wasteful processing” -> the sentence structure is messy, please rephrase.
- First paragraph of section 4: “to detect various types of suspicious, dangerous and illegal vessel activity” -> activities?
- Section 4.2: “Rule (3) is but one of the possible…” -> ?
- The authors use “some” with singular nouns in several places; please double check.
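
As a rough illustration of the question raised in comment 7, the quoted figures can be put side by side. The sketch below uses only the numbers mentioned in that comment and assumes, hypothetically, that the 31k events are spread evenly over the 8-hour window; the variable names are illustrative and do not come from the paper.

```python
# Back-of-the-envelope check of the figures quoted in comment 7.
# Assumption: the 31k events fill one 8-hour window and arrive at a uniform rate.

WINDOW_HOURS = 8            # window length quoted in the comment
EVENTS_IN_WINDOW = 31_000   # "31k input events"
PROCESSING_SECONDS = 3.5    # quoted upper bound on processing time

# Average arrival rate of events over the window.
arrival_rate = EVENTS_IN_WINDOW / (WINDOW_HOURS * 3600)

# Events handled per second of processing time, at the 3.5 s upper bound.
processing_rate = EVENTS_IN_WINDOW / PROCESSING_SECONDS

print(f"arrival rate:    {arrival_rate:.2f} events/s")
print(f"processing rate: {processing_rate:.0f} events/s")
```

Under these assumptions the arrival rate is on the order of one event per second, while the processing rate is on the order of thousands of events per second. Explaining which of these quantities the reported throughput refers to, and how the slide step changes the number of events processed per execution step, would resolve the apparent mismatch.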