Video Representation and Suspicious Event Detection using Semantic Technologies

Tracking #: 2427-3641

Authors: 
Ashish Singh Patel
Giovanni Merlino
Dario Bruneo
Antonio Puliafito
Muneendra Ojha
Om Prakash Vyas

Responsible editor: 
Armin Haller

Submission type: 
Full Paper
Abstract: 
Due to the widespread deployment of surveillance systems and IoT applications, the amount of surveillance data is rising rapidly. Storing and analyzing video surveillance data is a significant challenge, requiring video interpretation and event detection along with related context. Low-level features from multimedia content are extracted and represented in symbolic form. These features include shape, texture, and color information of the multimedia content. In this work, a methodology is proposed that extracts the salient features and properties using machine learning techniques typical of the surveillance domain, and represents the information using a domain ontology tailored explicitly for the detection of certain activities. An ontology is developed to include concepts and properties that may be applicable in the domain of surveillance and its applications. Extracted features are represented as Linked Data using this ontology. The proposed approach is validated with an actual implementation and evaluated by recognizing suspicious activity in an open parking space. The suspicious activity detection is formalized through inference rules and SPARQL queries. Semantic Web technology has proven to be a remarkable toolchain for interpreting videos, opening novel possibilities for video scene representation and detection of complex events without any human involvement. The proposed approach can thus represent the frame-level information of a video in a structured form and perform event detection while reducing storage and enhancing semantically aided retrieval of video data. A video dataset of six different, unusual, suspicious activities has also been built, which can be useful for addressing problems related to activity recognition in other smart parking scenarios and thus opens up a plethora of use cases.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 25/May/2020
Suggestion:
Accept
Review Comment:

The paper has been improved based on the previously provided review, and is significantly extended in this revision.
The following minor changes would be needed:

1) The namespace of the ontology is a must. Under section 3.4, the link to the ontology should be added in a footnote. Ideally, the link used in the attached ontology file would be an actual link rather than a symbolic one (which is the case at the moment), hosted anywhere (e.g., on the server of the authors' university, or on GitHub) and pointed to with a permalink set up on a site like purl.org.

2) In section 3, reword the sentence “We develop an ontology exploiting the expressiveness of description logic.” to something along the lines of “The description logic expressivity of this ontology is ALHI+(D).”

3) Reword and correct the typo in the sentence “Thus, owl:sameAs is not used instead sameObejct is used.” Change it to “Therefore, sameObject is defined in our ontology to be used instead of owl:sameAs.”
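
The distinction behind comment 3 can be illustrated with a minimal sketch. It assumes, which the review does not state, that the motivation for a dedicated sameObject property is to avoid the identity semantics of owl:sameAs, under which every statement about one resource also holds for the other. The namespace http://example.org/surveillance#, the inFrame property, and the detection identifiers are hypothetical placeholders, and merge_identities is a naive stand-in for an OWL reasoner, not the authors' implementation.

```python
# Standard OWL identity property.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical placeholder namespace and terms (not the authors' ontology IRIs).
EX = "http://example.org/surveillance#"
SAME_OBJECT = EX + "sameObject"   # property name taken from the review comment
IN_FRAME = EX + "inFrame"         # assumed property: frame in which a detection occurs
DET_1 = EX + "detection_1"
DET_2 = EX + "detection_2"


def merge_identities(triples):
    """Naively apply owl:sameAs semantics: copy every statement about a
    resource to all resources declared identical to it."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Group resources connected by owl:sameAs into identity classes.
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            parent[find(s)] = find(o)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)

    # Propagate each non-identity statement to every alias of its subject.
    merged = set(triples)
    for s, p, o in triples:
        if p != OWL_SAME_AS and s in parent:
            for alias in groups[find(s)]:
                merged.add((alias, p, o))
    return merged


def frames_of(triples, node):
    """Return the sorted frame values attached to a detection."""
    return sorted(o for s, p, o in triples if s == node and p == IN_FRAME)


# Two per-frame detections of what is believed to be the same physical object.
observations = [(DET_1, IN_FRAME, "10"), (DET_2, IN_FRAME, "11")]

# Linking with owl:sameAs collapses the two detections into one individual.
merged_same_as = merge_identities(observations + [(DET_1, OWL_SAME_AS, DET_2)])

# Linking with the custom property leaves each detection's frame data separate.
merged_same_object = merge_identities(observations + [(DET_1, SAME_OBJECT, DET_2)])

print(frames_of(merged_same_as, DET_1))
print(frames_of(merged_same_object, DET_1))
```

Under these assumptions, the owl:sameAs link makes detection_1 appear in both frames, while the sameObject link records the relationship without merging the frame-level data.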

Review #2
Anonymous submitted on 21/Jun/2020
Suggestion:
Accept
Review Comment:

The authors have done a good job of addressing my concerns from the previous round. The submission is now in much better shape.

Review #3
Anonymous submitted on 30/Jun/2020
Suggestion:
Minor Revision
Review Comment:

Following the reviewer comments from the first submission, the authors appear to have made some significant changes to the manuscript in order to improve the quality of the article. Although the authors have significantly increased the scale of their literature review and applied their approach to additional datasets (with small comparative results), the paper still suffers from grammatical and style issues that make it hard to follow, and from a lack of critical discussion of the results. In particular, the paper sections are disjointed and the paper tends to favor long, convoluted descriptions over straight-to-the-point explanations. Some of these explanations are much clearer in the authors' cover letter response than in the paper. For example, the response to Comment 5 does a better job of explaining the paper's contribution than what is written in the paper.

Although the content of the paper has improved, the quality of the writing is still a major issue and the paper needs major restructuring and better focus. In my opinion, this is the main remaining obstacle for the article to be accepted.

Additional comments:
- The last sentence added in the abstract does not integrate with the rest of the abstract and seems out of place. The abstract should better highlight the key findings and results of the work.
- There are multiple grammatical issues in the introduction as well as repetitions (e.g., lines 21 and 25) and some sentences are very vague or lack information (e.g., line 38/39). The introduction should also be more concise.
+/- Although the authors have added a large amount of non-semantic literature, the literature review is mostly descriptive and fails to highlight the differences and weaknesses of the approaches compared to the authors' work as well as how they could integrate with the proposed work.
+/- Is the new dataset obtained from real CCTV footage or created artificially (the videos look staged with the same people appearing in different clips)? This information should be added to the manuscript.
+/- Although it is great that the authors have extended their study to new datasets, with reported improvements over recent work, the added section is largely disconnected from the rest of the paper.
- Table 8 is unclear and does not show if the detection was correct or incorrect compared to the annotations. Consider making the table clearer and reducing the description of each scenario.
+/- Are there any differences in, or additions to, the ontology, queries, and rules compared to the original SPARQL queries and rules?