Ontology for a Panoptes building: exploiting contextual information and smart camera network

Tracking #: 1620-2832

Roberto Marroquin
Julien Dubois
Christophe Nicolle

Responsible editor: 
Guest Editors ST Built Environment 2017

Submission type: 
Full Paper
The contextual information in the built environment is highly heterogeneous: it ranges from static information (e.g., information about the building structure) to dynamic information (e.g., user’s space-time information, sensor detections and events that occurred). This paper proposes to semantically fuse the building contextual information with data coming from a smart camera network by using ontologies and semantic web technologies. The ontology developed allows interoperability between the different contextual data and enables real-time event detection and system reconfiguration without human interaction. The use of semantic knowledge in multi-camera monitoring systems guarantees the protection of the user’s privacy by neither sending nor saving any image, and only extracting the knowledge from them. This paper presents a new approach to developing an "all-seeing" smart building, where the global system is a first step towards providing Artificial Intelligence (AI) to a building. More details of the system and future work can be found at the following website: http://wisenet.checksem.fr/ .

Major Revision

Solicited Reviews:
Review #1
By Florian Vandecasteele submitted on 16/May/2017
Major Revision
Review Comment:

The authors propose the WiseNET ontology for combining, analyzing and re-purposing the information from a smart camera network. The described process is interesting and combines different existing ontologies, such as IFC, event and time. Furthermore, a demonstrator is built to show the procedure of the framework. The paper is work in progress, as stated by the authors, but some preliminary evaluation against existing frameworks should be included before publication.

The paper lacks a clear structure; different sections are mixed and hard to follow.
I would recommend breaking the long paragraphs into smaller ones to ease readability, including itemizations and enumerations whenever possible. Furthermore, a strong reduction or clarification of the specific ontology terms in Section 3 is necessary to increase the readability of the text. Finally, it should be made clearer in Sections 3 and 4 that this is the newly proposed framework.

There are suggestions to integrate some WiseNET elements into the IFC standard, but no comparison is made with the recent IFC4 ontology. It should be made clearer which elements could be integrated. Furthermore, smaller ontologies have proven to be more successful; this should also be taken into account.

The authors use computer vision mechanisms to exploit the camera network. It is stated that these are outside the scope of this paper, but they should be discussed (briefly). Furthermore, there should be more explanation of how the WiseNET architecture would overcome the computer vision limitations (false detections) before this paper is published. Currently, only detections are included and, from my point of view, there is no possibility of resolving these false detections or occlusions.
The paper gives a short description of the visual descriptors for a person, focusing only on RGB values. This is not useful for people tracking or re-identification. I would suggest that the authors extend the descriptors with other person semantic classes (e.g., age, gender, facial expression).

Finally, in the future work section I would add some extra applications (e.g., smoke or fire detection, door or window status (open/closed)) to make a smarter network.

Review #2
Anonymous submitted on 07/Jun/2017
Major Revision
Review Comment:

The paper addresses a problem that is relevant from both a scientific and an industrial perspective.
The proposed approach fits well within the scope of the SWJ, even though more experiments would be beneficial to show how the WiseNET system can actually be applied and what added value it gives compared to alternative, already existing solutions. If some modules of the architecture are still under development, then the authors could at least define some scenarios and KPIs to be evaluated in the future.
Moreover, the authors should further highlight the actual novel contributions beyond the state of the art with respect to the various modules of the proposed approach.
Some concepts and goals are repeated throughout the article. It would make sense to formalize them at the beginning of the paper and then reference them.
An overall graphical representation of the architecture to show how the ontology modules are integrated is missing (cf. table 2).
Are the classes and properties in Tables 3 and 4 novel definitions of the WiseNET ontology? It is suggested to include the prefix to avoid misunderstanding. Moreover, it should be better explained why the WiseNET ontology redefines part of the content of ifcowl (e.g. properties “aggregates”, “spaceContains”). Why isn’t it enough to extract a fragment of ifcowl and use it in integration with other ontology modules? This would make it possible to skip a few query+update steps in the proposed methodology.
The query update examples for sect.6.2 are missing. This part is probably even more interesting and relevant than what is shown in listings 4 and 5.
Further comments:
- It is not appropriate to include a link in the abstract
- A reference could be placed when WiseNET is mentioned the first time in Sect.1
- On page 3, [10] is not the most appropriate reference about IFC EXPRESS to OWL. In addition to [30], also the paper Pauwels et al. 2017 (“Enhancing the ifcOWL ontology with an alternative representation for geometric data”) can be considered, since it includes the final version of ifcowl approved by bSI.
- On page 3, the sentence “Currently, ontologies are represented using OWL-2 language…” is not correct since OWL-2 is not the only ontology language available.
- Page 5, [28] is not the appropriate reference to ifcowl. Consider [30] and Pauwels et al. 2017.
- Section 3.2 can be made shorter since its contribution to the paper is quite limited.
- Fig.4 is included before Fig.3
- Sect.5. Actually, ifcowl is just the ontology T-box and it does not include proper instances. It is suggested to use IFC-RDF graph (or something else) when referring to the instances. This also means that, when querying ifcowl, instances are not returned (cf. sect.6.1.)
- Sect.5. The use of the term “compliance check” while just looking at which classes are instantiated is a bit misleading.
- Sect.5.2. The prefix “inst” is a pure convention related to the IFC-to-RDF conversion and is not defined in the ifcowl ontology.
- Listing 3. For completeness, also the inverse properties of ifcowl:relatingObject_IfcRelDecomposes ifcowl:relatedObjects_IfcRelDecomposes should be considered in the query.
- Listing 3. The query may return bindings with ?elementType=owl:NamedIndividual, therefore it would be better to add a FILTER like in Question2 of Listing 6.
- Fig.5. It’s a bit strange that a property of a door (key system) is entered in the Camera Setup GUI. It is indeed an extension of what is converted from the IFC file.
- Listing 5. The query can be made more compact by exploiting “,”, avoiding the unnecessary repetition of rdf:type.
- Listing 5. Why is DUL:hasLocation used instead of what can be already found in ifcowl?
- Listing 6. Question 2. A semi-colon should go after ?x instead of a dot.
- Listing 6. Question 3. No need to specify a time stamp?

The paper is well structured and the use of the English language is generally good, but it must be improved. There are errors and typos throughout the paper, including grammatical errors and wrong lexical choices. Here are some examples:
- Page 4, column 1, line 10, “unify” in place of “unified”
- Page 4, column 1, line 14, “where” in place of “were”
- Several times the word “fusion” is used as a verb, but it is a noun (cf. page 4, 17)
- Page 4, column 1, “warrants” in place of “warranties”
- Page 5, col 1, “interoperability between” in place of “interoperability among”
- Page 8, col 2, “focus on” in place of “focus in”
- Page 9, col 1, “inserting it into the” in place of “insert it to”
- Page 9, col 2, “people are” in place of “people is”
- Page 9, col 2, “may have occurred” in place of “may occurred”
- “consist on” is wrong, either use “consist of” or “consist in”, depending on the meaning (cf. pages 10, 13, 15)
- Page 10, col 2, “contained in” in place of “contained on”
- Page 15, 17, “especially” in place of “specially”
- Page 15, “it satisfies” in place of “it satisfy”
- Page 15, to clarify “the devices utilize”, maybe it should be “devices utilized”
- Page 15, “does not need” in place of “does not needs”
- Several errors in the final paragraphs of sect.7.

Review #3
Anonymous submitted on 05/Jul/2017
Major Revision
Review Comment:

In this paper, the authors present an ontology to serve as the backbone of an indoor CCTV system. The ontology integrates a number of well-established vocabularies and ontology models including the OWL representation of the widely-used Industry Foundation Classes model ifcOWL, in order to extend existing Building Information Models with dynamic surveillance data as a precursor to an 'intelligent' building.
The overall quality of the writing and the organization of the paper is good, but the text is somewhat verbose in certain sections (why is so much focus put on decidability if most of the inferences and queries are done in SPARQL?).
The authors introduce a system called WiseNET, that "may overcome some limitations of computer vision (e.g., false detections and missed detections), some drawbacks of deep learning (e.g., the need of large amount of training and testing data) and limitations of multi-camera based system (presented in Section 1) while allowing real-time event/anomalies detection and system reconfiguration."
The system is demonstrated by a minimal test case scenario of a single storey with two different rooms (hallway and 'space 303') that is accompanied by illustrational videos provided online, but is not further validated in reproducible settings and with data sets to back up the claims.
While the instance data handled by the system focuses on high-level information that is combined with logic rules, it remains unclear how this information is propagated back to the SmartCameras, which perform the feature extraction (e.g. of people) autonomously and independently, in order to "overcome some limitations of computer vision (e.g., false detections and missed detections)". The back-propagation of context information to improve the detection accuracy is neither explained nor proven by data.
While the extraction of spatial information from an underlying building model to set up topological relations seems sound, the use seems to be limited to an initial setup including the inference of room connectivity graphs (shared doors) and simple 'nearby' relations (Listings 1 and 2). While these basic relations might provide some degree of spatial context, their usefulness for real-world situations remains unclear.
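To make the point concrete, the connectivity inference described here amounts to little more than the following minimal sketch. It is written in Python rather than the authors' SPARQL and is not their implementation; the function names, the door identifiers and "space_304" are hypothetical, and only the hallway and space 303 come from the test case.

```python
from collections import defaultdict
from itertools import combinations


def connectivity_from_doors(door_to_spaces):
    """Build {space: set of neighbouring spaces}; two spaces are
    connected when the same door is contained in both of them."""
    graph = defaultdict(set)
    for spaces in door_to_spaces.values():
        for a, b in combinations(sorted(set(spaces)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def nearby(graph, start, max_hops):
    """Spaces reachable from `start` by crossing at most `max_hops`
    doors (breadth-first search), excluding `start` itself."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for space in frontier:
            for neighbour in graph.get(space, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen - {start}


# Hypothetical door/space assignments for illustration only.
doors = {
    "door_1": ["hallway", "space_303"],
    "door_2": ["hallway", "space_304"],
}
graph = connectivity_from_doors(doors)
```

A relation this simple is cheap to derive, which is exactly why its added value for real-world situations needs to be demonstrated.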
The usefulness of introducing the suggested user interface mockup in fig. 5 is questionable without further usability testing and evaluation.
The scalability of the dynamic population (section 6.2) is unclear (how do SWRL(?) inferences perform on real-world buildings with hundreds of rooms and cameras, etc.?). The usefulness of providing the bounding box descriptions "xywh" without projecting them into 3D space, and the purpose of the RGB values in "visualDescription", are unclear. The temporal events of persons being detected at certain moments in time are not used further in any of the demonstration queries, and their relation to the spatial configuration of the building is not shown.
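As an illustration of what such a projection could look like, the sketch below maps a bounding box to a position on the floor plane. It rests on assumptions that are not stated in the paper: that "xywh" means top-left corner plus width and height in pixels, that a person's feet are at the bottom centre of the box, and that a calibrated image-to-floor homography H is available for each camera. The function names and the example matrix are hypothetical.

```python
def foot_point(bbox):
    """Bottom-centre pixel of an (x, y, w, h) box, assumed to be
    where the detected person touches the floor."""
    x, y, w, h = bbox
    return (x + w / 2.0, y + h)


def image_to_floor(H, point):
    """Apply a 3x3 homography H (nested lists, row-major) to an
    image point and return floor-plane coordinates."""
    u, v = point
    fx = H[0][0] * u + H[0][1] * v + H[0][2]
    fy = H[1][0] * u + H[1][1] * v + H[1][2]
    fw = H[2][0] * u + H[2][1] * v + H[2][2]
    if fw == 0:
        raise ValueError("point maps to infinity under this homography")
    return (fx / fw, fy / fw)


# Hypothetical calibration: a pure scaling of 1 pixel = 1 cm.
H_example = [
    [0.01, 0.0, 0.0],
    [0.0, 0.01, 0.0],
    [0.0, 0.0, 1.0],
]
floor_xy = image_to_floor(H_example, foot_point((100, 50, 40, 150)))
```

With positions expressed in building coordinates, detections from different cameras could be compared with each other and with the IFC geometry.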
Listing 6 is referred to but is missing its label on page 15, as are other listings, e.g. on page 11.
The listings in tables 3-5 seem overly extensive and could go into a (digital) appendix.
Many of the 'competency questions' in Table 1 are limited to static building data that could be answered from the original ifcOWL model data, even if the semantic shortcuts bring some improvements by reducing the complexity of queries (which are hidden from end users anyway).
I would like to strongly encourage the authors to continue this interesting line of work by introducing generic, reusable concepts pertaining to notions of 'virtual spaces', for example camera FOVs superimposed on physical spaces such as rooms.
In a revision, statistical data of the validation and use case should be presented and the data sets should be made available to the reader.
An interesting addition to the current state of the art would be the extension of the ontology to model the precise fields of view of all cameras, allowing detailed calculations of camera coverage through e.g. overlapping FOV volumes. This would require the development of a spatial calculus that could be represented in the ontology in either static (immovable cameras) or dynamic spatio-temporal configurations (when is a feature detected by e.g. rotating cameras). For this, the notion of 'space' must be extended beyond the mere physical spaces of static building models.
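A very reduced version of such a coverage calculation could look like the sketch below. It is only an illustration under strong assumptions that are mine, not the paper's: a 2D floor plan, each FOV modelled as a circular sector, no occlusion by walls, and coverage sampled on a regular grid. The class, function and parameter names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Camera:
    x: float
    y: float
    heading_deg: float  # optical axis, counter-clockwise from +x
    fov_deg: float      # full horizontal opening angle
    max_range: float    # maximum useful detection distance


def sees(cam, px, py):
    """True if the point lies inside the camera's 2D FOV sector."""
    dx, dy = px - cam.x, py - cam.y
    dist = math.hypot(dx, dy)
    if dist > cam.max_range:
        return False
    if dist == 0:
        return True
    bearing = math.degrees(math.atan2(dy, dx))
    # Signed angular difference wrapped into [-180, 180).
    diff = (bearing - cam.heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= cam.fov_deg / 2.0


def coverage_counts(cameras, width, depth, step=1.0):
    """Map each sampled floor point to the number of cameras seeing it."""
    counts = {}
    nx = int(round(width / step))
    ny = int(round(depth / step))
    for i in range(nx + 1):
        for j in range(ny + 1):
            point = (i * step, j * step)
            counts[point] = sum(1 for cam in cameras if sees(cam, *point))
    return counts


# Two hypothetical cameras facing each other across a 4 m x 2 m space.
cam_a = Camera(0.0, 0.0, heading_deg=0.0, fov_deg=90.0, max_range=5.0)
cam_b = Camera(4.0, 0.0, heading_deg=180.0, fov_deg=90.0, max_range=5.0)
counts = coverage_counts([cam_a, cam_b], width=4.0, depth=2.0)
overlap = sorted(p for p, n in counts.items() if n >= 2)
```

For a rotating camera, heading_deg would become a function of time, which is where the dynamic spatio-temporal configuration mentioned above comes in.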