Semantic Referee: A Neural-Symbolic Framework for Enhancing Geospatial Semantic Segmentation

Tracking #: 2124-3337

Authors: 
Marjan Alirezaie
Martin Längkvist
Michael Sioutis
Amy Loutfi

Responsible editor: 
Guest Editors Semantic Deep Learning 2018

Submission type: 
Full Paper
Abstract: 
Understanding why machine learning algorithms may fail is usually the task of the human expert that uses domain knowledge and contextual information to discover systematic shortcomings in either the data or the algorithm. In this paper, we propose a semantic referee, which is able to extract qualitative features of the errors emerging from deep machine learning frameworks and suggest corrections. The semantic referee relies on ontological reasoning about spatial knowledge in order to characterize errors in terms of their spatial relations with in the environment. Using semantics, the reasoner interacts with the learning algorithm as a supervisor. In this paper, the proposed method of the interaction between a neural network classifier and a semantic referee shows how to improve the performance of semantic segmentation for satellite imagery data.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Dagmar Gromann submitted on 13/Feb/2019
Suggestion:
Minor Revision
Review Comment:

This is a joint review by all three guest editors of the editorial board of this special issue:

This paper presents an approach for informing a CNN Auto-Encoder (CAE) classifier about misclassification errors by using an RCC-8 reasoner which operates on top of a geospatial ontology. The three features considered are: shadow pixels, elevation, and general inconsistencies with the spatial relations in the ontology. Feedback from the reasoner is used to augment image segments with additional channels of information that are then fed to the classifier again. The submission is interesting and, in our opinion, within the scope of the special issue, as it discusses in an original way how inference mechanisms associated with a (city/building/environment) ontology can interact with a neural classification architecture. It is well written, the idea is sound, and the motivation for using the reasoner, how it works, and the presentation of the results are all clear.
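For concreteness, the following minimal Python/NumPy sketch shows how we read the interaction loop; the function names, the three feedback channels, and all shapes are our own illustrative assumptions, not taken from the paper:

    import numpy as np

    def classify(x):
        """Stand-in for the CAE classifier: per-pixel class probabilities."""
        h, w = x.shape[:2]
        n_classes = 5                              # illustrative number of classes
        logits = np.random.rand(h, w, n_classes)
        return logits / logits.sum(axis=-1, keepdims=True)

    def semantic_referee(probs):
        """Stand-in for the RCC-8/OntoCity reasoner: one image-sized channel per feature."""
        h, w, _ = probs.shape
        shadow    = np.zeros((h, w))               # pixels explained as shadow
        elevation = np.zeros((h, w))               # estimated relative elevation
        conflict  = np.zeros((h, w))               # violations of spatial constraints
        return np.stack([shadow, elevation, conflict], axis=-1)

    rgb       = np.zeros((200, 200, 3))            # one input patch
    probs     = classify(rgb)                      # first classification pass
    feedback  = semantic_referee(probs)            # reasoner output as extra image channels
    augmented = np.concatenate([rgb, feedback], axis=-1)   # shape (200, 200, 6)
    probs2    = classify(augmented)                # second pass on the enriched input

Whether this loop amounts to an improvement of the classifier or to an enrichment of its input is exactly the question we raise below.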

As a minor detail, it remains to be seen how often the classifier misclassifies regions with high certainty (in this approach, misclassifications are assumed to be low-confidence predictions). However, in fairness, rich empirical evidence suggests improved classification performance with reasoning-augmented data considering the three mentioned features, even though in a few instances it worsens the performance. It would have been valuable to read an in-depth analysis of why (with worked examples) the reasoner leads to drops in classification accuracy for non-vegetation ground, for example. It would also be interesting to provide a deeper analysis of the differences in results across datasets. As you mention, vegetation seems to be more difficult in Stockholm than in UC Merced Land Use, for instance, but no explanation is offered here.

Questions:
- Most central question: is the whole approach an enrichment of the original input data that leads to better performance, rather than an improvement of the classification algorithm itself? This is important for the argumentation of the paper.
- What is the "middle section" in the CAE architecture? Could you provide a visual representation of the architecture to make it easier to follow? Currently, the whole description of the architecture is restricted to one paragraph.
- The taxonomy of OntoCity in Figure 3 seems to be considerably more fine-grained than the classes used in the classification process. Could you provide an explicit mapping between the two and the exact constraints on them (in an Appendix if preferred)? Does the occurrence of RailRoad in three taxonomic relations (see Figure 3) not automatically lead to ontological inconsistencies? In Section 3.6, the description would be much clearer with one example of the many constraint violations between the classification and the reasoner.
- For readers not familiar with the topic of image recognition, it would be good to introduce the "RGB" acronym.
- In the sentence "The additional information provided by the reasoner is represented as image channels with the same size as the RGB input and is then concatenated together with the original RGB channels in the depth dimension." it is unclear to me what is meant by "depth dimension" (presumably the channel axis; see the sketch after this list), although I understand the general idea behind the approach.
- Could you explain "patch of data" for people not familiar with this concept? Shouldn't the patch be 200 x 200 rather than 4000 x 4000 pixels?
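For concreteness, here is a minimal sketch of how we read "depth dimension" and "patch of data"; the NumPy code, the 200 x 200 patch size, and the exact shapes are our own illustrative assumptions, not taken from the paper:

    import numpy as np

    tile     = np.zeros((4000, 4000, 3), dtype=np.uint8)   # one selected RGB area
    feedback = np.zeros((4000, 4000, 3), dtype=np.uint8)   # reasoner channels, same height/width

    # "Depth dimension" = the channel axis: 3 RGB channels + 3 feedback channels -> depth 6.
    augmented = np.concatenate([tile, feedback], axis=-1)  # shape (4000, 4000, 6)

    # Our reading of "patch": the large area is cut into 200 x 200 windows
    # that are fed to the network one at a time.
    patches = [augmented[r:r + 200, c:c + 200]
               for r in range(0, 4000, 200)
               for c in range(0, 4000, 200)]
    # 400 patches, each of shape (200, 200, 6)

If this reading is correct, stating it explicitly in the paper would resolve both questions.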

Formatting (please ensure consistency with all of the journal's guidelines in the style guide):
- References such as "the works of [25] and [26]" without naming at least the first author seem very strange (throughout the paper)
- Could you use one consistent formatting for your axioms so that they are numbered and can be (and are) referenced from the running text?
- Algorithm 1: please use the same notation for the same elements, e.g. R_s is written differently, as is P_s
- Align your notation across the whole paper: not both "list W" and "list w", use only one (and what is called a "list" in the text is called a "hash map" in the Algorithm - please align)
- Figure. 7. => Figure 7
- Table 2: what does the "crossing out" of text signify here? Omitted classes?

Minor comments (in order of appearance):
In the abstract: "with in the environment". Shouldn't it be "with the environment"?
semantic web technologies => Semantic Web technologies (used correctly on next page)
[1], [2] => use \cite{firstreference,secondreference} which leads to [1, 2]
Despite ... => this linker does not work in this sentence (especially not with the continuation "seldom do these...")
recent success in machine => recent successes in machine
,e.g., => omit second comma (only for ,i.e.,)
A worth-mentioning cross-disciplinary application of RCC-8 reasoning => A cross-disciplinary application of RCC-8 reasoning worth mentioning
semantic web resources interact => Semantic Web resources interact
The (lengthy) sentence "We show using a specific case on large scale satellite data how semantic web resources interact with deep learning models to improve the classification performance on a city wide scale, as well as a publically available data set." should be rephrased, as it is rather confusing.
fist time => first time
the work by [29] have => has
are interesecting or not => intersect or not
its relative hight => its relative height
likely as its casting object => ???
subset of general knowledge that always hold => holds
As you can see => replace by “As can be seen” or a more canonical expression for academic writing
is why have they been added => is why they have been added
Buildings are not interescting => Buildings do not intersect
patch size … is is 4000 x 4000 => omit one “is"
shouldn’t the patch size be 200 x 200 instead of 4000 x 4000 pixels with selected areas of 4000 x 8000 pixels total?
carrying same class label => carrying the same class label
classification certainty probability => probability of classification certainty
argmax is following => follows
class weighing => weighting?
supsicions regions => suspicious regions (should be "uncertain" anyhow, according to the comments)
is more described in => is described in more detail in
the training went through three rounds of training => through three iterations
non-vegetaion ground => non-vegetation
groundtruth => ground truth
Discussion & Future Work => and

Review #2
By Michael Cochez submitted on 25/Feb/2019
Suggestion:
Accept
Review Comment:

This review was done jointly with Md. Rezaul Karim

===================================================

The authors have sufficiently addressed our earlier concerns regarding this paper. We support the acceptance of the work.