A Survey on Visual Transfer Learning using Knowledge Graphs

Tracking #: 2878-4092

Sebastian Monka
Lavdim Halilaj
Achim Rettinger

Responsible editor: 
Guest Editors DeepL4KGs 2021

Submission type: 
Survey Article
The information perceived via visual observations of real-world phenomena is unstructured and complex. Computer vision (CV) is the field of research that attempts to make use of that information. Recent CV approaches use deep learning (DL) methods, as they perform quite well if training and testing domains follow the same underlying data distribution. However, it has been shown that minor variations in the images that occur when these methods are used in the real world can lead to unpredictable and catastrophic errors. Transfer learning is the area of machine learning that tries to prevent these errors. In particular, approaches that augment image data using auxiliary knowledge encoded in language embeddings or knowledge graphs (KGs) have achieved promising results in recent years. This survey focuses on visual transfer learning approaches using KGs, as we believe that KGs are well suited to store and represent any kind of auxiliary knowledge. KGs can represent auxiliary knowledge either in an underlying graph-structured schema or in a vector-based knowledge graph embedding (KGE). To enable the reader to solve visual transfer learning problems with the help of specific KG-DL configurations, we start with a description of relevant KG modeling structures of varying expressiveness, such as directed labeled graphs, hyper-relational graphs, and hypergraphs. We explain the notion of a feature extractor, referring specifically to visual and semantic features. We provide a broad overview of KGE methods and describe several joint training objectives suitable for combining them with high-dimensional visual embeddings. The main section introduces four categories of how a KG can be combined with a DL pipeline: 1) Knowledge Graph as a Reviewer; 2) Knowledge Graph as a Trainee; 3) Knowledge Graph as a Trainer; and 4) Knowledge Graph as a Peer.
To help researchers find meaningful evaluation benchmarks, we provide an overview of generic KGs and a set of image processing datasets and benchmarks that include various types of auxiliary knowledge. Last, we summarize related surveys and give an outlook on challenges and open issues for future research.

Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 15/Sep/2021
Minor Revision
Review Comment:

I noticed that my previous comments have not been addressed.
Moreover, the list of contributions is the following:
1) A categorization of visual transfer learning approaches using KGs according to four distinct ways a KG can be combined with a DL pipeline.
2) A description of generic KGs and relevant datasets and benchmarks for visual transfer learning using KGs for CV tasks.
3) A comprehensive summary of the existing surveys on visual transfer learning using auxiliary knowledge.
4) An analysis of research gaps in the area of visual transfer learning using KGs which can be used as a basis for future research

I could not really pick out those contributions when I read the paper. The categorisation proposed in the manuscript is not at all clear to me. The intended one is the introduction of four categories describing ways to combine KG with DL. For the other three points, I think Sections 5, 6 and 7 should be the ones that cover those. The authors should clearly indicate where each contribution has been discussed.
I suggest that the authors clearly address each previous and current reviewer comment before submitting the revised paper again.

Review #2
Anonymous submitted on 27/Sep/2021
Review Comment:

The authors implemented the changes I suggested and I believe that the paper is now much more robust. I am happy for it to be accepted.

Review #3
Anonymous submitted on 07/Oct/2021
Major Revision
Review Comment:

The new version of the manuscript demonstrates improvement from the prior version. The overarching RQs are now tied to the different sections and well contextualised. However, the manuscript still needs a fair amount of work.

The mapping of the Knowledge as a Trainee/Trainer/Peer with those by Aditya et al. introduced in the new version definitely helps to situate the reader and to clarify the contribution.
However, the mapping of the KG as a Reviewer category needs further clarification. In particular, the authors claim that their KG as a Reviewer category is transversal to two categories in Aditya et al.: (i) knowledge integrated in post-processing, and (ii) knowledge integrated in the intermediate layers of the DNN, because “we see knowledge layers in the DNN as an intermediate reviewing and validation process.” However, it seems to me that this claim conflicts with the definition presented earlier, namely that, in the KG as a Reviewer configuration, the DNN is an independent component which is applied ahead of the KG. In other words, the intermediate layers of the DNN do not tap into any external knowledge coming from the KG. The definition of DNN layers/embeddings as intermediate validation steps therefore seems to go against the separation between DNN and KG in the KG as a Reviewer setup. Thus, at present, the “KG as a Reviewer” category still appears to be isomorphic to category (i) above in Aditya et al.

The structure and organisation of the paper also needs further work:

- In Section 3.1, certain definitions are accompanied by their related citation, whereas others are not. I would suggest that all terms in the background section be appropriately referenced.

- In Section 3, the organisation of paragraphs into feature extractor, visual feature extractor and semantic feature extractor works well. However, I am still a bit confused about how the different types of KGE are presented (Section 3.3). A new categorisation is introduced compared to the prior version, i.e., unsupervised vs. supervised KGE. These two categories are then accompanied by a third paragraph, where the authors discriminate between KGEs based on hyper-relational graphs and hypergraphs respectively.
I suggest either keeping the same categories as in the prior version of the paper (provided that Knowledge Graph Embedding is still kept as a main heading, with Entity Embedding, Directed GE, etc. being subsections) or appropriately motivating the choice to introduce the distinction between supervised and unsupervised. What function does this categorisation serve in the paper? How is it linked to the other terms/categories presented earlier in the paper?

- Section 3.4 is titled “Training objectives for joint embeddings”. The term joint embedding is introduced here for the first time. Please provide a definition earlier in the paper that clarifies the link between "joint training objective" and "joint embedding", so that the paper is more accessible to the non-expert.

- Section 4. As mentioned in my previous review, the general definition of visual transfer learning should be presented earlier in the paper, before diving into the details of KGE for transfer learning. I suggest incorporating it back into the Background section before the definition of Knowledge Graphs. This structure reflects the main narrative: “visual transfer learning using Knowledge Graphs”.

Minor note: Lines 49-51 -> punctuation seems to be missing in “Posed by graph irregularities (GAT [30]) None Euclidian graph convolutional methods yield significant improvements on graphs with hierarchical structure.” + typo “No Euclidean graph convolutional methods...”