A Survey on Visual Transfer Learning using Knowledge Graphs

Tracking #: 2730-3944

Sebastian Monka
Lavdim Halilaj
Achim Rettinger

Responsible editor: 
Guest Editors DeepL4KGs 2021

Submission type: 
Survey Article
The information perceived via visual observations of real-world phenomena is unstructured and complex. Computer vision (CV) is the field of research that deals with the visual perception of the environment. Recent CV approaches utilize deep learning (DL) methods to learn and infer latent representations from observational image data. To achieve high accuracy, DL methods require a huge amount of labeled images organized in datasets. These datasets may be scarce and incomplete in some domains, leading to an increasing amount of research aimed at augmenting DL approaches with auxiliary information. In particular, language information, which is freely available in large amounts on the internet, is a focus of research and has shaped several deep transfer learning approaches in recent years. Language information heavily depends on the statistical correlations among the collected words which exist within a particular corpus. However, this learned representation is unpredictable and cannot be adapted, making it difficult to use in specific domains. On the other hand, knowledge graphs (KG) show great potential in formalizing and organizing large-scale unstructured information. These KGs, engineered by domain experts, can be easily adopted to perform various tasks in specific domains. Recently, methods have been developed that transform KGs into vector-based embeddings so that they can work directly in combination with deep neural networks (DNN). In this survey, we first describe different modeling structures of a KG, such as directed labeled graphs, hyper-relational graphs, and hypergraphs. Next, we explain the structure of a DNN, which consists of a prediction task and a visual feature extractor or a semantic feature extractor, respectively. Furthermore, we classify KG-embedding methods as semantic feature extractors and provide a brief list of these methods and their usage according to the respective modeling structure of a KG.
We also describe a number of joint training objectives suitable to operate on high dimensional spaces. The respective definitions of tasks for transfer learning and transfer learning using knowledge graphs are presented. Next, we introduce four different categories of how transfer learning can be supported by a knowledge graph: 1) Knowledge graph as a reviewer; 2) Knowledge graph as a trainee; 3) Knowledge graph as a trainer; and 4) Knowledge graph as a peer. We also provide an overview of generic KGs and a set of datasets and benchmarks containing images with or without additional information such as attributes or textual descriptions, with the intention of helping researchers find meaningful evaluation benchmarks. Last, we summarize related surveys in the field of transfer learning and deep learning using additional knowledge, and give an outlook on challenges and open issues for future research.

Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 26/Apr/2021
Review Comment:

The paper provides a survey of transfer learning methods which combine Deep Learning with Knowledge Graphs to tackle Computer Vision problems. The adoption of Hybrid and Neurosymbolic approaches to AI and the advancement of the Semantic Web are closely-knit. Hence, a survey with this focus is very timely and relevant to the Semantic Web community. This manuscript, however, is not ready for publication in its present form. It will need to be significantly revised and completed to ensure that the proposed overview is comprehensive and accessible to the non-expert.

In Section 2, the authors should accompany claims about advances in specific research areas (e.g., “More recently, attempts have been made to independently pre-train the feature extractor...” at line 38) with the relevant citations.

The structure of Section 2.2 is cryptic. The distinction between visual feature extractors and semantic feature extractors is more straightforward, but should be emphasised through an introductory sentence before subsections 2.2.1 and 2.2.2. Then the structure of the paragraphs Knowledge Graph Embeddings (KGE), Entity embeddings, and others, is confusing: entity embeddings and directed graph embeddings are types of KGE, so they should not appear at the same heading level as KGE.

Section 2.2 is also oddly placed, because it presents highly-specialised related works in KGE before more general definitions around the topics of Deep Learning and Transfer Learning are presented (in Sections 2.3 and 2.4). Moving from Section 2 to 3, the focus goes from specific to general. These sharp changes of focus hinder the readability of the paper and require multiple passes to guess the red threads linking sections together.

The presentation of related works in Section 2.2.2 is overly concise and the main intuitions behind each work should be further explained so that the paper is more self-contained. For example, at line 42: “TransE [17] use the translational distance to model a relation as a vector-plus operation between two entities”, what are the translational distance and vector-plus operations? Similar considerations apply to the other cited methods.
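
To make the request concrete, the intuition can be sketched in a few lines. The following is a minimal illustration, not the survey's or the TransE authors' code: it assumes the usual reading in which a relation is a vector added to the head entity, and a triple is scored by the distance between that translated head and the tail. The function name and the hand-picked 2-d embeddings are hypothetical.

```python
import math


def transe_score(head, relation, tail):
    """Return the L2 distance ||head + relation - tail||.

    A lower value means the triple (head, relation, tail) is more plausible.
    """
    return math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )


# Hypothetical 2-d embeddings, chosen by hand for illustration only.
paris = [1.0, 2.0]
berlin = [4.0, 4.0]
france = [1.5, 1.0]
capital_of = [0.5, -1.0]

# Translating "paris" by "capital_of" lands on "france"; "berlin" does not.
good = transe_score(paris, capital_of, france)
bad = transe_score(berlin, capital_of, france)
print(good < bad)
```

An explanation at roughly this level for each cited method would make Section 2.2.2 far more self-contained.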

Section 2.5 is simply a list of research questions; however, each question should be further explained and contextualised, especially with respect to motivating which insights or gaps in the literature inspired these questions. More broadly, the links between Sections 2 and 5 should be made explicit. For example, none of the related categorisations of transfer learning methods in Section 5, e.g., the ones by Pan et al. [118] and Zhang et al. [119], are followed when introducing Transfer Learning in Section 2.4. As a result, these Sections appear as isolated lists of concepts, rather than critical reviews which, together, contribute to substantiating a series of overarching arguments.

My main concerns have to do with the categorisation proposed in this paper. The main intended contribution of the paper is the introduction of four categories describing ways to combine KG with DL (knowledge graph as a reviewer, as a peer, etc.). However, the difference between the four proposed categories is not clear enough to uniquely assign the existing methods to each group. In the “KG as a reviewer” case, the DNN can be an input to either a KG or to a Graph-based Network. On the one hand, this definition overlaps with the taxonomy presented by Aditya et al. (2019), whose survey is also cited by the authors of this paper. Indeed, although the similarities are not acknowledged in this paper, Aditya et al. (2019) already described how auxiliary knowledge can be integrated at four different levels of a Deep Neural Network: (i) ahead of the DNN, (ii) within the intermediate layers of the DNN, (iii) as schema driving the Network topology, or (iv) in post-processing. As such, the “KG as a reviewer” pattern can be seen as an instance of the post-processing knowledge integration approach.
On the other hand, methods under the “KG as a reviewer” category are a mix of methods where the training is performed only ahead of integrating the KG and methods where KG-based Networks are used for end-to-end training. Thus, the distinction between the “KG as a reviewer” and “KG as a trainee” cases is blurred. As such, the proposed taxonomy fails to meet some basic requirements: (i) being clearly defined across clear-cut categories, to provide an intuitive way for practitioners in the field to compare different methods; (ii) explicitly mapping the newly introduced categories to the existing taxonomies, to provide evidence as to why a new taxonomy is needed and to situate the reader.

As a minor remark, the last paragraph of the introduction, which illustrates the structure of the paper, is redundant and almost repeats the same sequence clarified in the earlier paragraphs. I suggest removing it to make the introduction more concise.

Review #2
Anonymous submitted on 25/Jun/2021
Minor Revision
Review Comment:

The paper "A Survey on Visual Transfer Learning using Knowledge Graphs" is a survey in which the authors describe relevant modeling structures of a KG. Then they explain the structure of a DNN, classify KGE methods as semantic feature extractors, and provide a brief list of these methods and their usage according to the respective modeling structure of a KG.
The paper is well written, well structured, and technically sound.
The Introduction section clearly states the contributions of the paper.
The related work gives a nice illustration of what a knowledge graph is and how it is classified, and explains what a deep neural network is.
The authors might want to add the following reference to the background section:

Section 3 nicely describes the transfer learning using knowledge graphs and discusses four different roles of the knowledge graph.
Section 4 lists datasets and benchmarks for transfer learning, whereas Section 5 includes surveys dedicated to either transfer learning or knowledge-based machine learning.
Finally, in Section 6 the authors include challenges and open issues arising from the integration of knowledge graphs into machine learning pipelines.

Review #3
Anonymous submitted on 25/Jun/2021
Major Revision
Review Comment:

The paper is a survey of visual transfer learning approaches that rely on knowledge graphs.
It classifies methods in this space according to four main categories: 1) knowledge graph as a reviewer, 2) knowledge graph as a trainee, 3) knowledge graph as a trainer, and 4) knowledge graph as a peer. It also presents an overview of the KGs and datasets that can be adopted in this field.

The paper is fairly well written and quite relevant to the special issue. It is an interesting submission but needs more work, in particular regarding some classification choices that need to be better justified (e.g., methods based on word embeddings categorized as based on knowledge graphs) and the lack of details in some sections (e.g., 2.2, 2.3, 2.5).
Another significant issue is that the paper gives no details or inclusion criteria about how the surveyed papers were selected. Did the authors use any specific query or tool to verify the completeness of the chosen set? What was the procedure used for the selection? This needs to be discussed and clarified, also for the sake of reproducibility.

In the following, I will comment on specific sections.

Section 2.2
The related work on embeddings needs to be extended in order to give a more comprehensive representation of the different methods. Currently, most of them are characterized only by a brief sentence. I also suggest adding a citation for each mentioned architecture rather than repeatedly referring to [1].

Section 2.2.2
The authors should clearly define what they mean by “semantic features”. Is everything extracted from a KG or a structured source a semantic feature? I suggest introducing some examples of semantic features and how they are used for the relevant tasks.

“Entity Embedding” and “Directed Label Graph Embeddings” seem to be sub-categories of KGE methods. For example, TransE and ConvE are usually considered KGE models. The authors should either provide a strong justification for placing them in a different category or reframe the section.

Section 2.3
There are a lot of different losses for KGE in addition to the three presented here. The authors need to justify why those three are presented and possibly add some other solutions. Some examples are:
- Margin Ranking Loss: Bordes, A., Usunier, N., Garcia-Duran, A., Weston, J. and Yakhnenko, O., 2013, December. Translating embeddings for modeling multi-relational data. In Neural Information Processing Systems (NIPS) (pp. 1-9).
- Limit-based Scoring Loss: Zhou, X., Zhu, Q., Liu, P. and Guo, L., 2017, November. Learning knowledge embeddings by combining limit-based scoring loss. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management (pp. 1009-1018).
- Soft Margin Loss (SML): Nayyeri, M., Vahdati, S., Zhou, X., Yazdi, H.S. and Lehmann, J., 2020, May. Embedding-based recommendations on scholarly knowledge graphs. In European Semantic Web Conference (pp. 255-270). Springer, Cham.
- Full Multiclass Log Loss (FMLL): Lacroix, T., Usunier, N. and Obozinski, G., 2018, July. Canonical tensor decomposition for knowledge base completion. In International Conference on Machine Learning (pp. 2863-2872). PMLR.
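
For reference, the first of these losses can be sketched as follows. This is a minimal illustration under the assumption that lower scores mean more plausible triples (as with a distance-based score); the function name and the default margin are hypothetical choices, not values taken from the cited paper.

```python
def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Hinge-style ranking loss for one (positive, negative) triple pair.

    Assumes lower scores are better. The loss is zero once the negative
    triple scores at least `margin` worse than the positive one.
    """
    return max(0.0, margin + pos_score - neg_score)


# Negative triple is already far enough away: no penalty.
print(margin_ranking_loss(0.25, 2.0))
# Negative triple scores better than the positive one: penalised.
print(margin_ranking_loss(1.0, 0.5))
```

Stating each loss at this level of detail would let readers see how the alternatives differ from the three currently presented.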

Section 2.5
This section simply lists a set of research questions, without a discussion or an explanation of how the paper intends to address them. This section needs to be rewritten, clarifying why these are important questions and how and in which sections they will be addressed. Possibly, the research questions need to be used to drive the discussion. In the current version it appears that they are just stated and then forgotten.

Section 3
I am a bit confused by the inclusion of methods based on word embeddings under categories such as “Knowledge Graph as a Trainer”. While I understand that the presence of these methods means that they could potentially be applied to KGs, many of them currently are not. It may be more useful to revise the categories by clearly distinguishing the set of methods that *actually* use a KG from the ones that *may be adapted* to use it in the future. If well done, this may even become a strength of the paper and suggest some interesting extensions to current methods.

Section 5. Evolving Knowledge.
Here I would briefly refer to the growing area of knowledge graph construction. In particular, I would mention KG mapping languages and information extraction methods for KG generation. Some references:
Dimou, A., Vander Sande, M., Colpaert, P., Verborgh, R., Mannens, E. and Van de Walle, R., 2014, January. RML: a generic language for integrated RDF mappings of heterogeneous data. In LDOW.
Dessì, D., Osborne, F., Recupero, D.R., Buscaldi, D. and Motta, E., 2021. Generating knowledge graphs by employing Natural Language Processing and Machine Learning techniques within the scholarly domain. Future Generation Computer Systems, 116, pp.253-264.
Kertkeidkachorn, N. and Ichise, R., 2018. An automatic knowledge graph creation framework from natural language text. IEICE Transactions on Information and Systems, 101(1), pp.90-98.

In conclusion, it is a potentially interesting article, but the current version presents some issues that require a fair amount of work. Therefore, I suggest a Major Revision.

Minor remarks
Section 4.1 “are built” > “were built”
Section 5 “[120] separated the field” > Add the name of the authors.