A Survey on Knowledge Graph Embeddings with Literals: Which model links better Literal-ly?

Tracking #: 2475-3689

Genet Asefa Gesese
Russa Biswas
Mehwish Alam
Harald Sack

Responsible editor: 
Pascal Hitzler

Submission type: 
Survey Article

Abstract:
Knowledge Graphs (KGs) are composed of structured information about a particular domain in the form of entities and relations. In addition to the structured information, KGs help facilitate interconnectivity and interoperability between different resources represented in the Linked Data Cloud. KGs have been used in a variety of applications such as entity linking, question answering, and recommender systems. However, KG applications suffer from high computational and storage costs. Hence, there is a need for a representation that maps high-dimensional KGs into a low-dimensional space, i.e., an embedding space, while preserving structural as well as relational information. This paper surveys KG embedding models that consider not only the structured information contained in the form of entities and relations in a KG but also the unstructured information represented as literals such as text, numerical values, images, etc. Along with a theoretical analysis and comparison of the methods proposed so far for generating KG embeddings with literals, an empirical evaluation of the different methods under identical settings has been performed for the general task of link prediction.


Solicited Reviews:
Review #1
By Petar Ristoski submitted on 30/Apr/2020
Review Comment:

With the new and significantly improved version of the survey, all my comments and concerns have been resolved.

Review #2
By Federico Bianchi submitted on 20/May/2020
Review Comment:

SWJ Guidelines:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

Answer: The work is accessible to a broad audience, even though the focus is on literals.

(2) How comprehensive and how balanced is the presentation and coverage.

Answer: Good and extensive review of papers, with experimental evaluations and details on runtimes.

(3) Readability and clarity of the presentation.

Answer: Well written and clear.

(4) Importance of the covered material to the broader Semantic Web community.

Answer: Very important for the semantic web community.

Thanks to the authors for the careful work on the paper. All my comments and suggestions were taken into consideration and applied where needed. Also, many thanks for explaining the points I had misunderstood while reading the paper. I read the other reviews and the responses, and after rereading the paper I think it is in very good shape. As I said in my previous review, I think this is a significant contribution and of interest to the SW community.

Review #3
By Diego Moussallem submitted on 02/Jun/2020
Review Comment:

The authors addressed all my comments and substantially extended the previous version. Although adding a triple classification task to the experiments would be very valuable for this survey, I agree that link prediction is the task common to all published papers. Therefore, I recommend accepting this survey.

I have only three very minor points.

1) In the abstract, I would change the passage below to make clear that the authors focus on literals contained in the given KG and not on external data.

...entities and relations in a KG but also the unstructured
information represented as literals such as text, numerical...

to

...entities and relations in a KG but also **its** unstructured
information represented as literals such as text, numerical...

2) In "The categories are translation based models, semantic matching models, models incorporating entity types, models incorporating relation paths, models using logical rules, models with temporal information, and models using graph structures." and in Table 1, is it necessary to repeat "Models using" every time? The authors could find a more concise phrasing.

3) Consider pointing to this work, https://arxiv.org/pdf/1911.03903.pdf, in the discussion section as a possible new evaluation method. It was recently accepted at ACL 2020 and exposes some drawbacks of current evaluation methods for KGEs.

Review #4
By Heiko Paulheim submitted on 18/Jun/2020
Review Comment:

I am happy to see that the authors have addressed all the points I raised very thoroughly. Thank you for the hard work you put into this!

I have just one more small remark: when combining information from the graph with other sources, i.e., image and text literals, one very simple yet potentially effective approach is to generate embeddings for each modality separately and then combine the vectors by concatenation followed by dimensionality reduction, e.g., with PCA or a neural autoencoder. This has been demonstrated in [1]. I'm not sure whether it is applicable to, e.g., FB15k, but it would be worth mentioning.

[1] Thoma et al.: Towards Holistic Concept Representations: Embedding Relational Knowledge, Visual Attributes, and Distributional Word Semantics. ISWC 2017
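To make the concatenate-then-reduce suggestion concrete, here is a minimal sketch. It assumes that each modality (e.g., a structural KG embedding and a text embedding) already provides one vector per entity. The vectors in the usage part are made up, and all function names are hypothetical. Per-modality L2 normalization and a pure-Python PCA (power iteration with deflation) are illustrative choices made here, not necessarily the pipeline used in [1]. At FB15k scale, a library SVD or an autoencoder would be the practical choice.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so that no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def concatenate_modalities(*modalities: List[Vector]) -> List[Vector]:
    """Row i is the concatenation of entity i's normalized vectors from every modality."""
    n = len(modalities[0])
    if any(len(m) != n for m in modalities):
        raise ValueError("every modality must provide one vector per entity")
    rows = []
    for i in range(n):
        row: Vector = []
        for m in modalities:
            row.extend(l2_normalize(m[i]))
        rows.append(row)
    return rows


def pca_reduce(rows: List[Vector], k: int, iterations: int = 500) -> List[Vector]:
    """Project rows onto their top-k principal components.

    Components are found by power iteration on the sample covariance matrix,
    deflating after each one; fine for a toy example, not for large KGs.
    """
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / max(n - 1, 1) for b in range(d)]
           for a in range(d)]
    components: List[Vector] = []
    for _ in range(k):
        # Deterministic, non-uniform start vector for the power iteration.
        v = l2_normalize([1.0 + 0.1 * j for j in range(d)])
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            if not any(w):  # no variance left along any direction reachable from v
                break
            v = l2_normalize(w)
        eigval = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        components.append(v)
        # Deflation: subtract the found direction so the next pass finds the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]
    # Coordinates of each centered row in the k-dimensional component space.
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components] for c in centered]


if __name__ == "__main__":
    # Hypothetical per-entity vectors for three entities from two modalities.
    structural = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]              # e.g., from a KG embedding model
    textual = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.3, 0.3, 0.4]]  # e.g., from a text encoder
    fused = pca_reduce(concatenate_modalities(structural, textual), k=2)
    for entity, vec in zip(["e1", "e2", "e3"], fused):
        print(entity, [round(x, 3) for x in vec])
```

Normalizing each modality before concatenation keeps a modality with larger vector norms from dominating the principal components. Replacing PCA with an autoencoder, the other option mentioned above, would change only `pca_reduce`.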