A Survey on Knowledge Graph Embeddings with Literals: Which model links better Literal-ly?

Tracking #: 2336-3549

Genet Asefa Gesese
Russa Biswas
Mehwish Alam
Harald Sack

Responsible editor: Pascal Hitzler

Submission type: Survey Article
Knowledge Graphs (KGs) are composed of structured information about a particular domain in the form of entities and relations. In addition to this structured information, KGs help facilitate interconnectivity and interoperability between different resources represented in the Linked Data Cloud. KGs have been used in a variety of applications such as entity linking, question answering, recommender systems, etc. However, KG applications suffer from high computational and storage costs. Hence, there arises the necessity for a representation able to map high-dimensional KGs into low-dimensional spaces, i.e., embedding spaces, preserving structural as well as relational information. This paper conducts a survey of KG embedding models which not only consider the structured information contained in the form of entities and relations in a KG but also the unstructured information represented as literals, such as text, numerical values, images, etc. Along with a theoretical analysis and comparison of the methods proposed so far for generating KG embeddings with literals, an empirical evaluation of the different methods under identical settings has been performed for the general task of link prediction.

Major Revision

Solicited Reviews:
Review #1
By Federico Bianchi submitted on 07/Feb/2020
Major Revision
Review Comment:

---- SUMMARY ----

This paper proposes a survey on knowledge graph embedding approaches that also take into consideration the fact that KGs contain literals. The authors not only present a comprehensive summary of the approaches, but they also experiment with some of them (i.e., those for which models are available) on different tasks, providing insightful results on the state of the art of these approaches. The knowledge graph embedding topic is getting much attention lately. Despite some issues (that are not directly related to the content) and some questions that I have, I really like the paper and I think it might become a valuable contribution.

SWJ Guidelines:

(1) Suitability as introductory text, targeted at researchers, PhD students, or practitioners, to get started on the covered topic.

Answer: There are fixes to do and I have some questions. Also, I suggest the authors add some details on standard knowledge graph embeddings to make their work more accessible to a broader audience.

(2) How comprehensive and how balanced is the presentation and coverage.

Answer: Good review of papers, with some experimental evaluations as well.

(3) Readability and clarity of the presentation.

Answer: Well written in general, equations should be aligned to a common scheme.

(4) Importance of the covered material to the broader Semantic Web community.

Answer: Very important, literals are less considered in standard knowledge graph embedding approaches.

---- REVIEW ----

This paper reviews the state-of-the-art knowledge graph embeddings, taking a closer look at those approaches that treat literals. I like this paper and I think it is really a nice contribution. I like the summaries at the end of each section, as I think they provide a nice way to recap each section.

Points 1 and 2 in the Introduction motivate well why literal values are important and should be considered in embedding approaches.

I quickly skimmed over ref [57], the authors' own work, and I think the authors have deeply extended their previous work.

Knowledge graph embedding approaches for literals that extend standard knowledge graph embeddings inherit their limits, so I'd add a brief summary of the problems of knowledge graph embeddings and how these might affect the representation of literals. (page 19, line 16) DistMult does not model anti-symmetric relations well because of its score function; I'd say this in a previous section.

Notation changes throughout the paper; it would be easier to have a consistent scheme of symbols, and the experiments should probably be better described (see some points on these two things in later sections of the review).


Sometimes the notation is different for different methods. Is this intentional? E.g., Equations 2, 21, 27 and 28 seem to express very similar loss functions (presumably the original TransE loss), but one uses the positive-part bracket notation []+ and another uses the max function; one uses the set S' for the negatives and another the set T'. I'd try to align the notation to a common scheme where possible.
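For reference, one common scheme to align to would be the original TransE margin-based loss (a sketch; symbols follow the usual convention, with S the training triples, S' the corrupted triples, γ the margin, and d a distance):

```latex
\mathcal{L} \;=\; \sum_{(h,r,t)\in S}\;\sum_{(h',r,t')\in S'_{(h,r,t)}}
\bigl[\gamma + d(\mathbf{h}+\mathbf{r},\,\mathbf{t}) - d(\mathbf{h'}+\mathbf{r},\,\mathbf{t'})\bigr]_{+}
```

where [x]_+ = max(0, x), so the bracket and max notations are equivalent.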

Equation 8: what are s and s*?

Section 6.2, "Extended Dataset": did you create this dataset? Will you share it? I think it's a valuable contribution given the recent discussion on knowledge graph embedding evaluation [1]. "As mentioned by the authors in the paper" — which paper? Are you referring to TransEA? (You mention this a page later.)

In the link prediction with numeric literals, are you trying to predict the exact numerical label? I wonder if it is correct to use HITS@k to evaluate the predictions. I know this is standard, but still, I'm not sure it is the best way to do this (I'm not asking for an experiment, but if this is true it might be worth discussing in the paper, i.e., predicting 15 instead of 16 is different from predicting 1021).

(page 15, line 12): in the evaluation procedure section you should probably state on which list of results you compute the measures (e.g., the ordered set of corrupted triples, in which you look for the rank of the correct one). Moreover, the filtered setting should also remove the other correct triples from the ranking list, if I remember correctly.
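For concreteness, the filtered ranking protocol could be sketched as follows (function names are mine, not the paper's; assumes higher score means more plausible):

```python
# Hypothetical sketch of filtered link prediction evaluation.

def filtered_rank(scores, correct_idx, known_true_idxs):
    """Rank of the correct candidate in a scored candidate list, after
    filtering out every other candidate that also forms a true triple."""
    target = scores[correct_idx]
    rank = 1
    for i, s in enumerate(scores):
        if i == correct_idx or i in known_true_idxs:
            continue  # filtered setting: skip other known-true triples
        if s > target:
            rank += 1
    return rank

def mrr_and_hits(ranks, k=10):
    """Mean reciprocal rank and Hits@k over a list of filtered ranks."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = sum(r <= k for r in ranks) / len(ranks)
    return mrr, hits
```

Here `scores` would be the model's scores for every candidate tail (or head) entity of a test triple.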

There is also probably a need to further describe the extended datasets. Are they well balanced?

I'd also name these datasets differently if they are extended versions of FB15K and FB15K-237; while reading the paper I was getting confused because I was thinking of the original FB15K and FB15K-237. Table 3 shows the details for the standard FB15K* datasets; I'd also include those for the extended datasets.

(Page 16) "Experimental Setup": embeddings of 100 and 200 dimensions. Why these two? Is this because you sometimes use a ComplEx-based model that uses complex numbers, which thus requires 2xN dimensions to be represented?

Since you already run lots of experiments, could you also add some analysis on the runtime of the algorithms? From a more "production-related" point of view, this might be very interesting.

"Note that the reason for DistMult-LiteralEg model to achieve the best result on FB15K-237 dataset is the fact that this dataset does not have any symmetric relation." Why? Shouldn't DistMult favor symmetric relationships? Am I missing something here? Please correct me if I'm wrong. You also mention that "FB15K-237 achieves slightly better result compared to FB15K"; is this because the dataset has been extended? Because in structured knowledge graph embeddings [2], FB15k-237 is the one that is more difficult to solve. I suggest you add more details in this section because I think it is important.
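The symmetry point can be checked directly: DistMult's trilinear score is invariant under swapping head and tail, so it scores every relation as if it were symmetric (and hence cannot model anti-symmetric ones). A minimal sketch with random embeddings:

```python
import numpy as np

# DistMult scores a triple with the trilinear product sum_i h_i * r_i * t_i,
# which is symmetric in h and t by construction.
rng = np.random.default_rng(0)
h, r, t = rng.normal(size=(3, 50))  # random 50-d head/relation/tail vectors

def distmult(h, r, t):
    return np.sum(h * r * t)

# swapping head and tail never changes the score, for any relation vector
assert np.isclose(distmult(h, r, t), distmult(t, r, h))
```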

When you train TransEA, do you re-normalize values (since you have retrieved the original ones) or keep the original ones? if the latter is the case, how does this impact the models?

Are there specific categories of literals that are easier to predict than others? It would be nice, if you can generate it, to have a small summary table on this (but I guess it will be biased by the proportion of this information in the training data).


Table 1 shows a really good summary of the various models in the field and also defines the categories. I wonder if you could add some text to describe this table. Otherwise, just looking at the table, I do not get the difference between translational models and bilinear/the others. In my opinion, without some text that explains the categorization it might not be really informative, and the categorization could as well become: 1) models that do not use literals, 2) models that use literals.

I understand the paper focuses on literals, but I'd extend a bit the Introduction or the Related Work Sections by also explaining how approaches that do not focus on literal work.
It'd be easier for a reader to understand the paper if there was a short introduction on KG embeddings in general (e.g., it might be better to introduce what negative sampling and scoring functions are in the context of KG embeddings). You could briefly explain some properties of TransE (and its limits, like 1-N representations); this might come in useful because at (Page 6, line 13) you mention that DKRL is an extension of TransE. The same goes for EAKGAE.

On the same line, it might not be too clear to someone new to the field what a "complex conjugate" is, though I'm not sure how much detail you can give about this. I'd just underline the fact that some of these methods map into different spaces (i.e., R and C).
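To illustrate the point for readers new to the field: ComplEx scores a triple with Re(⟨h, r, conj(t)⟩) over C, and it is precisely the complex conjugate on the tail that breaks head/tail symmetry. A minimal sketch with random complex embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)
# random 20-d complex embeddings for head, relation, tail
h, r, t = rng.normal(size=(3, 20)) + 1j * rng.normal(size=(3, 20))

def complex_score(h, r, t):
    # Re(<h, r, conj(t)>): the conjugate on the tail breaks h/t symmetry
    return np.real(np.sum(h * r * np.conj(t)))

# unlike DistMult, swapping head and tail changes the score in general
assert not np.isclose(complex_score(h, r, t), complex_score(t, r, h))
```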

This paper might be of interest to you [1] and I suggest the authors cite it in their paper (RESCAL, in that paper, is reported as one of the models that is competitive and achieves good performances).

In section 7, I'd restate the research questions and I'd answer them.

page 2, line 22. "handling different challenges." => I'd name a few of those challenges here

page 3, line 23. I cannot understand the sentence "... has been given with experiments conducted ..."

It would be interesting to see some predictions of the models in the paper. Can you add some examples of the prediction of a literal for an entity?


The results from the ComplEx-LiteralEg model shows => show

"some approaches have been proposed which incorporate the information underlying literals to generate KG embedding" => could you rephrase this sentence?

[1] https://openreview.net/forum?id=BkxSmlBFvr

[2] https://arxiv.org/pdf/1902.10197.pdf

Review #2
Anonymous submitted on 11/Feb/2020
Minor Revision
Review Comment:

The survey gives a comprehensive overview of approaches for knowledge graph embedding using literals. The authors identify the most relevant approaches introduced in recent years and categorize them into 4 broader categories, based on the modalities they embed. Furthermore, the authors provide an overview of the applications of such embeddings, and provide an empirical evaluation on the task of link prediction on 2 standard datasets, i.e., FB15K and FB15K-237.
I believe the survey will be of great value for all researchers in the Semantic Web community, especially for PhD students working in the field of knowledge graph embeddings.

* Comments:

- There are several important approaches that need to be added to the survey [1-4].

- Why is the complexity only discussed for approaches with numeric literals, but not the other approaches?

- While it is true that most of the approaches are originally evaluated only on the task of link prediction on the FB15K (and similar) datasets, it would be interesting to conduct a comparative evaluation of all the models on different tasks and different datasets, e.g., classification, regression, clustering, recommender systems, etc. Such an evaluation would give better insights into the quality of the different embedding approaches. Such an evaluation framework can be found in [5].

- While the paper is well structured, in many cases the way some sentences are constructed makes it difficult to read the paper. The writing style should be improved, especially in the introduction, the conclusion and the discussion sections.

* Minor comments:
- Freebase doesn't exist as such anymore, thus it shouldn't be listed as a popular publicly available knowledge base.
- No knowledge base has billions of entities (as stated in the introduction), rather millions. E.g. Wikidata has 76.6M entities.

[1] T. N. Kipf and M. Welling, Variational Graph Auto-Encoders, NeurIPS Bayesian Deep Learning Workshop (2016).
[2] T. N. Kipf and M. Welling, Semi-Supervised Classification with Graph Convolutional Networks, ICLR (2017).
[3] Jie Chen, Tengfei Ma, and Cao Xiao. "FastGCN: Fast Learning with Graph Convolutional Networks via Importance Sampling." arXiv preprint arXiv:1801.10247 (2018).
[4] Mousselly-Sergieh, Hatem, et al. "A multimodal translation-based approach for knowledge graph representation learning." Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics. 2018.
[5] https://github.com/mariaangelapellegrino/Evaluation-Framework

Review #3
By Heiko Paulheim submitted on 12/Feb/2020
Minor Revision
Review Comment:

The paper introduces a survey on embedding techniques incorporating literals, an interesting and timely topic. The authors present both a theoretical overview as well as empirical analyses.

The paper itself is quite well thought through and composed. I have a few remarks, though, on the overall presentation and conclusions drawn from the analyses.

First of all, I'd like to see a more direct contrast between techniques with and without literals which directly correspond. The first place to address this is Table 1, where I find it a bit odd that "Models using literals" is just another row in the table. I'd rather expect it to be a second column, so that translational, semantic matching etc. models are shown that use or do not use literals. It should also be made clear in that table which approaches using literals are extensions of which basic models (e.g., the text states "DKRL extends TransE").

Second, I would like to see that comparison directly reflected in the empirical results. When showing the results for DKRL, they should be contrasted with TransE in table 4, in table 5+6, it would be interesting to see the original variants of DistMult, ComplEx, and ConvE side by side with their literal extensions, and so on.

I have made the effort of digging out some of the original results and contrasting them with the results presented in the paper. This contrast is in fact interesting, as it shows, e.g., that the LiteralE approaches do *not* seem to outperform the original DistMult, ComplEx, and ConvE implementations in many cases. This is not the authors' fault (since this is a survey article), but it is an interesting observation which should be reflected and discussed in the paper. I am curious to hear the authors' opinion about this.

Some further observations:
* While reading the introduction, I got the impression that only text literals are considered. It should be made clearer from the beginning that all kinds of literals are meant.
* The example in the introduction may not be the best one, as it heavily depends on the way gender is represented in DBpedia (i.e., as a text literal). Maybe choosing another example (e.g., older vs. younger actors) might be more appropriate.
* Not all approaches mentioned in the text are contained in Table 1 (e.g., KGloVe).
* In section 3, the label of property wdt:P2048 ("height") should be given. Moreover, the unit of measure is explicitly given in Wikidata, so it is unclear why the value is not simply canonicalized.
* In section 5, the experiments in RESCAL for tail prediction are described as "fixing the relation type to rdf:type" - isn't that actually entity classification then?
* Also in section 5, under "other applications", predicting a person's weight is used as an example for "predicting the values of (discrete) attributes in a KG"; I rather conceptualize weight as a continuous predicate.
* In section 6.2, the authors state that Extended RESCAL is computationally too expensive to be considered. I would like to see some more details here (e.g., what are the system/runtime requirements? or a statement like "did not finish within a week" etc.)
* For the extended dataset in 6.2, some more details would be appreciated, such as min/max/avg text length in characters/words, language (all English?), no. of distinct words, etc.
* In section 7, it is unclear to me why approaches should only use the year fraction of the three dates, instead of converting them to UNIX time and treating the date as a numeric literal.
* In appendix A, I assume that these are the use cases discussed in the original papers, not the use cases in which those techniques could be used *in principle*. This should be clarified.

Minor typos:
p. 2: "the model should be enough" - a word is missing between "be" and "enough"
p. 15: LiterlaE

Review #4
By Diego Moussallem submitted on 01/Mar/2020
Major Revision
Review Comment:

This paper fulfills all criteria required by SWJ for a survey paper. The authors made a great effort in extending their previously published paper. The topic of this survey is a trend across several CS communities, particularly the Semantic Web one, and is essential for all researchers working in this area. I think this survey is a significant contribution.

Overall, the paper is well motivated and gives a helpful overview regarding KGE augmented with literals. Additionally, the authors performed their own experiments on the surveyed approaches. However, some statements are wrong, and the overall presentation is a bit confusing and should be improved. Additionally, some important papers are missing and should be added to this survey [1,2,3,4,5]. Therefore, the paper requires some improvements before being accepted for publication.

My comments follow below:

1 - Introduction
It is clear, direct, and motivated. However, the following statement is wrong.
" However, most of these approaches, including the current state-of-the-art TransE [10], are structure-based embeddings which do not make use of any literal information i.e., only triples consisting of entities connected via properties are usually considered. "

TransE is not a SOTA approach anymore. Please fix it.

2 - Related work:

There the authors wrote:

" Different KG embedding techniques have been proposed so far which can be categorized as translation based models, semantic matching models, models incorporating entity types, models incorporating relation paths, models using logical rules, models with temporal information, models using graph structures, and models incorporating information represented in literals"

The authors do not provide any explanation of how the categories are defined. On what basis do RESCAL, DistMult, HolE, and ComplEx fall under the category of semantic matching models? For example, DistMult is a generalized framework of NTN.

It seems that the authors tried to summarize as much as possible the related work, and the result was very confusing. The categories and models could have been better organized and presented.

Is this section really necessary? There are different surveys about KGE, and herein the authors analyze KGE models augmented with literals. I think the authors should focus only on KGE with literals; they have to mention the well-known KGE approaches, of course, but their focus is a different one. Maybe this could go directly into the experiments, and they could highlight the contribution of literals in the experimented tasks.

3 - Problem formulation
clear, no comments.

4 - Knowledge Graph Embeddings with Literals

Page 6, right column, line 10 (Equation 2): the concept of corrupted triples is mentioned before being defined. The generation of corrupted triples has an impact on the learning process, and its methodology should hence be explained beforehand.
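For the record, the corruption procedure most of these losses assume is the standard TransE-style negative sampling; a minimal sketch (hypothetical helper name, filtered variant):

```python
import random

def corrupt(triple, entities, true_triples):
    """TransE-style negative sampling: replace head or tail with a random
    entity, rejecting candidates that are themselves known true triples."""
    h, r, t = triple
    while True:
        e = random.choice(entities)
        # replace the head or the tail with equal probability
        cand = (e, r, t) if random.random() < 0.5 else (h, r, e)
        if cand not in true_triples:
            return cand
```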

4.2 - Models with Numeric Literals
In MT-KGNN, triplet, or triple? Be consistent.

The authors should have been consistent: is there any particular reason why the complexity was only presented for KGE augmented with numeric literals?

5. Application

Clear, no comments.

6 - Experiments on Link Prediction

Although the majority of KGE papers have conducted experiments on the link prediction task, it is not the only task that can show the quality of the generated embeddings. The authors could have performed a triple classification task. I expected to see experiments on the same applications mentioned in Section 5. Additionally, FB15K is a flawed dataset; I hence expected to see other datasets in the link prediction experiments, such as WN18RR. Moreover, it would be interesting to evaluate the surveyed approaches on this benchmark [6].
Furthermore, I missed a clear comparison between KGEs enriched with literals and the standard ones. The idea is to show that KGEs, when enriched with literals, learn better representations of entities and therefore perform better than the other approaches. I did not see this.

6.2 - Evaluation Procedure and Results
The definitions of filtered Hits@N and MRR are insufficient. Please elaborate more on them. Additionally, the authors stated the following.

"In case of Extended RESCAL, practically this method is computationally expensive and thus not considered as a feasible embedding model to incorporate literals. Moreover, none of the models with literals which are discussed in this paper consider Extended RESCAL in their experiments."

Based on what? What are the requirements? How expensive is it in terms of hardware? I expected to see time-performance experiments when I saw this statement.

7. Discussion.

This section starts with new concepts, in-KG and out-of-KG, which have not been used earlier in the paper. Introducing new concepts in the last section is not good; the authors could have introduced these concepts in section 3. In addition, it would be interesting to see a discussion regarding the generation of corrupted triples for learning counter-examples. Is it a direction to go?


Please revise the entire paper. I found several wordy sentences that are difficult to read.
Be consistent with the equations.

[1] Jointly: Zhen Wang, Jianwen Zhang, Jianlin Feng, Zheng Chen. Knowledge Graph and Text Jointly Embedding. In EMNLP 2014.
[2] Jointly2: Huaping Zhong, Jianwen Zhang, Zhen Wang, Hai Wan, and Zheng Chen. Aligning Knowledge and Text Embeddings by Entity Descriptions. In EMNLP, pages 267–272, 2015.
[3] TEKE: Zhigang Wang and Juanzi Li. Text-Enhanced Representation Learning for Knowledge Graph. In IJCAI 2016.
[4] SSP: Han Xiao, Minlie Huang, Xiaoyan Zhu. SSP: Semantic Space Projection for Knowledge Graph Embedding with Text Descriptions. In AAAI 2017.
[5] Mousselly-Sergieh, Hatem, et al. "A multimodal translation-based approach for knowledge graph representation learning." Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics. 2018.
[6] Pellegrino, Maria Angela, et al. "A Configurable Evaluation Framework for Node Embedding Techniques." European Semantic Web Conference. Springer, Cham, 2019.