Knowledge Graph Embedding for Data Mining vs. Knowledge Graph Embedding for Link Prediction - Two Sides of the Same Coin?

Tracking #: 2726-3940

Jan Portisch
Nicolas Heist
Heiko Paulheim

Responsible editor: 
Guest Editors DeepL4KGs 2021

Submission type: 
Full Paper
Knowledge Graph Embeddings, i.e., projections of entities and relations to lower dimensional spaces, have been proposed for two purposes: (1) providing an encoding for data mining tasks, and (2) predicting links in a knowledge graph. Both lines of research have been pursued rather in isolation from each other so far, each with their own benchmarks and evaluation methodologies. In this paper, we argue that both tasks are actually related, and we show that the first family of approaches can also be used for the second task and vice versa. In two series of experiments, we provide a comparison of both families of approaches on both tasks, which, to the best of our knowledge, has not been done so far. Furthermore, we discuss the differences in the similarity functions evoked by the different embedding approaches.

Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 20/Apr/2021
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The paper presents a study of how various knowledge graph embeddings perform in scenarios whose task definitions differ from those for which the embeddings were developed. In particular, the authors study how rdf2vec embeddings perform for link prediction tasks, and how TransE and its evolutions perform for classification tasks. The paper is overall well-written and it is pleasant to read. The motivations are clear and it addresses an interesting topic.

However, there are some points which must be improved:
- At the end of the related work section, the authors say that comparisons between the two approaches are rare. If they know of other comparisons, they are strongly invited to report them in the paper and to discuss how the presented work differs.
- In section 3 the authors report a lot of information about recommendation. Although the task is related to both classification and link prediction, I found it a bit misleading since the evaluated tasks are in the end classification, regression, and clustering; no evaluation of a pure recommendation task is shown.
- The labels in Figures 8, 9, and 10 are not readable. Fig. 7 should be a table.
- The classification tasks are not well-defined. Did the authors perform binary, multi-class single-label, or multi-class multi-label classification? Why did the authors choose to use accuracy rather than precision, recall, and F1 scores? Are the datasets balanced?
- How are the entities linked to the knowledge graphs? Did the authors use string matching? Are there ambiguous entities? If yes, how did the authors deal with them?
- What does “the dataset is based on LP50” or “the dataset is based on KORE dataset” mean? Authors should describe the dataset, and say which operations or transformations were performed.
- I did not get the sentence “The same property of also assigning closer embedding vectors to...”. Please rephrase it.
- Section 5.1 should be split in subsections. The current section mixes evaluation settings with the discussion of the results and, therefore, it is not easy to read.
- Table 3 is useless for readers who do not know most of the people in the table. I strongly encourage the authors to provide another example or to explain better in the text which predictions are correct, which are wrong, and why (i.e., providing the necessary knowledge to understand the table).
- Many references are from arXiv. I strongly suggest correcting them so that they refer to the published version of the peer-reviewed papers.
- The provided GitHub repository is not complete and the reviewer was not able to run and test the experiments. I would ask the authors to complete it.
- Which interesting developments do the authors foresee? They should be mentioned and briefly presented.
- A discussion about limitations should be included. I wonder if the results are biased by the fact that only DBpedia 2016-10 has been used. What can happen with another knowledge graph is not taken into account.
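
The question in the list above about accuracy versus precision, recall, and F1 matters most when the classes are imbalanced. The following is a minimal sketch and is not taken from the paper: the helper `binary_metrics` and the 95/5 label split are hypothetical and chosen only for illustration. It shows a majority-class predictor that reaches high accuracy while its F1 score on the minority class is zero.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    # Guard against division by zero when nothing is predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

If the datasets used in the paper turn out to be balanced, accuracy alone may be adequate; the sketch only shows why the question needs an explicit answer.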

The idea of comparing embeddings built for different tasks is interesting. However, no new technologies or methods are provided. The paper is probably inspired by “Lavrač, N., Škrlj, B., & Robnik-Šikonja, M. (2020). Propositionalization and embeddings: two sides of the same coin. Machine Learning, 109(7), 1465-1507.”, which does not cover the presented comparison work.

Significance of the results
This is where the main contribution of the paper lies. Discovering that the different representations might be used for tasks other than those they were built for is an interesting direction, which might lead to improvements in current machine learning and link prediction methods.

Quality of writing
The quality of the paper is good. In my review I raised some points which should help to improve the paper further.

Overall, the paper helps to understand and evaluate how symbolic knowledge is transformed into sub-symbolic knowledge, and how we can make use of it. I think this might be a good paper but the presentation must be improved, more details about the performed experiments should be reported, and more aspects about the impact should be discussed.

Review #2
By Paul Groth submitted on 24/May/2021
Major Revision
Review Comment:

This paper compares two of the major use cases of Knowledge Graph Embeddings in the literature. One is the creation of embeddings optimized for the task of link prediction; the other is the creation of embeddings as representations for use in downstream tasks that rely on similarity computation. The aim of the paper is to see to what extent these approaches to creating embeddings are useful across tasks and to see how and if they differ. The paper provides extensive experiments.

Overall, I really liked the premise of the paper. In general, the idea of learning good representations that can handle different tasks is from my perspective important (e.g. [1]). Comparing these dominant approaches to generating representations is a strong place to start. I felt like the paper provided a good introduction to the major methods and the description was quite accessible. Furthermore, the paper provides a number of useful conclusions about when to use RDF2Vec in particular. I was hoping for some stronger conclusions about its application in the link prediction setting.

I think the paper could be significantly improved by including another knowledge graph embedding technique for data mining tasks (KGlove, entity2vec, maybe even an R-GCN). On the link prediction side, a number of different approaches are compared. I think that by incorporating another method for creating a "data mining" representation the conclusions would be more generalizable.

Additionally, I think the related work is missing a larger discussion of the use of language models in the context of learning knowledge graph representations (e.g. [2,3]). This is becoming a very interesting and high-performance way of getting good representations of KGs, both for link prediction and for downstream tasks.

Some minor comments
- Better visualization of Figure 3 and in general the pictures could be improved for readability.
- It would be nice to identify in Table 1 which algorithms are "link prediction" and which are "data mining" algorithms.
- In the related work section, the discussion of neural networks seems a bit off: graph neural networks are different from the other training approaches described.
- p. 2 Line 45 "the systematic of those reviews" - not clear what this means
- p. 2 Line 4: you say "embeddings from data mining"; I think you mean "embeddings for data mining"
- You use the phrase "as good as possible" throughout; this should be "as well as possible"

[1] Inductive Entity Representations from Text via Link Prediction. Daza, Daniel, Cochez, Michael, and Groth, Paul In Proceedings of The Web Conference 2021

[2] Ruobing Xie, Zhiyuan Liu, Jia Jia, Huanbo Luan, and Maosong Sun. 2016. Representation Learning of Knowledge Graphs with Entity Descriptions. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, February 12-17, 2016, Phoenix, Arizona, USA, Dale Schuurmans and Michael P. Wellman (Eds.). AAAI Press, 2659–2665. view/12216

[3] Xiaozhi Wang, Tianyu Gao, Zhaocheng Zhu, Zhiyuan Liu, Juanzi Li, and Jian Tang. 2019. KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation. CoRR abs/1911.06136 (2019). arXiv:1911.06136

Review #3
Anonymous submitted on 03/Jul/2021
Review Comment:

In this paper, the authors perform an analysis to assess the advantages and drawbacks of the two main strands of knowledge graph embeddings: i) link prediction, e.g., TransE and its descendants, and ii) embedding from data mining, e.g., RDF2Vec.
Ultimately, these two families of technologies have the same goal: assigning numbers to entities. However, the authors ask: are they related? Can they be used interchangeably?
To this end, they trained and compared a set of algorithms (RDF2vec, TransE, TransR, RotatE, DistMult, RESCAL, and ComplEx) from the two families, on standard datasets (WN18, FB15k, LP50, KORE and others), on six different tasks: classification, clustering, regression, semantic analogies, document similarity and entity relatedness.
From such experiments, the authors derive some insights on how the two families of methods differ from each other and propose a few recommendations on the kind of approach to use based on the user needs.

First of all, I would like to congratulate the authors because this paper is written very well. It is straightforward to read, and the math jargon does not weigh down the paper.

I believe that the paper is mature enough with regard to the message it wants to convey.

However, I deal with a set of knowledge graphs that are quite challenging to treat: scholarly knowledge graphs. Specifically, the KGs I am referring to have several N to M relations with N >> M. This happens when the number of entities in the head position for a certain relation is much higher than the number of entities in the tail position. Indeed, scholarly knowledge graphs tend to categorize millions of documents (e.g., papers, patents) according to a relatively small set of categories (e.g., topics, affiliation kinds, countries, chemical compounds). The set of entities in the head is very heterogeneous, but the set of entities in the tail is homogeneous.
Our challenge is that state-of-the-art KGE models cannot handle these kinds of relations effectively, since they are unable to assign each entity a sufficiently distinct embedding vector in a low-dimensional space. You can probably guess that link prediction techniques that exploit these embeddings tend to perform poorly.
Do you have any insight on how to deal with such peculiar KGs?
In general, I think that while your analysis covers the majority of KGs, there could be some niche set-ups that are not covered and would be interesting to add.
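
The N >> M difficulty described above can be made concrete with a toy experiment. The sketch below is an illustration under strong assumptions and is not the authors' setup: `train_toy_transe` is a hypothetical helper that minimizes only the positive part of a squared-error, TransE-style objective (no negative sampling, no normalization) for a single relation with 200 heads and 2 tails. Under these assumptions, all heads that share a tail collapse onto nearly the same vector.

```python
import numpy as np


def train_toy_transe(n_heads=200, n_tails=2, dim=8, epochs=500, lr=0.05, seed=0):
    """Gradient descent on sum ||h + r - t||^2 for one N-to-M relation."""
    rng = np.random.default_rng(seed)
    heads = rng.normal(size=(n_heads, dim))
    tails = rng.normal(size=(n_tails, dim))
    rel = rng.normal(size=dim)
    tail_of = np.arange(n_heads) % n_tails  # head i links to tail i mod M

    for _ in range(epochs):
        residual = heads + rel - tails[tail_of]      # h + r - t per triple
        heads -= lr * 2 * residual                   # one triple per head
        rel -= lr * 2 * residual.mean(axis=0)        # averaged over triples
        for j in range(n_tails):
            tails[j] += lr * 2 * residual[tail_of == j].mean(axis=0)
    return heads, tail_of


heads, tail_of = train_toy_transe()

# Largest distance between any head of tail 0 and the first head of tail 0.
group0 = heads[tail_of == 0]
within_spread = np.linalg.norm(group0 - group0[0], axis=1).max()

# Distance between representative heads of the two different tails.
between = np.linalg.norm(heads[tail_of == 0][0] - heads[tail_of == 1][0])

print(f"within-group spread: {within_spread:.2e}")
print(f"between-group distance: {between:.2f}")
```

Real TransE training adds negative sampling and other relations, which counteract this collapse to some degree; the sketch only isolates the pressure that a single many-to-few relation puts on the head embeddings.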

- Equations 4 and 5: the |…| notation in the denominator denotes the cardinality of the set of triples, but at first glance I thought it was the absolute value and was slightly confused. Please add a note.
- Fig. 4 is never referenced in the text.
- Fig. 2, 8, 9, 10. Can you make the labels slightly bigger? They are hard to read.

Originality 4/5
Significance of the results 4/5
Quality of writing 5/5

The authors did not provide any URL

Review #4
Anonymous submitted on 12/Jul/2021
Major Revision
Review Comment:

This paper contrasts graph embedding approaches designed for link prediction against RDF2Vec, which here is referred to as being representative of "embeddings from data mining". The theoretical analysis is limited to TransE and RDF2Vec. The empirical analysis considers further methods.

Originality and Significance:

The comparison considered in the paper appears worth studying.

Unfortunately, however, the paper only studies very old methods from the period 2014-2016. It would have been interesting to see results on the numerous newer methods that have since become popular.

Highly popular methods such as ConvE and R-GCN from 2018 are not even mentioned at all. This severely diminishes the value of the paper.

In light of such models, the contrast between embeddings from link prediction and "embeddings from data mining" is not as clear-cut. Methods such as R-GCNs were shown to work well on both link prediction and entity classification even in the original paper.

In the experiments, the paper studies to what extent link prediction embeddings can be used for classification and similarity and to what extent RDF2Vec vectors can be used for link prediction (based on the simple assumption of finding an additive relation embedding that translates from head to tail entity, similar to TransE). The experiments reveal some interesting differences between classic link prediction approaches from around 2014-2016 and RDF2Vec, such as the poor ability of many methods to cope with n:m relations. Also, RDF2vec empirically seems to capture relatedness rather than pure similarity. However, one wonders how more recent approaches perform, or even just the popular approaches from 2018.
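
The translation-based reuse of pre-trained entity vectors for link prediction mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the toy 2-D vectors, and the choice to estimate the relation vector as the mean of `tail - head` over training pairs are all assumptions made for this example.

```python
import numpy as np


def estimate_relation_vector(entity_vecs, train_pairs):
    """Estimate an additive relation vector as the mean of (tail - head)."""
    diffs = [entity_vecs[t] - entity_vecs[h] for h, t in train_pairs]
    return np.mean(diffs, axis=0)


def rank_tails(entity_vecs, head, rel_vec, candidates):
    """Rank candidate tails by Euclidean distance to head + relation."""
    target = entity_vecs[head] + rel_vec
    return sorted(candidates,
                  key=lambda e: np.linalg.norm(entity_vecs[e] - target))


# Hypothetical pre-trained entity vectors (e.g., produced by any embedding model).
entity_vecs = {
    "Germany": np.array([1.0, 0.0]), "Berlin": np.array([1.0, 1.0]),
    "France": np.array([2.0, 0.0]), "Paris": np.array([2.0, 1.0]),
    "Italy": np.array([3.0, 0.0]), "Rome": np.array([3.0, 1.1]),
}

# Known (head, tail) pairs for a single relation, e.g. "capital".
train_pairs = [("Germany", "Berlin"), ("France", "Paris")]
rel_vec = estimate_relation_vector(entity_vecs, train_pairs)

ranking = rank_tails(entity_vecs, "Italy", rel_vec, ["Berlin", "Paris", "Rome"])
print(ranking)
```

The sketch works only to the extent that the pre-trained space happens to encode the relation as a roughly constant offset, which is the simple assumption noted in the paragraph above.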

Comments regarding particular claims:

In the theoretical analysis, the paper focuses on presenting RDF2Vec as a method that yields entity embeddings capturing entity similarity. This intuitively makes sense, given how the embeddings are learned, though the original RDF2Vec focused on the use of such embeddings as an input to a feature vector. Section 3.1 presents some arguments for why the two are closely related, but the claims do not always hold in practice, as the weights of a model may hugely amplify small differences in the features of two entities, such that they end up getting vastly different classifications. Similarly, many other differences in the features can get ignored completely. This is not just theoretical but occurs very often in practice. For example, if we train a model to classify the age of scientists, the model is likely able to discriminate between different age groups based on relevant attributes, but the actual embedding similarities will be quite different, primarily reflecting similarities based on the scientific field, affiliation etc. This is also why [CLS] representations from BERT, for example, are highly suitable for classification but perform very poorly in terms of similarity.
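
A small numerical sketch of the point above, under assumptions chosen purely for illustration (the vectors, the weight vector `w`, and the threshold classifier are hypothetical): a linear model that puts all its weight on one low-variance feature separates two nearly identical vectors, while assigning the same class to two vectors that are far less similar.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical entity embeddings: a and b differ only slightly in the last feature.
a = np.array([1.0, 1.0, 0.01])
b = np.array([1.0, 1.0, -0.01])
c = np.array([-1.0, 5.0, 0.01])

# Hypothetical classifier weights that amplify only the last feature.
w = np.array([0.0, 0.0, 100.0])


def predict(x):
    """Binary linear classifier: class 1 if the weighted sum is positive."""
    return int(w @ x > 0)


sim_ab = cosine(a, b)
sim_ac = cosine(a, c)
print(f"cos(a, b) = {sim_ab:.4f}, classes: {predict(a)} vs {predict(b)}")
print(f"cos(a, c) = {sim_ac:.4f}, classes: {predict(a)} vs {predict(c)}")
```

The example illustrates both directions of the argument: similar embeddings can receive different labels, and dissimilar embeddings can receive the same label.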

Eqs. (21) to (24) are all formalized as entailments, but they all appear to be false in the sense of not holding true in general. Whether the entity embeddings end up genuinely being similar depends on how many other relationships are shared or not shared. This is similar to how in word2vec, just sharing one common context word is not enough to make two word embeddings similar. In fact, even Eqs. (19) and (20) for link prediction are also false if we assume a model trained on numerous different relations is being considered, as the other relations may overpower the relation r.
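
The "overpowering" effect can be illustrated with a short worked example. It is a sketch under stated assumptions and does not reproduce the paper's equations: assume a squared-error, TransE-style objective with the relation and tail vectors held fixed, two heads h_1 and h_2 that share the triple (h_i, r, t), and one further relation r' whose tails t'_1 and t'_2 differ.

```latex
% Assumption: fixed r, r', t, t'_i; only the head vector h_i is optimized.
\begin{align}
\mathbf{h}_i^{*}
  &= \arg\min_{\mathbf{h}} \;
     \lVert \mathbf{h} + \mathbf{r} - \mathbf{t} \rVert^2
     + \lVert \mathbf{h} + \mathbf{r}' - \mathbf{t}'_i \rVert^2 \\
  &= \tfrac{1}{2}\big[(\mathbf{t} - \mathbf{r}) + (\mathbf{t}'_i - \mathbf{r}')\big],
     \qquad i \in \{1, 2\}, \\
\mathbf{h}_1^{*} - \mathbf{h}_2^{*}
  &= \tfrac{1}{2}\,(\mathbf{t}'_1 - \mathbf{t}'_2) \neq \mathbf{0}
     \quad \text{whenever } \mathbf{t}'_1 \neq \mathbf{t}'_2 .
\end{align}
```

With k such additional relations, the shared triple contributes only a 1/(k+1) share to each head vector under the same assumptions, so sharing (r, t) alone does not force the two heads to be close.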

Minor comments:

The introduction is a bit confusing, as the reader gets the impression that only TransE and RDF2Vec are going to be compared, although later in the experiments more approaches are considered.

Fig. 1: It seems unnecessary to include a figure just to deliver citation metrics showing the importance of an area. It is fairly clear that there is enormous interest in this topic, and any person who would be motivated enough to read this paper would not really need citation metrics, which come with their own set of flaws, to accept this claim.

The paper uses references as noun phrases, e.g. "In [18]", which should be avoided.

Section 4.2: "As discussed above, positioning similar entities close in a vector space is an essential requirement for using entity embeddings in data mining tasks."
-- I highly disagree with this statement, as argued above. The best counter-example is BERT. Of course, two input embeddings that are similar are likely to get similar predictions, unless it is near a decision boundary, but the converse does not hold at all.

"However, there in RDF2vec, similarity can also come in other notions.":
-- grammar mistake?