Review Comment:
This paper contrasts graph embedding approaches designed for link prediction with RDF2Vec, which is treated here as representative of "embeddings from data mining". The theoretical analysis is limited to TransE and RDF2Vec; the empirical analysis considers further methods.
Originality and Significance:
The comparison considered in the paper appears worth studying.
Unfortunately, the paper only studies rather old methods from the period 2014-2016. It would have been interesting to see results on the numerous newer methods that have since become popular.
Highly popular methods such as ConvE and R-GCN from 2018 are not mentioned at all. This severely diminishes the value of the paper.
In light of such models, the contrast between embeddings from link prediction and "embeddings from data mining" is not as clear-cut. Methods such as R-GCNs were shown to work well on both link prediction and entity classification even in the original paper.
In the experiments, the paper studies to what extent link prediction embeddings can be used for classification and similarity, and to what extent RDF2Vec vectors can be used for link prediction (based on the simple assumption that an additive relation embedding translates from head to tail entity, similar to TransE). The experiments reveal some interesting differences between classic link prediction approaches from around 2014-2016 and RDF2Vec, such as the poor ability of many methods to cope with n:m relations. Also, RDF2Vec empirically seems to capture relatedness rather than pure similarity. However, one wonders how more recent approaches perform, or even just the popular approaches from 2018.
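The additive-relation probing described above can be sketched roughly as follows. This is an illustrative reconstruction under my reading of the setup, not the paper's actual code; the toy embeddings, the triples, and all function names (`learn_relation`, `predict_tail`) are hypothetical.

```python
# Sketch (not the paper's code): given fixed entity embeddings (e.g. from
# RDF2Vec), derive a TransE-style relation vector as the mean offset t - h
# over known triples, then predict tails by nearest neighbor to h + r.

def sub(a, b): return [x - y for x, y in zip(a, b)]
def add(a, b): return [x + y for x, y in zip(a, b)]
def dist(a, b): return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

# Toy 2-d entity embeddings (hypothetical values, for illustration only).
emb = {
    "Berlin": [1.0, 0.0], "Germany": [1.0, 1.0],
    "Paris": [0.0, 0.0], "France": [0.0, 1.0],
}

def learn_relation(triples):
    """Mean additive offset t - h over observed (head, tail) pairs."""
    offsets = [sub(emb[t], emb[h]) for h, t in triples]
    n = len(offsets)
    return [sum(o[i] for o in offsets) / n for i in range(len(offsets[0]))]

# Fit a "capitalOf" vector from a single training triple.
r = learn_relation([("Berlin", "Germany")])

def predict_tail(head):
    """Return the entity whose embedding is closest to h + r."""
    target = add(emb[head], r)
    return min(emb, key=lambda e: dist(emb[e], target))

print(predict_tail("Paris"))  # -> France
```

Whether such a single additive vector exists at all for a relation is exactly what breaks down for n:m relations, which is consistent with the experimental findings above.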
Comments regarding particular claims:
In the theoretical analysis, the paper focuses on presenting RDF2Vec as a method that yields entity embeddings capturing entity similarity. This intuitively makes sense, given how the embeddings are learned, though the original RDF2Vec work focused on using such embeddings as input features for downstream models. Section 3.1 presents some arguments for why the two are closely related, but the claims do not always hold in practice, as the weights of a model may hugely amplify small differences in the features of two entities, such that they end up receiving vastly different classifications. Conversely, many other differences in the features can be ignored completely. This is not just theoretical but occurs very often in practice. For example, if we train a model to classify the age of scientists, the model is likely able to discriminate between different age groups based on relevant attributes, but the actual embedding similarities will be quite different, primarily reflecting similarities based on the scientific field, affiliation, etc. This is also why [CLS] representations from BERT, for example, are highly suitable for classification but perform very poorly in terms of similarity.
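The point about weights amplifying small feature differences can be made concrete with a minimal toy example. All numbers below are invented for illustration; the demo only shows that a linear classifier with a large weight on one dimension can assign opposite labels to two embeddings whose cosine similarity is close to 1.

```python
# Hypothetical toy demo: near-identical embeddings, opposite classifications.

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

x1 = [1.0, 1.0, 0.01]   # two entities with almost identical features;
x2 = [1.0, 1.0, -0.01]  # they differ only slightly in the last dimension

w = [0.0, 0.0, 100.0]   # a classifier that puts all its weight on that dimension

def classify(x):
    score = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if score > 0 else 0

print(cosine(x1, x2))              # ~0.9999: the embeddings are nearly identical
print(classify(x1), classify(x2))  # 1 0: yet the labels are opposite
```

The converse failure (large feature differences ignored by the classifier) follows the same way: here the first two dimensions could differ arbitrarily without changing either label.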
Eq. (21) to (24) are all formalized as entailments, but none of them appears to hold in general. Whether the entity embeddings end up genuinely similar depends on how many other relationships are shared or not shared. This is similar to word2vec, where sharing a single common context word is not enough to make two word embeddings similar. In fact, even Eq. (19) and (20) for link prediction are false if we assume a model trained on numerous different relations, as the other relations may overpower the relation r.
Minor comments:
The introduction is somewhat confusing: the reader gets the impression that only TransE and RDF2Vec will be compared, although the experiments later consider more approaches.
Fig. 1: It seems unnecessary to include a figure just to deliver citation metrics showing the importance of an area. It is fairly clear that there is enormous interest in this topic, and any person who would be motivated enough to read this paper would not really need citation metrics, which come with their own set of flaws, to accept this claim.
The paper uses references as noun phrases, e.g. "In [18]", which should be avoided.
Section 4.2: "As discussed above, positioning similar entities close in a vector space is an essential requirement for using entity embeddings in data mining tasks."
-- I strongly disagree with this statement, as argued above. The best counter-example is BERT. Of course, two input embeddings that are similar are likely to receive similar predictions, unless they lie near a decision boundary, but the converse does not hold at all.
"However, there in RDF2vec, similarity can also come in other notions.":
-- grammar mistake?