Review Comment:
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.
In this paper, the authors present a knowledge graph embedding approach based on the idea that each entity (and relation) should be embedded based on what is called the eigenstate and mimesis.
The eigenstate represent the intrinsic aspects of the entity, while the mimesis is representative of what the authors call exogenous properties.
The paper is well written, despite some minor errors in grammar, which could be corrected by intensive proofreading.
I do think there is some novelty in the paper, as the concept seems somewhat different as earlier approaches.
However, as I will elaborate below, a stronger argumentation is needed to convince a reader that the approach chosen is a) solving the right problems and b) actually solving them.
Further, there is an issue regarding the significance of the results. Many of the results obtained in this work are dominated by approaches which predate this submission.
Now, there are some results in which this paper still brings merit, and it might well be that the approach in this paper scales better to larger knowledge graphs as the currently dominating approaches.
Hence, I advise a major revision of this paper in which the author would investigate the behavior of their approach on larger graphs.
I will first go into some details of major issues I observed in the paper. The I will list smaller issues below.
1. As mentioned, the results in this paper are often dominated by other approaches. Exemplary is TransG [1], which dominates many of the results (others include KG2E, PTransE).
One might argue that there would not be a need to compare to that model as is is a generative model and not translation-based.
In my view, that argument is, however, void. To me, it seems that transG also has some of the same ideas in the background as the proposed approach, namely that having a translation identically for all the triples of the same relation is not necessarily a good idea. Therefore, in the revised version, an in depth comparison would be necessary to weed out the relation between these approaches.
The same goes for the already included related work. There is a short discussion establishing that TransE and TransR are special cases (parameterizations) of GTrans, but the relations to other TransX models is not elaborated.
2. The main two reasons put forward to why the approach is needed are that 1) current approaches underestimate the complexity of entities 2) a statement saying that something is wrong with the function which is typically optimized I am not sure I completely get what the problem is they want to point out, but I understand it to be that the relation vector (or matrix in some approaches) of a relation is not the same for all head-tail pairs. Now, the first argument is pretty vague and in my opinion not enough supported in the paper. The second one I agree with, and the discussion could be augmented with some examples.
All in all, there is some support for a new approach, but it should be argues clearer. Then, when presenting the approach is would be good to indicate explicitly how the model is solving the indicated issues. Especially, how it is doing it better compared to other approaches. For example, while reading, I started thinking that using normal TransE with double the vector length might be as good as having two vectors. Or that having a translation matrix would cover anything one can do with two translation vectors. Some of this becomes clearer in section 3, but a reader has to find this out independently, instead of a comparison being made explicitly.
3. I might be getting the wrong impression, but it seems to me that the model is represented more involved as its actual strength. Continuing from equation (6), I can do a rewrite as follows:
h = \alpha h_e + \beta (r_a h_a^T h_e)
h = \alpha h_e + \beta (r_a h_a^T) h_e
h = (\alpha + \beta (r_a h_a^T)) h_e
Hence, it seems like the whole first term, related to the 'abstract' vectors only results in a linear scaling of the 'eigenstate'.
This means that the whole abstract part is not able to change specific parts (dimensions) of the eigenstate, but only scale the whole vector.
It seems to me that this diminishes the power of the model greatly.
4. There is a comparison of the complexity of the approaches, however, a timing comparison is missing. I agree that the complexity analysis is essential, mainly because the graphs currently used are rather small. From experiments performed as part of my own work, I have observed that some of the approaches cited to in this paper will indeed take a very long time to complete, or not complete within a reasonable time on large graphs. Further, it is known that a (worst case) complexity analysis does not always give a complete view on the actual time needed to run the algorithm on real test cases. Hence, I would suggest to also provide a comparison between the approaches, regarding the time it takes to build the model of a given (large) dataset.
Besides these major issues, there are several other parts to pay attention to in the revised work.
* Besides the works mentioned above, there are also other works aiming to embed knowledge graphs. One branch of approaches are the ones related to (biased) RDF2Vec, which also scales to much larger KGs.
* When introducing NTN, mention explicitly where the model was first conceived.
* The sensitivity on the alpha and beta parameters could be investigated explicitly.
* In section 3.0, when introducing \delta', provide a forward reference to where it gets defined.
* In the text after eq. (9), it is unclear what the i-th entity e refers to. Is it only heads, tails or both?
* In section 3.1, you write "Moreover, SED has lower time complexity and can be applied in large-scale knowledge representation." It seems to me that you should try both measures and then say whether they are feasible. At the moment, it seems like only one was tried. Denying one of them for reasons of theoretical time complexity is not sufficient.
* I see the argumentation for changing 1/\sigma to W_r as unnecessary. Writing it as W_r is just easier for notation. The fact that this also helped you to do some (preliminary?) optimization is not important.
* In equation 19, there are some '-signs missing in the second summation. Besides, it would be fair to mention the source of inspiration for this cost function explicitly.
* I am confused by the #parameters column of table 1. It seems you mean to indicate some sort of space complexity, but I am not too sure.
* You mention "elegant performance on large-scale KG". I like the lyric language use, but only accept it in case you support this with experiments, as mentioned above.
* Using TransE for initializing the vectors for the proposed embedding algorithms seems like a good idea indeed. It would be useful if you can report how much training time this saved overall.
* Your experiments cover the datasets which are often used for performing embeddings. However, for some experiments, some datasets are not used without a clear explanation. (e.g. why no WN18 in table 3)
* As already mentioned, the current graphs used are not that large at all. So, the statement "GTrans is experimentally demonstrated to be applicable to large-scale knowledge graphs." must be toned down as long as no large KGs (like e.g., Wikidata and DBPedia) have been embedded using this approach.
* In 4.4. Distribution of relation space. I see why this technique gives an indication of how even the distribution of the relation space is. What I do, however, not understand is why this would immediately mean that the embedding is better. Perhaps there are specific interrelations which are only representable by a non-even spreading. Moreover, comparing the error measurements of the different methods directly, as seems to be done in the last paragraph does not make sense. This mainly because the functions to be optimized are different and hence their errors cannot be compared.
* Language issues
There were several issues related to language in the paper. They were minor and not disturbing. Pay attention to the use of single and plural, in particular subject-verb agreements.
Besides, articles are missing in some places.
[1] Xiao, Han, Minlie Huang, and Xiaoyan Zhu. "TransG: A Generative Model for Knowledge Graph Embedding." In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), vol. 1, pp. 2316-2325. 2016.
|