Review Comment:
The paper presents novel methods for evaluating knowledge graph embedding models with respect to their ability to predict semantically meaningful triples, i.e., triples which satisfy domain and range constraints that are either specified by the KG schema or obtained relying on the data in the KG itself. The authors propose several variations of the respective sem@K metrics extending their earlier work [*], and perform extensive evaluation of popular knowledge graph embeddings with respect to their "semantic-awareness" relying on the proposed metrics on a number of standard datasets with slight adaptations.
[*] N. Hubert, P. Monnin, A. Brun and D. Monticolo, Knowledge Graph Embeddings for Link Prediction: Beware of Semantics!, in: DL4KG@ISWC 2022: Workshop on Deep Learning for Knowledge Graphs, held as part of ISWC 2022: the 21st International Semantic Web Conference, Virtual, China, 2022.
The need to extend the evaluation protocols of embedding models with metrics that estimate how well KG embeddings capture semantic information in the KG has been acknowledged in a number of works cited by the authors. The considered problem is timely, relevant, and fits the scope of the Semantic Web journal well. The introduced sem@K metrics are natural suggestions for the considered task, capturing the simple idea of measuring how well KG embeddings preserve the domain and range restrictions. The main contribution of the work, in my opinion, is an extensive, systematic empirical evaluation of families of KG embedding models with respect to the introduced metrics. Generally, the technical part of the paper is well written, and the examples throughout the work help readers grasp the introduced concepts.
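To make the intuition behind such a metric concrete, here is a minimal sketch of how I read it for a single tail-prediction query: the share of the top-K ranked candidates whose types match the range of the relation. This is not the authors' implementation. The function name `sem_at_k`, the toy entities and types, and the choice to count untyped candidates as invalid are all my own assumptions for illustration.

```python
from typing import Dict, List, Set


def sem_at_k(
    ranked_candidates: List[str],
    expected_types: Set[str],
    entity_types: Dict[str, Set[str]],
    k: int,
) -> float:
    """Share of the top-k candidates with at least one expected type.

    Assumption: a candidate without any known type counts as invalid.
    """
    top_k = ranked_candidates[:k]
    if not top_k:
        return 0.0
    valid = sum(
        1 for entity in top_k
        if entity_types.get(entity, set()) & expected_types
    )
    return valid / len(top_k)


# Hypothetical toy data: candidates ranked for the query (Joe, livesIn, ?),
# where the range of livesIn is assumed to be {"City"}.
entity_types = {
    "Chicago": {"City"},
    "Paris": {"City"},
    "Joe": {"Person"},
    "France": {"Country"},
}
ranked = ["Chicago", "Joe", "Paris", "France"]

score_at_2 = sem_at_k(ranked, {"City"}, entity_types, k=2)
score_at_3 = sem_at_k(ranked, {"City"}, entity_types, k=3)
```

In this toy ranking one of the top two candidates and two of the top three are cities, so the score depends only on type compatibility and not on whether the predicted triple is actually true.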
There are several questions/suggestions for improvement from my side:
- The authors discuss related works that also measure the semantic awareness of KG embedding models, but do not directly compare the respective metrics to the introduced ones. It seems that the inc@K metric from [5] reflects the same intuition as sem@k[base] when the ontology only contains domain and range restrictions as well as class disjointness axioms. Is that the case?
- As the main contribution of the paper seems to be the extensive evaluation of the models with respect to their semantic awareness, the evaluation section could be improved to help readers grasp the main message of the paper. While the authors summarize some of the observations in the text, it is often difficult to extract the messages from the numerous tables. Bar charts presenting the rank-based and semantic-based metrics, instead of (or in addition to) the tables, could be helpful. To keep the plots from becoming overloaded and the results digestible, it might be sufficient to report hits@k (resp. sem@k) for a single k only.
- The provided GitHub link contains the datasets used in the experiments; these seem to be complete. It would be helpful to also share the implementation of the introduced evaluation protocols along with the README file in order to ensure the reproducibility of the results.
- While the schema of the KG and the type hierarchy are definitely the most immediate choices for semantic artifacts that can be considered in the evaluation of the semantic awareness of KG embeddings, in the general case KGs can be accompanied by more expressive ontologies. It might be worthwhile to include a discussion on the possible extension of the proposed metrics to also account for such ontologies. For example, an ontology axiom might state that "presidents live in capitals". In that case, given that Joe is known to be a president in the KG, the prediction "Joe lives in Chicago" would violate the axiom while still being semantically correct with respect to the domain and range restrictions of the "livesIn" relation. Another aspect concerns evaluating whether KG embedding models predict combinations of facts that do not contradict each other. Each fact might be perfectly valid with respect to the ontology when considered on its own, but the combination of predictions could violate the schema/ontology. Extending the above example, the model might make the two predictions "Joe lives in Chicago" and "Joe has profession president". Each prediction considered in isolation is meaningful and semantically correct, but jointly they do not follow the above axiom.
- In the current version of the paper, the authors restrict themselves to entities for which types are specified in the KGs. This is a limiting factor (which the authors also admit). In principle, KG embedding models are also capable of predicting the types themselves. Thus, generalizing the proposed metrics to account for combinations of predictions seems to be a rather intuitive and natural extension.
- While it might be too demanding to ask for the inclusion of the extensions of the proposed metrics suggested above in the main part of the paper and experiments, I think having a broader view on the concept of semantic awareness touching upon the respective directions of considering mutual predictions made by the model and including more expressive ontologies could be helpful. This can be done in a separate Discussion section, for example.
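To illustrate the point about jointly inconsistent predictions, the following is a rough sketch of a check for the single example axiom "presidents live in capitals" over a set of predicted triples. It is only meant to show the kind of test such an extended metric would need, not a proposal for the actual definition. The relation name `hasProfession`, the set of capitals, and the function name are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def axiom_violations(triples: Iterable[Triple], capitals: Set[str]) -> List[Triple]:
    """Return the livesIn triples that break 'presidents live in capitals'.

    Assumption: being a president is expressed as (x, hasProfession, president).
    """
    triples = list(triples)
    presidents = {
        head for head, relation, tail in triples
        if relation == "hasProfession" and tail == "president"
    }
    return [
        (head, relation, tail) for head, relation, tail in triples
        if relation == "livesIn" and head in presidents and tail not in capitals
    ]


# Hypothetical predictions made by a model, plus an assumed list of capitals.
capitals = {"Washington", "Paris"}
predictions = [
    ("Joe", "livesIn", "Chicago"),
    ("Joe", "hasProfession", "president"),
]

# Each prediction on its own passes the check; only the pair is flagged.
alone_first = axiom_violations(predictions[:1], capitals)
alone_second = axiom_violations(predictions[1:], capitals)
jointly = axiom_violations(predictions, capitals)
```

A check that looks at one triple at a time, as the domain and range restrictions do, cannot detect this case, which is why the set of predictions has to be evaluated as a whole.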
Additionally, further careful proofreading should be done, as there are still quite a few typos and grammatical inaccuracies in the paper:
- Abstract: "Its joint analysis with rank-based metrics offer" -> "...offers"
- p. 2: Fig. 2 is referred to as a motivating example, but it appears much later in the text (p. 8). This is rather unusual; I think it would be more intuitive to place the motivating example at the beginning of the paper, where it is first referenced.
- p. 6: "...values increases..." -> "...values increase..."
- p. 7: "... it is assumed the test set only comprises..." -> "... it is assumed that the test set only comprises..."
- p. 7: "...Model B semantic awareness" -> "...semantic awareness of Model B..."
- p. 7: "As aforementioned..." -> "As mentioned above..."
- p. 10: "...is the number of edges linking c to c'..." -> "...is the length of the path from c to c'"?
- p. 11: "Accordingly to Section 4.2.1..." -> either "According to Section 4.2.1" or "As discussed in Section 4.2.1..."
- p. 13: "...the semantic awareness of the most popular KGEMs are analyzed." -> "... is analyzed"
- p. 13: "...with d a distance function..." -> "...where d is a distance function..."
- p. 16: "...and are provided..." -> "...are provided..."
- p. 17: "...are better able at recovering..." -> "...are better capable of recovering..."
- p. 18: In Fig. 5 (c), ComplEx seems to be missing. Is there a particular reason for that?
- p. 18: "Where translational and semantic matching models treat..." -> "While translational and semantic matching models treat..."
- p. 18: "...a trade-off exist" -> "...exists"
- p. 19: "...the most of KGEMs reaches..." -> "...most KGEMs reach..."
- p. 20: "...with a hierarchy class..." -> "...with a class hierarchy..."
- p. 20: "... are better able at recovering" -> "...are better capable of recovering"
- p. 21: "... the performance of KGEMs in terms of rank-based metrics are not..." -> "... is not"
- p. 21: "...study for a future work..." -> "...study for future work..."