Triple Confidence-aware Encoder-Decoder Model for Commonsense Knowledge Graph Completion

Tracking #: 2929-4143

Authors: 
Fu Zhang
Hongzhi Chen
Xiang Li
Jingwei Cheng

Responsible editor: 
Guest Editors Commonsense 2021

Submission type: 
Full Paper
Abstract: 
Commonsense knowledge graphs have recently gained attention since they contain large numbers of commonsense triples, such as (get onto web, HasPrerequisite, turn computer on), which usually use free-form text to represent entities and are essential for many artificial intelligence applications. However, a large amount of valuable commonsense knowledge still exists only implicitly or is missing. Commonsense knowledge graph completion (CKGC) addresses this incompleteness problem by inferring the missing parts of commonsense triples, e.g., (?, HasPrerequisite, turn computer on) or (get onto web, HasPrerequisite, ?). Some existing methods attempt to learn as much entity semantic information as possible by exploiting the structural and semantic context of entities to improve the performance of CKGC. However, we found that existing models pay attention only to the entities and relations of commonsense triples and ignore the important confidence (weight) information attached to them. In this paper we introduce commonsense triple confidence into CKGC and propose a confidence-aware encoder-decoder CKGC model. In the encoding stage, we propose a method to incorporate the commonsense triple confidence into RGCN (relational graph convolutional network), so that the encoder can learn more accurate entity semantic representations by considering the triple confidence constraints. Moreover, commonsense knowledge graphs are usually sparse, because a large number of entities in commonsense triples have an in-degree of 1. We therefore propose to add a new relation (called a similar edge) between two similar entities to compensate for the sparsity of commonsense KGs. In the decoding stage, considering that the entities in commonsense triples are sentence-level entities, we propose a joint decoding model combining InteractE and ConvTransE. Experiments show that our new model achieves better performance than previous competitive models. In particular, incorporating the confidence scores of triples brings significant improvements to CKGC.
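
The following is a minimal sketch of the core idea of
confidence-weighted message passing in an RGCN-style encoder. It is an
illustrative reading of the abstract, with assumed shapes and names,
not the paper's implementation:

    import torch
    import torch.nn as nn

    class ConfidenceRGCNLayer(nn.Module):
        """Illustrative RGCN-style layer: every incoming message is
        scaled by the confidence score of the triple it travels on."""

        def __init__(self, dim, num_relations):
            super().__init__()
            self.rel = nn.Parameter(0.01 * torch.randn(num_relations, dim, dim))
            self.self_loop = nn.Linear(dim, dim, bias=False)

        def forward(self, x, triples, conf):
            # x: (num_entities, dim) entity embeddings
            # triples: list of (head, relation, tail) index tuples
            # conf: (num_triples,) confidence score per triple
            out = self.self_loop(x)
            for (h, r, t), c in zip(triples, conf):
                # low-confidence triples contribute weaker messages
                out[t] = out[t] + c * (self.rel[r] @ x[h])
            return torch.relu(out)

    # Toy usage: 4 entities, 3 relation types, 2 weighted triples.
    layer = ConfidenceRGCNLayer(dim=8, num_relations=3)
    h = layer(torch.randn(4, 8), [(0, 1, 2), (3, 0, 2)],
              torch.tensor([0.9, 0.3]))
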
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 10/Dec/2021
Suggestion:
Major Revision
Review Comment:

This paper presents an approach for commonsense knowledge graph
completion using an encoder/decoder model. The authors use an encoding
model based on BERT, and a decoder model with a convolutional network.
The authors show that their method finds more supplementary commonsense
triples than the previous state of the art. They show this with the
Hits@n (n = 1, 3, 10) ranking, where the correct triples are ranked
within the top n among all candidate triples. However, the authors do
not outline how their commonsense graph completion is evaluated on
ATOMIC (which also has triple stores). I was also interested to hear
how the authors' model compares to the types of knowledge that are
inferred from Malaviya et al.'s model.
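
For reference, Hits@n is simply the fraction of test triples whose
gold entity is ranked within the top n candidates. A minimal sketch
(my own illustration, not the authors' code):

    def hits_at_n(ranks, n):
        """Fraction of queries whose gold entity ranks in the top n.
        ranks: 1-based rank of the correct entity per test triple."""
        return sum(1 for r in ranks if r <= n) / len(ranks)

    # e.g., ranks of the gold tail entity for five test queries
    ranks = [1, 4, 2, 15, 3]
    print(hits_at_n(ranks, 1), hits_at_n(ranks, 3), hits_at_n(ranks, 10))
    # -> 0.2 0.6 0.8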

The authors adequately state the parts of their approach, but some
details could be strengthened. For example, many of the figures, like
Figure 2, could use a more descriptive caption. Much of the text in
Section 4.1 describes the figure, and it could be moved (or repeated)
in the caption.

One of the contributions of the paper is an appropriate importance
weight. However, the weight calculation is "the weight of the triples
is set to the size of the threshold," which seems arbitrary (not to
exceed 4 in the paper). If the weights are an important contribution of
the work, then I think they should be evaluated. Further, the authors
use the weights in ConceptNet, but I believe these weights are not
normalized, which may induce biases in their model.
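
If the raw ConceptNet weights are used directly, one simple remedy
(purely illustrative; the paper may handle this differently) is to cap
and rescale them before training:

    import math

    def normalize_weight(w, cap=4.0):
        """Squash a raw ConceptNet weight into (0, 1].
        Raw weights are non-negative but unbounded (0.5, 1.0, 6.32, ...);
        capping (4.0 echoes the paper's threshold) and rescaling keeps
        high-weight triples from dominating."""
        return min(w, cap) / cap

    def log_normalize_weight(w, soft_max=10.0):
        """Alternative: logarithmic squashing, gentler on large weights.
        soft_max is an assumed typical upper bound, not a hard limit."""
        return math.log1p(w) / math.log1p(soft_max)

    print(normalize_weight(6.32), normalize_weight(1.0))  # -> 1.0 0.25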

The paper could also be strengthened by explaining the types of
commonsense triples their model is able to learn. I found Table 7 in
the Appendix useful for understanding the types of triples that their
model is able to predict. I'm wondering if there is a subset of these
examples that could be (1) added to the main part of the paper and (2)
explained to show the novelty of the types of information that their
model can learn.

Based on the points above (and the suggestions below), I suggest major
revisions before publication.

1 Minor suggestions
===================

- The first sentence of the abstract: "Commonsense knowledge graphs
have recently gained attention since they contain lots of
commonsense triples" could perhaps be strengthened to say how
commonsense knowledge graphs in a common triple store are essential
for AI applications.
- I found Table 4 (the summary of results) difficult to interpret.
Firstly, the quantification of success, Hits, is described only after
the table. Similarly, Table 5 is very difficult to read.
- "still exists implicitly or misses" -> "exists implicitly or is
missing."
- "to solve this incomplete problems" -> "to solve these incomplete
problems" or -> "to solve this incomplete problem"
- Line 29: "Lean more accurate entity semantic representation" ->
"learn more accurate semantic representations" or -> "learn a more
accurate semantic representation".
- Line 39, Page 3: "In 2020, Malaviya et al. [7] propose a model" ->
"In 2020, Malaviya et al. [7] proposed a model"
- Line 31, Page 5 "CoceptNet [23]" -> "ConceptNet [23]"
- Page 7, line 12, missing period: "attention The" -> "attention. The"

2 Larger suggestions
====================

1. Page 2, line 50: "The values of triples in commonsense KGs...have
never been utilized for CKGC in these previous work." I think there
may be a couple of other previous works that the authors may want to
examine:
1. Omeliyanenko, Janna, et al. "LM4KG: Improving common sense
knowledge graphs with language models." International Semantic
Web Conference. Springer, Cham, 2020.
2. Davison, Joe, Joshua Feldman, and Alexander
M. Rush. "Commonsense knowledge mining from pretrained models."
Proceedings of the 2019 Conference on Empirical Methods in
Natural Language Processing and the 9th International Joint
Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.
2. In section 2.2, the authors state that Malaviya et al. [7] are the
first to propose a "specific" model for the completion of the
commonsense knowledge graph. What do they mean by the "first time a
specific model" being used for completion? Here is another
(very recent) paper on KG completion: B. Wang, G. Wang, J. Huang,
J. You, J. Leskovec and C.-C. J. Kuo, "Inductive Learning on
Commonsense Knowledge Graph Completion," 2021 International Joint
Conference on Neural Networks (IJCNN), 2021, pp. 1-8, doi:
10.1109/IJCNN52387.2021.9534355.
3. In section 2.2, the authors mention that InductivE is the first
benchmark for inductive commonsense KG completion. Before
InductivE, how was commonsense completion evaluated?
4. Page 4, line 42: "Commonsense knowledge is a fact accepted by most
people.." I'm not sure what this means.
5. There is a "similarTo" or "synonym" relation in ConceptNet. How
does this differ from the edges added based on the similarity score?
6. In Section 5.1, the authors explain how they use BERT to find
"initial entity embeddings" and they add "some similar edges." How
many? In terms of the percentage of edges? (See the sketch after
this list.)
7. In section 5.1.1, what do the authors mean by ELMo "cannot perform
deep modeling work"?
8. What do you mean by BERT is "a kind of representation learning"?
This point wasn't made clear.
9. The element-wise activation function, $\sigma$: was it defined in
previous work, or in this paper?
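
To make questions 5 and 6 concrete, the procedure the paper appears to
describe could look like the following sketch; the sentence-embedding
model, threshold, and edge label are my assumptions, not the paper's
values:

    from sentence_transformers import SentenceTransformer, util

    # Embed each free-form entity phrase, then add a "similar edge"
    # between any pair whose cosine similarity exceeds a threshold.
    model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for BERT
    entities = ["get onto web", "go online", "turn computer on"]
    emb = model.encode(entities, convert_to_tensor=True,
                       normalize_embeddings=True)

    sim = util.cos_sim(emb, emb)  # (n, n) cosine similarities
    threshold = 0.9               # assumed value
    similar_edges = [(entities[i], "SimilarEdge", entities[j])
                     for i in range(len(entities))
                     for j in range(i + 1, len(entities))
                     if sim[i, j] > threshold]

Reporting len(similar_edges) as a percentage of the original edge
count would answer question 6 directly.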

Review #2
Anonymous submitted on 30/Dec/2021
Suggestion:
Major Revision
Review Comment:

*** Overview ***
The paper presents an encoder-decoder model for commonsense knowledge graph completion.
In the encoding stage, the authors propose techniques to (1) find semantically similar entities to alleviate the sparsity of the input KG, and (2) incorporate the confidence scores of commonsense triples into an existing relational graph convolutional network.
In the decoding stage, the authors propose a joint decoding model based on two existing architectures, ConvTransE and InteractE, to predict the missing entity given a head/tail entity and a relation.
In the experiments, the authors showed that the proposed model outperformed baseline models on the test set of the ConceptNet-100K dataset.
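
As a mental model for the joint decoding (an assumed fusion scheme; the
paper's actual combination may differ), the two decoders' scores over
candidate entities can be mixed before ranking:

    import torch

    def joint_scores(score_convtranse, score_interacte, alpha=0.5):
        """Mix two decoders' scores for one (head, relation, ?) query.
        Each input: (num_candidates,) scores over candidate entities.
        alpha is an assumed mixing weight."""
        return alpha * score_convtranse + (1 - alpha) * score_interacte

    s1 = torch.tensor([0.1, 2.3, -0.4])  # hypothetical ConvTransE scores
    s2 = torch.tensor([0.5, 1.9, 0.0])   # hypothetical InteractE scores
    ranking = torch.argsort(joint_scores(s1, s2), descending=True)
    print(ranking)  # best-to-worst candidate indices: tensor([1, 0, 2])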

*** Writing ***
There are several language errors throughout the paper, including typos, spelling errors, and grammatical errors. In addition, some sentences/phrases appear very obscure to me.

For example, language errors that can be found in page 2 are:
- Page 2, line 19: "select part of" -> "selected part of"
- Page 2, line 26: "directly in text" -> "directly in texts"
- Page 2, line 27: "the missing commonsense knowledge of facts": this is obscure, and could be changed to, for e.g., "the missing commonsense knowledge"
- Page 2, line 28: "this incomplete problems" -> "this incompletion problem"
- Page 2, line 49: "work in [7, 9]" -> "works in [7, 9]"

Another common mistake: the determiner "the" is misused several times (it should be used only when the noun it modifies is already known/mentioned).

*** Methodology ***
Although the proposed model is interesting, the authors only experimented on a single dataset called ConceptNet-100K. That raises the following questions and concerns:
1. How well could the model generalize to other commonsense datasets (e.g., ATOMIC (Hwang et al., 2021), Quasimodo (Romero et al., 2019) or ASCENT (Nguyen et al., 2021))?
2. ConceptNet is limited in coverage as it was built based on crowdsourcing. A small subset of ConceptNet is not a good ground truth for evaluating the models because it certainly misses many correct relations. Crowdsourcing evaluation might be used here to measure the precision of the model's predictions (e.g., precision@1,2,5,10).
3. The confidence scores of the dataset do not seem intuitive to me. For example, both triples in page 4 - line 44 and line 45 make equally good sense, but one is annotated with a confidence score of 6.32 and the other with a confidence score of 1.0. How can the model make sense of these scores?

*** Supplementary material ***
The authors did not provide any code or resources.

*** Conclusion ***
The proposed method seems sound and interesting to me. However, the paper needs substantial improvements to its writing. The experiments should also be extended substantially to give more insight into the model's performance (e.g., precision/recall of the predictions; do the new triples add any value in extrinsic use cases?).
Therefore, my suggested decision is "Major Revision" with an overall impression of 60.