Fact Checking in Knowledge Graphs by Logical Consistency

Tracking #: 2721-3935

Ji-Seong Kim
Key-Sun Choi

Responsible editor: 
Guest Editors KG Validation and Quality

Submission type: 
Full Paper
Misinformation spreads across media, communities, and knowledge graphs on the Web, propagated not only by human agents but also by information extraction systems that automatically extract factual statements from unstructured text to populate existing knowledge graphs. Traditional fact checking by experts increasingly fails to keep pace with the volume of newly created information on the Web. It is therefore important and necessary to enhance the computational ability to determine whether a given factual statement is truthful. In this paper, our goals are to 1) mine weighted logical rules from a knowledge graph, 2) use the mined rules to find positive and negative evidential paths in the knowledge graph for a given factual statement, and 3) calculate a truth score for the statement by an unsupervised ensemble of the found evidential paths. For example, we can determine the statement "The United States is the birthplace of Barack Obama" to be truthful, since the positive evidential path (Barack Obama, birthPlace, Hawaii) ∧ (Hawaii, country, United States) exists in the knowledge graph and is logically consistent with the given statement. Conversely, we can determine the statement "Canada is the nationality of Barack Obama" to be untruthful, since the negative evidential path (Barack Obama, birthPlace, Hawaii) ∧ (Hawaii, country, United States) ∧ (United States, ≠, Canada) exists in the knowledge graph and logically contradicts the given statement. For evaluation, we constructed a novel evaluation dataset by assigning true or false labels to factual statements extracted from Wikipedia texts by a state-of-the-art BERT-based relation extractor.
Our evaluation results show that the proposed weighted logical rule-based approach significantly outperforms state-of-the-art unsupervised approaches by up to 0.12 AUC-ROC, and even outperforms a supervised approach by up to 0.05 AUC-ROC, not only on our dataset but also on two publicly available datasets. The source code and evaluation dataset proposed in this paper are open source and available at https://github.com/machinereading/KV-rule and https://github.com/machinereading/KV-eval-dataset, respectively.
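The path-based consistency check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy graph, the single hand-written rule, and the three-way verdict are illustrative assumptions.

```python
# Minimal sketch of positive/negative evidential-path checking over a
# toy knowledge graph, represented as a set of (subject, predicate, object)
# triples. Graph contents and the rule are assumptions for illustration.

KG = {
    ("Barack Obama", "birthPlace", "Hawaii"),
    ("Hawaii", "country", "United States"),
}

def rule_nationality(subject, kg):
    """Positive rule: birthPlace(x, y) AND country(y, z) => nationality(x, z)."""
    return {
        z
        for (x, p1, y) in kg if x == subject and p1 == "birthPlace"
        for (y2, p2, z) in kg if y2 == y and p2 == "country"
    }

def check(subject, obj, kg):
    derived = rule_nationality(subject, kg)
    if obj in derived:
        return "truthful"    # a positive evidential path supports the statement
    if derived:
        return "untruthful"  # the path derives a different value, contradicting it
    return "unknown"         # no evidential path either way

print(check("Barack Obama", "United States", KG))  # truthful
print(check("Barack Obama", "Canada", KG))         # untruthful
```

The "untruthful" branch corresponds to the negative evidential path in the abstract: the rule derives United States, and (United States, ≠, Canada) contradicts the claimed statement.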
Major Revision

Solicited Reviews:
Review #1
By Houcemeddine Turki submitted on 18/Feb/2021
Major Revision
Review Comment:

This manuscript presents a novel rule-based approach for fact checking in knowledge graphs based on mining textual resources. The work provides new evidence that rule-based approaches can evaluate the accuracy of statements in knowledge graphs more precisely and can enhance the efficiency of embedding-based methods when combined with them. The availability of the source code and datasets on GitHub is an advantage of this work, as it allows the reproducibility of the described experimental study.

However, several matters within the paper should be addressed to improve its final quality:

(i) Introduction: The "Introduction" seems to be a summary of "Related Studies in Fact Checking" rather than a proper introduction and contextualization of the paper. I propose to expand the part about misinformation fighting in the introduction to give better context for the development of this research paper. The authors can benefit from previous research papers about fact checking in general to develop the introduction of the paper. Several points in the introduction should be moved to Related Studies in Fact Checking.
(ii) The paper did not emphasize the advantages of rule-based approaches over embedding-based methods beyond better precision. Indeed, rule-based approaches have many other advantages; for example, their results can be more explainable than those of embedding-based approaches. Such advantages should be expanded and highlighted in the research paper.
(iii) The paper did not emphasize the importance of having the datasets and source code available in a specific GitHub repository. The authors should specify that this practice allows reproducibility and further development of the work by peers, particularly in the conclusion.
(iv) The paper did not discuss the concept of reification in knowledge graphs. Indeed, several knowledge graphs add qualifiers to triples to characterize statements (i.e., ((s, p, o), p, o)). The authors should discuss the usefulness of the method for verifying the qualifiers of statements in the Discussion or as a future direction for this work.
(v) The paper should discuss the robustness of the proposed rule-based approach to adversarial attacks. This can be an advantage of the approach.
(vi) There are several typos in the research paper (e.g. "UC Berkely" should be "UC Berkeley"). The authors should proofread the paper to eliminate such deficiencies.
(vii) The authors can expand the Discussion of the work (Part 5) to explain the strengths of KStream, KLinker, COPPAL, RUDIK, and PredPath that contributed to their efficiency as reported in the Experimental Study, according to previous research papers. This would help explain why the method developed by the authors was more efficient.
(viii) The authors should provide future directions for the development of this work in the conclusion.

Given this, I propose to accept this paper for publication after these eight major revisions are applied.

Review #2
Anonymous submitted on 18/Mar/2021
Major Revision
Review Comment:

The paper presents an approach based on weighted positive and negative logical rules to check the logical consistency of triples in a knowledge graph.

The paper has multiple flaws in terms of writing (please consider English proofreading for a future submission), as well as in terms of its structure and form (see remarks below).

Comparing rule-based and statistical approaches for graph completion is very useful. However, I was disappointed by Table 1, which contains only three very obvious comparison criteria. I don't find it very informative (it is entirely redundant with the text in the corresponding paragraph), and I would strongly encourage a more in-depth analysis of the differences (pros and cons) between the two types of methods.

On a related note, I find the related work section difficult to follow. It could probably be improved by structuring the different approaches better and defining a clear basis for comparison between them. Also, and importantly, the section lacks a clear positioning of the proposed approach relative to those reviewed in this section. I also fail to see the purpose of presenting embedding-based approaches, since they are not applied in this work, as far as I can see.

I fail to see the originality of the presented approach, my impression is that it builds largely on existing techniques (e.g. generation of negative samples, rule mining and the like).

The overall structure of the paper can be improved significantly. It currently contains multiple redundant parts (e.g. large parts of Section 3 repeat what has already been said in the introduction or elsewhere in the paper). While the overall approach is explained clearly, I think that relatively straightforward ideas are described in too much detail (for example, the negative-example sampling).

The results do not report anything about the computational complexity of the method, even though an argument is made in the introduction about assisting human/manual fact checking at scale. Also, the number of predicates in the datasets used in the studies appears very small for the approach to account for a real-world scenario. More surprisingly, the evaluation results are reported on only a handful of predicates. I am therefore doubtful about the applicability/generalizability of the proposed approach in more realistic scenarios and at scale.

- across media, community, and —> across media, communities, and
- Misinformation in the Web —> Misinformation on the Web
- in media and community makes --> in media and communities makes
- This problem is common and getting worse in modern digital society - this statement somehow needs support
- which is logically contradict —> which logically contradicts
- we did not contain those triples already contained in K-Box —> we did not include those triples already contained in K-Box
- there’s a screenshot issue with fig. 7

Review #3
Anonymous submitted on 08/Apr/2021
Major Revision
Review Comment:

- originality
This work proposes a new method to generate positive and negative rules from a knowledge graph. Positive rules can be learned from known facts that the KG already contains. The authors propose a negative sampling strategy to generate negative rules, which are used to assess whether a fact should not be part of the KG. It explores different assumptions, namely the local and extended local closed-world assumptions (LCWA and E-LCWA), to generate false facts and learns negative rules from the generated false facts. The positive and negative rules are combined to assign a truthfulness score to a given fact. The authors extend RUDIK's negative sampling approach to produce better false facts for non-functional properties (properties that can have more than one value, such as the 'relative' relation) by making a distant local closed-world assumption (D-LCWA).
- The authors propose a new rule weighting and truth scoring method, which is a revised version of the RUDIK method, and compare its fact-checking accuracy to other rule-based fact checking algorithms on three knowledge graph datasets: i) a synthetic dataset, ii) a real-world dataset, and iii) their own constructed dataset. The facts in their constructed dataset were extracted from Wikipedia articles using a BERT-based relation extraction method and were labeled as true or false after manual checking for supporting sentences in the Wikipedia articles.
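The LCWA-based negative sampling discussed above can be sketched as follows. This is an illustrative toy, not the paper's code: the graph contents and predicate are assumptions, and it implements only the plain LCWA (if the graph already lists an object for a subject–predicate pair, any other observed object for that pair is presumed false), not the E-LCWA or D-LCWA variants.

```python
# Sketch of negative-example generation under the local closed-world
# assumption (LCWA). Toy graph; not the authors' implementation.

KG = {
    ("Barack Obama", "birthPlace", "Hawaii"),
    ("Angela Merkel", "birthPlace", "Hamburg"),
}

def lcwa_negatives(kg, predicate):
    subjects = {s for (s, p, o) in kg if p == predicate}
    objects = {o for (s, p, o) in kg if p == predicate}
    negatives = set()
    for s in subjects:
        known = {o for (s2, p, o) in kg if s2 == s and p == predicate}
        # Under LCWA, every other observed object for this predicate
        # yields a presumed-false triple for subject s.
        negatives |= {(s, predicate, o) for o in objects - known}
    return negatives

negs = lcwa_negatives(KG, "birthPlace")
# e.g. ("Barack Obama", "birthPlace", "Hamburg") is generated as a false fact
```

Negative rules can then be mined from these generated false facts in the same way positive rules are mined from known true facts.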

- significance of the results
The authors achieved a 5% improvement over state-of-the-art fact checking methods on the benchmark datasets and performed an extensive experimental evaluation on three datasets, comparing with five methods, namely (1) KStream, (2) KLinker, (3) COPPAL, (4) RUDIK, and (5) PredPath.
However, I still have some concerns regarding the results:
- The authors pointed out some issues related to existing benchmark datasets and constructed a new evaluation dataset. The authors claim their evaluation dataset is more challenging than existing datasets for fact checking. We don’t know if this dataset has any biases. Have they done quality assessments on their dataset? Do the annotators record the supporting statements/sentences when labeling facts? 
- The authors corrected existing datasets (i.e. synthetic and real world) by removing mislabeled true facts. Interestingly, they did not mention the removal of the overlapping true-labeled facts between the training dataset (in this case DBPedia) and the test dataset. Please explain this decision and why this issue was not addressed.

- When I checked their GitHub page, the documentation on how to use the library was poor. It is not clear how to run their rule generation algorithm on a sample KG dataset. The documentation should show a sample run of the algorithm, preferably on a small KG.

- quality of writing
In general, the paper is well written. However, the introduction is divided into multiple subsections, which disrupts the flow. In the introduction, a comparison of embedding-based and rule-based approaches is given, and a performance table from the authors' previous paper is included. However, this work focuses solely on rule-based fact checking, and I think the embedding part is not that relevant. I would suggest removing or simplifying these parts.
-> The results section includes many subjective statements (e.g., Line 25 on page 15). I would also suggest that the authors create a separate discussion section for these statements.
-> Section 3.5 (Rule Weighting by Logical Validity) in the methodology starts by explaining the W2 measure without explaining W1. It would be useful to add a description of the W1 measure used in the evaluation. What is an unbounded rule? Please define it formally.