Explainable multi-hop dense question answering using knowledge bases and text

Tracking #: 3225-4439

Authors: 
Mohsen Kahani
Somayeh Asadifar
Saeedeh Shekarpour

Responsible editor: 
Cogan Shimizu

Submission type: 
Full Paper
Abstract: 
Much research has been conducted on extracting answers from either text sources or a knowledge base (KB). The challenge becomes more complicated when the goal is to answer a question using both text sentences and KB entities. In these hybrid systems, we address the following challenges: i) excessive growth of the search space, ii) extraction of the answer from both the KB and text, and iii) extraction of the path that leads to the answer. A heterogeneous graph, guided by question decomposition, is utilized to tackle the first challenge. The second challenge is met by adopting the idea behind an existing text-based method and customizing it for graph development. Building on this method for multi-hop questions, an approach is proposed for extracting answer explanations to address the third challenge. Evaluation reveals that the proposed method is able to extract answers in acceptable time while offering competitive accuracy, establishing a trade-off between performance and accuracy in comparison with the base methods.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 11/Jan/2024
Suggestion:
Major Revision
Review Comment:

In this paper, the authors focus on the problem of answering natural language questions over a combination of knowledge base (KB) entities and text sources in an explainable way. The method proposed in this paper is based on three existing methods: DecompRC, PullNet, and MDR. DecompRC is for decomposing multi-hop questions into sub-questions that can be handled more easily in each step. PullNet is employed to expand graphs that contain KB entities, triples, and entity-linked documents that are relevant to given questions and are potentially components of final answers and answer explanations. MDR is for finding sequences of texts as answers from relevant documents. Accordingly, the proposed method includes four modules: sub-question generation, graph expansion, sequence retrieval, and answer extraction. Experiments were conducted on MetaQA, WebQuestionsSP, ComplexWebQuestions, and HotpotQA. It is demonstrated that the proposed method outperforms PullNet and MDR on their respective QA scenarios: KB-based question answering and text-based question answering.

Strengths

1) This paper focuses on the hybrid question answering problem over both KB and text sources, which is an important problem that has not been well-researched.

2) The experimental results demonstrate that the proposed method outperforms the original methods, i.e., PullNet and MDR, on respective question answering scenarios. The authors also tried to give explanations in analyses.

3) The authors present an extensive discussion of the findings and of the theoretical and practical implications they observe and conclude from the experiments.

4) The authors provide a link to the source code, where detailed README files can be found in the respective folders. The code appears to be complete for replication, and the GitHub repository is appropriate for long-term discoverability.

Weaknesses

1) The readability of the methodology section is very limited. There is a lack of rigorous and consistent definition of notations used in the introduction. Also, the equations are mostly listed without sufficient explanations. Therefore, I could only get a notion of what the method is trying to do in each module but cannot be sure about the technical details.

2) Given the lack of clarity of the proposed method, it is difficult to assess its novelty and soundness. It is especially difficult to examine the difference between the graph expansion and sequence retrieval modules of this method and the existing methods that are employed, i.e., PullNet and MDR.

3) In experiments, the method is only compared with PullNet and MDR. The proposed method does not really need to outperform all existing methods. But it is necessary to compare with a few other existing baselines so that the competitiveness of this method can be positioned regarding the current progress of research on this problem.

4) The authors only considered Hits@1 in several experiments. The recall and precision of returned answers need to also be examined, considering the existence of questions with multiple answers.

5) The authors claim that the method can provide explanations. However, this aspect is not evaluated in the experiments. It would be better to provide a qualitative evaluation of the generated explanations or, at least, a case study demonstrating the explainability of the method.

6) Several notations and acronyms are used in figures and texts before they are formally defined or introduced, which hampers the readability of the paper.

7) It is reported that “the main differences between GraphMDR and PullNet systems are given in Appendix B.” I believe this is very important and should be concisely presented in the main text.

8) There are several places where the font of the text or the appearance of notation is inconsistent.

In general, the quality of writing needs to be further improved. It is difficult to assess the novelty and soundness of the method given the current writing. Also, the existing experimental results are not sufficient to demonstrate the explainability of the method and to position the competitiveness of the method among existing works. Therefore, my suggestion would be Major Revision.

Review #2
Anonymous submitted on 31/Jan/2024
Suggestion:
Minor Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and in particular contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

Summary - The paper introduces a novel approach aimed at addressing challenges in open-domain Question Answering (QA) systems, particularly for multi-hop questions. The research compares GraphMDR with two existing systems, PullNet (a hybrid QA system) and MDR (a text-based QA system), focusing on factors such as search space reduction, answer explainability, and the ability to extract responses from both textual sources and knowledge bases (KB). The proposed method shows promising results in terms of response extraction speed, maintaining competitive or improved accuracy compared to the baseline systems. The conclusion suggests future work to make the model training independent of the KB, explore the prioritization of sentence extraction, and examine constraints, especially those requiring calculation, in comparison to text or KB priorities.

Contributions - This paper proposes a novel graph-based approach for open-domain question answering (QA) that utilizes both textual and knowledge base (KB) sources. This contribution has merits in several areas:
1) The paper introduces a unique method for leveraging relationships between entities in the KB and text through a graph representation. This approach hasn't been widely explored in existing hybrid QA systems.
2) The proposed method demonstrates competitive or improved accuracy compared to baseline systems, while also achieving faster response extraction by reducing search space. This holds potential for enhancing efficiency and effectiveness in real-world QA applications.

Strengths -
1) Originality: The graph-based approach to representing relationships between entities in the KB and text is unique and hasn't been widely explored in existing hybrid QA systems. This innovative method sets the paper apart and warrants further investigation.
2) Methodological Rigor: The paper clearly outlines the proposed method, including its architecture, training procedure, and evaluation metrics. This transparency allows for easy understanding and reproducibility of the research. The authors compare their method to established baselines, demonstrating its competitive or improved performance in both accuracy and speed.
3) Positive Results: The paper reports encouraging results, showing that the proposed method can achieve accuracy at par with or exceeding existing systems while simultaneously reducing response extraction time.
4) Clear Communication: The paper is well-written and easy to follow, effectively communicating complex concepts with clarity and precision. This ensures that the research is accessible to a broad audience within the field and maximizes its impact.
5) Valuable Future Directions: The authors propose several insightful avenues for future exploration, including training the model independently of the KB and exploring the prioritization and weighting of extracted information. This demonstrates a thoughtful approach to further development and opens doors for exciting future research.
6) The long-term URL seems to be working, with a proper README file.

Weaknesses -
1) The paper focuses primarily on factual, entity-centric questions. Exploring performance on broader question types like open-ended or reasoning-intensive questions would provide a more comprehensive understanding of the method's generalizability.
2) The paper could benefit from a more detailed discussion of the limitations and challenges faced by the proposed graph-based method.
3) The paper could acknowledge and discuss potential biases in the training data or methodology, along with any steps taken to mitigate them. This transparency is crucial for responsible AI development.

Review #3
Anonymous submitted on 05/Apr/2024
Suggestion:
Reject
Review Comment:

Originality: The paper lacks any novel contribution. The method merely merges the two existing architectures and answers the question.

Significance of the results: The authors have shown the comparison of the proposed model (GraphMDR) with individual building blocks, PullNet, and MDR rather than proper baselines. These experiments are merely ablation studies. The authors mentioned different baseline models in section 3.3 but never used them in experiments.

Quality of writing: Writing quality is extremely poor. The grammar in a lot of sentences needs to be corrected. Some of the issues in writing are stated below -

- Captions of tables and figures need a proper description.
- The structure of the paper is not free-flowing and hard to follow. There are various instances where terms and variables have been used earlier and explained later.

- Introduction Section:
- The definition of IR in the second paragraph - "The goal of an IR system is to find documents containing the answer to the query." What if the document size is in TBs? Would the goal of IR still be to find the documents?
- The last line in the second paragraph - "Therefore, QA is closely related to other fields, such as natural language processing (NLP) and machine learning (ML)." IR is not related to ML and NLP?
- Third paragraph - Add citation.
- Sixth paragraph - Which problems does citation [13] solve? Why are people shifting to multi-step inference? What issues with previous versions does multi-step inference solve?
- Eighth paragraph - I can see only KB-based questions in Q1 and Q2 (Table 1). There are no Text-based questions.
- The third contribution is just the first and second contributions combined.
- Last paragraph - No mention of section 7.

- Related Work - first sentence in Semantic parsing methods - Add citation.
- Section 4.1 - The variables used in the paragraph differ from those in Figure 2. Multiple variables are present in the figure, but there is no description in Section 4.1; they are first mentioned in Section 5.1.1.
- Figure 2 needs reconstruction; multiple details are missing, and the knowledge graph on the right is not visible.
- Duplicate figures are constructed in place of 1 (For example, Figures 2 and 5).
- Multiple variables in equations 1 and 2 are not described in the following paragraph (sq_t).
- The same equation is repeated multiple times (Equations 7 and 9).
- Section 5, last paragraph (before 5.1) - The authors have treated documents, passages, and sentences as a single label, which is wrong.
- What is v_f in Figure 4? No mention in the paper.
- What is the existing new entity in 5.1.1.1? Is it an existing entity or a new entity?
- The sequence of sections - 5.1.2 - 5.1.3 - 5.1.2?

Places that need re-writing -
- Line 5 of the abstract.
- Introduction section, 10th paragraph, first line - "Another important challenge of existing hybrid systems is the ability to explain, which means explaining how to arrive at the final answer."
- Section 3.3 is hard to follow.
- Section 4, first paragraph.
- Section 4.1.
- Section 5.1 - paragraphs 2 and 3, with formulas, are difficult to understand. Also, probability P is mentioned differently in the two instances.
- Figures 2, 3, and 4.
- Section 5.1.1. 4th paragraph.
- Section 5.1.1.2.
- Section 5.1.2. 1st paragraph, last line.

Readme: A README for the complete code needs to be added: how is the complete code accessed, and what is its flow? There are READMEs for the individual modules, based on the original papers, but no unified version explaining how to create an environment and run the code.