Review Comment:
The paper proposes a question-answering system for multi-hop questions by leveraging both free text and a knowledge base as sources of information.
===== Originality =====
I am not an expert in the field, and from my reading of the paper it is difficult to assess the originality of this work. The proposal is to extend the state-of-the-art works PullNet and MDR, but it is not clear whether the implemented extension concerns mere technicalities or strong scientific principles that can be reused by other researchers. These differences are explained in Section 4.1.2, but for non-expert readers it is difficult to discern the technicalities from the scientific principles. The research questions are not clear either. It seems to me that question decomposition (RQ1) has already been studied, as has the simultaneous use of both text and knowledge bases (RQ4); see the hybrid methods in the related work. The same holds for RQ3: the extraction of intermediate answers has already been studied by multi-hop systems. Moreover, as formulated, RQ3 is not a research question. In addition, Section 4 states that “The present study provides a solution by integrating the three considered axes to balance the accuracy and the efficiency, while extracting the answer from both sources, considering the constraints stated in the question.” However, it seems to me that this is already done by PullNet and MDR.
The authors should state their scientific contributions more explicitly with respect to the other works, especially MDR and PullNet.
The considerations in the Discussion Section seem to me to be findings already established by the state of the art:
- The findings in Section 6.1 are not very informative. Points ii) and iv) are trivial considerations that already follow from the state of the art. Point iii) is simply false: probabilistic methods enable the extraction of explanations just as non-probabilistic methods do; it depends on how the explanation system is developed. This point is too vague.
- Section 6.2 does not add much. It reads as a remark on the importance of explanation extraction, sub-question decomposition (which seems to me to have already been done by other works), and finding the best sequence of intermediate answers (is this perhaps the main contribution of the paper?).
- Some points of Section 6.3, namely the 1st and the 4th, are not novel findings.
===== Significance of the results =====
The Experiments and Results Section is well written and easy to follow, and the numeric results seem to confirm the validity of the method over the main competitors. However, there are some passages that deserve a better discussion:
- Table 2: Why did you test COMPLEXWEBQUESTIONS on the dev set and not on the test set?
- Table 2: The PullNet results are for the KB + Text setting; however, PullNet in the text-only setting achieves much better results. I understand that the authors want to use the KB to preserve the semantics of the answers, but they must also show the better PullNet results in the text-only setting. This can be followed by a discussion of the importance of the semantics.
- Tables 2, 4, 5: Are these results computed on the test set?
- Fig 2 should also include the results for PullNet for a fair comparison.
- Section 5.2: “Fifty percent of the KB is utilized …”, why not the whole KB?
- Section 5.2: “GraphMDR is more accurate than PullNet”: this is true only when using 3 hops on the MetaQA dataset. The authors should make this explicit.
===== Quality of writing =====
The writing really lacks clarity, and the structure of the paper could be improved for better comprehension by non-expert readers. In general, the paper is verbose with many technical details; it reads more like a technical report than a scientific paper with strong principles.
- The word “Explainable” in the title is misleading, as no evaluation of explainability has been carried out. See [1] for an example of an evaluation of explainability with real users. The system is able to give explanations, but we do not know whether these explanations are good or not.
- The introduction starts abruptly with no real introduction to the topic; it reads more like a list of related works. On the other hand, the text from “In recent years … ” to “and unstructured sources.” in the related work would be perfect for the introduction.
- After the introduction, a Background Section that explains the topic and the two main state-of-the-art works (MDR and PullNet) with some formalism would help non-expert readers. This is partially done at the beginning of Section 4, from “As mentioned …” to point iii); I suggest moving this part to a Background Section.
- The Related Work Section is verbose: explaining in detail what every single work does and how is of little use to the reader. It would be better to define some important features of QA works and state how well the current works fit those features.
- Section 4 is the core of the paper and unfortunately is the least comprehensible part. Figure 1 is not the architecture of the system but merely a flow diagram merged with an example. I suggest drawing a real architecture that shows all the computational blocks used (even those taken from other works) and their inputs/outputs, so that every symbol in the equations is covered. Section 4 is verbose; some pseudocode would make it clearer. The pseudocode should be a formal representation of the new figure. It is also not clear what the three types of inference are; you cannot refer to another paper for this, as the presented work should be self-contained.
- Section 4.1.4: you cannot refer to another paper for the use of the constraints. The presented paper should be self-contained.
- Section 5.2: Why is only 50% of the KB used? Is this a common practice? Please specify why.
===== Other concerns =====
- Many grammatical errors; the paper needs a full linguistic revision.
- Section 3.3: in two main model -> in two main models
- Section 4: searched in sequence -> searched in a sequence
- Section 4 just before 4.1: triples, facts, documents, concepts: all plurals.
- Section 5.1: “there are two main categories”: of what??
- Section 7: goals are search -> Goals are search
- Section 7: using and answer extraction answers from -> using answer extraction from
[1] Donadello, I., Dragoni, M., & Eccher, C. (2019). Persuasive explanation of reasoning inferences on dietary data. In SEMEX: 1st Workshop on Semantic Explainability (Vol. 2465, pp. 46-61). CEUR-WS. org.