Review Comment:
In the last review (regarding submission #2675-3889), I raised the following questions/weaknesses and expected the authors to address them. Please find the updates and my follow-up questions below:
1) Several existing methods, e.g., Babelfy and NASARI, are directly used in the model. However, these methods are not sufficiently introduced, which hampers the readability of the paper.
Update: The authors added introductions to the models used in Section 4.1. However, the readability of this section has not improved much, because the introductions do not fit the context of this paper well. For example, in the introduction to Babelfy, it is very difficult to understand what “lexicalized semantic network” and “semantic signature” mean, because these terms are taken directly from the Babelfy paper without any link to the context of this paper. In addition, some acronyms, e.g., EL and WSD, are used without their full names being given.
2) The introduction to the attention layer takes only two short paragraphs, without any formal definitions or equations, which makes this module entirely unclear.
Update: The authors added the corresponding equations in Section 4.2. However, if I understand the paper correctly, Eqs. 4 and 6 are wrong unless the authors indeed selected only ONE word (indexed by $i$) from each question/answer sentence to represent the question/answer.
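For reference, and purely as an illustrative sketch (the symbols $h_i$, $w$, and $s$ are my own notation, not the manuscript's), a standard attention-pooling layer computes a weighted sum over all $n$ words of the sentence rather than selecting a single index:

```latex
\alpha_i = \frac{\exp\!\left(w^{\top} h_i\right)}{\sum_{j=1}^{n} \exp\!\left(w^{\top} h_j\right)},
\qquad
s = \sum_{i=1}^{n} \alpha_i\, h_i
```

Here $h_i$ is the representation of the $i$-th word, $\alpha_i$ its attention weight, and $s$ the sentence representation. If Eqs. 4 and 6 are intended in this spirit, the summation over $i$ appears to be missing; otherwise, the authors should state explicitly that a single word is selected.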
3) The given “implementation details” are not detailed at all: only the number of layers in the autoencoder and the dimensions of the latent representations are given. What are the parameter settings of the initial representations, the attention layer, the MLP classifier, and the convolutional filters?
These two issues make it difficult to reproduce both the system and the experimental setup. Please provide the formal definitions as well as the parameter settings used for each component to enable reproducibility.
Update: The “implementation details” part has been extended, and the authors promise in the response letter that a link to the source code will be provided in the camera-ready version.
4) In the ablation study, when any one of the key components (e.g., the knowledge-graph-based disambiguation or the attention layer) was removed, the model still outperformed most of the baseline models. However, only one component was removed at a time, and the baseline of the proposed model (i.e., the vanilla version without any of the components) was not evaluated. Is it possible to start from the evaluation of the baseline, add one component at a time, and analyze how the performance increases as more components are added? I would also appreciate more information about the comparison setup as well as the implementation details of the proposed model and the compared models.
Update: The authors added an evaluation of the baseline model and conducted experiments showing how the model’s performance improves as the proposed modules are added incrementally.
5) The originality of the paper is limited because: first, the main modules of the proposed model, e.g., the knowledge-graph-based disambiguation and the Siamese autoencoder, have already been widely used in the literature; second, given the lack of detail on the model design and implementation (e.g., how the existing methods are integrated into the model, and how the attention layer is customized in this model), the originality of each proposed module is difficult to assess.
Update: Regarding the originality concern, I accept the explanations given in the response letter, and the authors have added more details on the model design and implementation to the manuscript.
6) The quality of writing does not meet the requirements of this journal, due to the aforementioned lack of readability and minor errors such as the following:
In Section I - Paragraph 1, several applications, e.g., recommender systems, are listed without adequate references.
In Section I - Paragraph 3, there is no reference for SemEval2015.
In Section I - Paragraph 7, "unable to encode" instead of "unable to encoding", and it should be a period instead of a semicolon at the end of the paragraph.
In Section I - the last contribution, there should be references for the three listed datasets.
In Section II, please use adequate mathematical expressions instead of English characters when denoting variables and parameters.
In Section II - Paragraph 2, "as follows" instead of "as follow".
In Section III, it is claimed that "none of existing methods have considered the context in question-answer representation". However, after reading the related work introduced in the paper itself, I am skeptical of this claim.
Please thoroughly check the writing of the paper.
Update: The authors have addressed some of the minor issues and improved the quality of the writing. However, my concern regarding the claim that “none of existing methods have considered the context in question-answer representation” has not been responded to. In Table 2, the description of references [10] and [43] reads “used CNNs for similarity matching and the label of previous and next answer for CONTEXT modeling through LSTM”. Why do you think they did not consider context?
In conclusion, I appreciate that the authors have addressed most of my concerns and updated the manuscript accordingly. However, as detailed above, some follow-up questions, drawbacks, and even errors remain in the current version. Therefore, my suggested decision is minor revision.