Review Comment:
This paper presents a hashing algorithm for topic modeling techniques that is designed to improve their efficiency. In addition, the authors claim their approach improves the explainability of topic modeling results by grouping topics hierarchically.
The paper is well written and easy to follow. The research topic of the paper is not new, but the proposed extensions demonstrate that the authors have done their homework with the state of the art and propose a valuable contribution. I think the paper is a good fit for this special issue, but I am a little divided regarding the relevance of the approach. The special issue focuses on Semantic e-Science, and while the authors' work clearly deals with the detection of similar scientific work in an explainable manner, the "semantic" aspect of the contribution is unclear to me. Are semantics or knowledge representation used in the approach? If so, how? The authors sometimes refer to the "data type" of the papers, but this is not elaborated further. I suggest the authors clarify this in the next revision of the manuscript. Below I describe other comments and suggestions that I think should be addressed as well:
- The experiments define distance metrics and compare their performance against one another. However, the approach is not compared against the state of the art. Why? I think that even if the hierarchy level is the same, a comparison is needed to understand how the current approach performs. In addition, why is precision selected as a metric rather than the F-measure? Is recall not considered important in this case?
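  (For reference, the F-measure combines both quantities: $F_1 = 2 \cdot \frac{P \cdot R}{P + R}$, where $P$ is precision and $R$ is recall, so reporting it would make any precision/recall trade-off explicit.)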
- How does the current approach improve efficiency? Tables 4-7 show the ratio of data consumed, but there is no indication of how this affects the overall efficiency of the topic-based models. How does this translate into time savings? Is this improvement worth the loss in precision? The first technique does not seem to yield good results. Is it included only as a baseline?
- Human validation is not present. This seems critical for similarity-based techniques, especially when a large ground truth is not available. Are human-based evaluations going to be part of the future work?
- I am a little confused by the claim that the approach is appropriate for topic detection on unseen texts. An illustrative example would be helpful.
- Figures 1-4 show little variation when the number of topics changes. In particular, Fig. 4 seems to be very consistent. Are the variances in the graphs significant? Is it better to have a lower number of topics?
- The data type of the papers is claimed to be important for the topic modeling algorithms. However, it is not mentioned in Section 3. Why is this?
Presentation comments/small issues below:
- The problem statement is not clear until after the second page of the paper. I believe it should be made clear to readers before that point.
- The authors describe the source code as a contribution of their work. The source code supports an implementation of the approach, and the approach itself (rather than the code) should be the contribution, right? In addition, the development of corpora to validate and test the approach is, in my opinion, a separate valuable contribution.
- Why is it an issue that the triangle inequality is not addressed?
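  (For context, the triangle inequality requires $d(x,z) \leq d(x,y) + d(y,z)$ for any items $x$, $y$, $z$; a distance measure that violates it is not a true metric, which can matter for clustering and indexing guarantees. It would help to state explicitly whether this affects the proposed approach.)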
- Could different distance metrics yield different results in Section 4.2?