Review Comment:
The paper presents a tool-supported method for creating overviews of how online discussions evolve thematically over time. Preprocessed posts from each time interval are represented as low-dimensional vectors and clustered into interval clusters. Interval clusters are then merged within and across consecutive time intervals to form topic flows, which are visualised as alluvial diagrams. The tool has been used to analyse one year of 22k Reddit postings about the Fedora operating system. The created overview has been evaluated using internal measures of cluster quality, structure, and topic coherence.
Social media discussions already play important roles in almost all aspects of public life. Developing better methods and tools to understand their central topics and how they evolve over time is therefore a highly important research area. The paper is mostly well-written and structured and easy to follow. The technical implementation and evaluation appear carefully and thoroughly carried out. However, the manuscript in its present form also has several weaknesses.
- Fit for the Semantic Web Journal: Although the tool uses general graph representations and natural-language analysis of social-message semantics, it does not use, or attempt to contribute to, techniques or practices that are central to the semantic web, linked data, ontologies, or knowledge graphs. The semantic web is mentioned several times in the introduction, but never again after the first sentence of the background section. Instead, the tool's goal is to better understand social-media content, so the paper would more appropriately be directed to that research community.
- Lack of comparison with related work. Another weakness is the lack of comparison with other approaches to topic clustering. Although the background section mentions a few approaches with similar aims, they are not presented in any detail, and there is no attempt to empirically compare the paper's proposal with existing methods. Section 5.4 "Results discussion" does not contain a single reference to other work.
- Lack of evaluation with human users. This is another critical limitation of the present version of the paper. The evaluation uses only internal measures, and it is unclear how well these measures reflect the needs of human users. For example, the paper does not argue convincingly for the relevance of the "length and events quantity" measures used to evaluate topic flows. An empirical assessment of the resulting topic flows and their visualisations by human subjects is therefore called for before the paper can be accepted in a top-level journal.
- Presentation of the algorithms. The "Discussion Topic Flows" algorithm is described only briefly and informally in the main text. Readers are referred to a table (Alg. 2) with pseudocode that seems to omit some details. For example, why is nodesStack treated as a stack rather than a plain list, given that it is only popped from and never pushed to? The algorithm comes across as under-explained, and as a result it is also unclear how original it is.
Other issues:
- Use of pre-processing in combination with doc2vec needs motivation. Le and Mikolov's original paper [15], which you cite, does not seem to use similar pre-processing (they even treat special characters such as ,.!? as normal words). The choice of preprocessing techniques therefore needs explaining.
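To make the contrast concrete: Le and Mikolov's setting keeps punctuation as ordinary tokens, whereas typical social-media pipelines strip punctuation and URLs. A minimal sketch of the two styles (the regexes and example post are illustrative only, not the paper's actual pipeline):

```python
import re

post = "Upgrade failed?! See https://fedoraproject.org for details..."

# Le and Mikolov's setting: punctuation marks kept as ordinary tokens.
tokens_raw = re.findall(r"[\w']+|[.,!?;]", post.lower())

# A heavier social-media pipeline (illustrative): drop URLs, keep only words.
no_urls = re.sub(r"https?://\S+", "", post.lower())
tokens_clean = re.findall(r"[a-z']+", no_urls)
```

The two tokenizations produce quite different vocabularies, which is why the paper should state which effect the preprocessing is meant to achieve.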
- Use of dimensionality reduction after doc2vec. As you explain, doc2vec offers "a customizable number of dimensions". So why is a separate step needed to further reduce the number of dimensions, instead of setting the desired dimensionality directly in doc2vec? Also, the dimensionalities before and after reduction (15 and 3) appear very low and call for explanation.
- Use of pre-determined time intervals. The user is expected to provide the number and length of each time interval as inputs to the method. But these parameters might be better extracted from the data. Although this might go beyond the scope of the present paper, the possibility should be mentioned and your choice explained.
- The alluvial diagram in Figure 2 is already very complex with only 6 topic flows. Some of the relations seem to pass through nodes, making them hard to distinguish from relations that actually connect those nodes. This solution does not seem to scale well enough to be useful in realistic cases. The paper admits this, but does not discuss mitigations or alternatives.
- The caption of Table 1 mentions both the "unbalanced" and "isolated" cases, but only one of them seems to appear in the table and text. The practical relevance of the "unbalanced" case also needs more explanation.
Long-term stable URL for resources: The linked GitHub repository appears complete code-wise, but I could not find the Fedora subreddit dataset. The README.md is only three lines, including the paper title.