Review Comment:
In this paper, the authors present an approach for detecting the emergence of new topics. The authors argue that common NLP approaches are limited, but that, when integrated with network-based approaches, it is indeed possible to forecast the emergence of new topics in the following years.
Although the approach is fairly good, interesting, and timely, the narrative requires extensive rework before being worthy of acceptance. Many parts of the paper are confusing, and I had to read paragraphs multiple times before being able to picture the process in my mind; it is not straightforward.
I will highlight some of the weaknesses in more detail in what follows.
The abstract does not sell the paper properly. Indeed, you mention you select 20 FoS from MAG.
How different are they from each other? Are they all in one giant field, or are they spread over the whole domain of science? First, you say you provide a topic evolution method for predicting the emergence of new topics, then you say you selected 20 fields of study. Is your predictor going to detect the emergence of only those 20 topics? There is ambiguity here.
In general, these questions are somewhat clarified later in the paper, but since the reading process starts from the title and moves toward the conclusion, it would be nice to be on the same page from the beginning. Can you add some more details?
In general, your work is very interesting and timely; however, it does not fully compare with the state of the art. You propose a *very* similar approach to Salatino et al. [1], but you do not explain how you differentiate from them.
In the introduction, you state “twenty topic networks were generated”. So is this 20 networks over 20 years, or 1 for each topic? Has the dataset been verticalized to certain areas? How many papers did you analyse in the end?
Did you analyse whether there was a large overlap between the datasets? This could explain why the classifier retains such high classification accuracy.
In Related Work, you describe current approaches for identifying the evolution of topics; such a phase is only enabled once we have a clear idea of how to identify/extract topics from a corpus. The same research team mentioned above developed an interesting approach [2] that you might want to look at.
In Section 3.2 you state “where topics in year y are classified as new or old…”. Why do you need this? It is clarified later, but at this stage it is not clear why. The same applies to the state of a node: why do you need to compute the state of a node?
Later in the same paragraph, you talk about the more prominent topics. What are the top 100 topics with the largest number of nodes? If a node represents a topic, I read that sentence as the 100 topics with the largest number of topics, and I am not sure what it is meant to say.
In Section 3.3 you talk about K*(K-1) pairs. Should it be K*(K-1)/2 to avoid counting duplicate pairs?
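For reference, the number of unordered pairs among K items is the binomial coefficient, which for K = 20 gives 190:

\[
\binom{K}{2} = \frac{K(K-1)}{2}, \qquad \binom{20}{2} = \frac{20 \cdot 19}{2} = 190.
\]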
In Section 4 you show more details about the experiments. In particular, as topics, you use Microsoft’s Fields of Study (FoS). However, I would like to point out two drawbacks of using FoS.
----- First:
I do not think there is a report showing the performance (precision, recall, F-measure) of the concept tagging done by Microsoft. We work a lot with MAG and, on several occasions, we found inconsistencies. My suggestion is that, until Microsoft provides some estimation of their algorithms' accuracy, it is better to use FoS only at the first level (Computer Science, Medicine, Mathematics, Economics, and so on). After all, FoS is very granular, and it is pretty normal to have misclassifications when you have more than half a million entities.
----- Second:
Papers have been tagged retroactively with FoS concepts! So the year in which a topic is first used in the dataset, fy, is misleading. Indeed, Salatino et al. [1] use the author-provided keywords, as they better reflect the status of science at that given time. Just to give you an actual example, I checked in my version of MAG and the topic Semantic Web (which we all know emerged around the beginning of this century) first appears in 1920, because Microsoft tagged https://www.journals.uchicago.edu/doi/abs/10.1086/360262 with the FoS ‘semantic web’.
In Table 2, you list the 20 FoS, but their role is not clear. Are those 20 FoS the seed topics used to extract all papers tagged with them?
And what are the FoS that are ultimately analysed? Just those 20, or all the FoS available in the papers extracted using those 20 seed topics?
Here there is another issue with identifying the year of first usage. The year in which a topic is first used, when computed on those vertical datasets, might be later than the actual year of first usage in the whole dataset. This is because you are leaving out papers that may be tagged with that topic and published before the identified year.
In Section 4.3, you state: “Training size is set to t=9 as the increase”. What does t stand for?
Then, in Section 4.4, you state “This results in a total of 380 pairs for FoS used in the experiment”. Should it be 190? The number of unique pairs from 20 items is 190; see the formula above.
In Table 5, you report decimal values for TP, FP, and FN. This is very odd: TP should be a natural number, as it counts the instances that are predicted positive and are actually positive, and the same holds for FP and FN.
Finally, I would expect to find a dedicated section on the gold standard, explaining in full detail how you built it.
In general, I find this work really interesting; however, before recommending it for acceptance, I would like to see an effort from the authors in extensively rewriting this paper.
References:
[1] Salatino, Angelo A., Francesco Osborne, and Enrico Motta. "AUGUR: Forecasting the Emergence of New Research Topics." Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries (JCDL 2018), 2018.
[2] Salatino, Angelo A., Francesco Osborne, Thiviyan Thanapalasingam, and Enrico Motta. "The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly Articles." Proceedings of the 23rd International Conference on Theory and Practice of Digital Libraries (TPDL 2019), Lecture Notes in Computer Science, Springer, 2019, pp. 296–311.