Review Comment:
This paper presents a comparison between two Wikipedia articles: the articles on China in the English and Chinese Wikipedia chapters. Using semantic network analysis, the authors explore differences in word usage between the two pages. They discuss the results in the light of the cultural differences between the two communities.
The paper does not fit the scope of this journal. The most important argument for this is that the paper does not provide any new insights related to the field of Semantic Web (or even broader, the Web, or Computer Science, AI). What can be learned from this paper is mostly related to the cultural differences between the English- and Chinese-speaking parts of the world. The method that is employed in this paper, semantic network analysis, is only loosely related to the topic of the journal. This type of semantic network analysis is based on word co-occurrence matrices. Issues such as knowledge representation, data integration, modelling, reuse of (web) data, etc. do not play a role.
In addition, I feel that this paper is too lightweight for publication in this journal. This is mainly due to the fact that only two Wikipedia articles were examined, and that it is a purely observational study (in contrast to, for example, an experiment). In the remainder of this review I will give a more detailed motivation.
The motivation given for this study is that in previous work "insufficient attention has been paid to the detailed content of Wikipedia articles". It is not clear to me why this is a problem in itself. I would expect for example to hear about an unanswered question that could not be solved with the older, less detailed analyses.
The structure of the paper can be improved. The introduction is very short, and does not contain any information on the research methods (other than the general term Semantic Network Analysis). Also, the choice for the page on China as a use case is not explained. For me, it was not clear from the introduction that only two Wikipedia pages were examined. I was under the impression that all pages related to China were included (and I did wonder how this selection was made). Section 2 is a related work section (and should be named as such). However, at the end of the section new information is presented about the goal of the paper ("to map how different language speakers illustrate the meaning of a particular concept in various ways in Wikipedia"). I would move this to the introduction. Also, could you please explain what you mean by "illustrate the meaning of a concept"? Section 3 describes the method of semantic network analysis. However, again, there is also additional information about the focus of the paper ("This paper focuses on analyzing the salience of the concepts in texts.") I would move this to the introduction as well.
The explanation of the research method leaves some open questions. First, the words 'concept' and 'word' seem to be used interchangeably at some points. E.g. on p 10 in the sentence "words that occurred within seven concepts of each other": no definition of 'concept' was given so far, so I have assumed this means something like 'seven words that are not stop words'. Also, I find the rationale for using a seven-word threshold a bit unusual. Is there a reason for not using the sentence breaks? Can you give some indication of how these two options compare in terms of performance and in terms of results? Later on page 10, the paper mentions 'cells of the two networks'. I have an idea of what is meant (a cell in a matrix representation of a network), but please be explicit about this. Also, it is not clear to me how you can correlate the 'corresponding cells' of the two networks, given that the two networks do not necessarily have the same nodes.
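To make the two questions above concrete, here is a minimal sketch of what I have in mind. It is not the authors' implementation: the stop-word list, the function names, and the reading of 'concept' as 'non-stop word' are all my own assumptions. It builds a co-occurrence network once with a fixed word window and once with sentence boundaries, and it restricts the cell-by-cell comparison to word pairs whose nodes occur in both networks.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "and", "to", "has"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words (assumed 'concepts')."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def window_cooccurrence(tokens, window=7):
    """Count unordered pairs of distinct words at most `window` tokens apart."""
    counts = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            if v != w:
                counts[frozenset((w, v))] += 1
    return counts


def sentence_cooccurrence(text):
    """Count unordered pairs of distinct words that share a sentence."""
    counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = sorted(set(tokenize(sentence)))
        for w, v in combinations(words, 2):
            counts[frozenset((w, v))] += 1
    return counts


def shared_cells(net_a, net_b):
    """List (pair, weight_a, weight_b) for pairs whose nodes occur in both networks."""
    nodes_a = set().union(*net_a)
    nodes_b = set().union(*net_b)
    shared = nodes_a & nodes_b
    pairs = sorted(
        tuple(sorted(p)) for p in set(net_a) | set(net_b) if p <= shared
    )
    return [(p, net_a[frozenset(p)], net_b[frozenset(p)]) for p in pairs]


text = "China has a long history. The economy of China is large."
window_net = window_cooccurrence(tokenize(text), window=7)
sentence_net = sentence_cooccurrence(text)
```

In this toy text the seven-word window links 'long' and 'economy' across the sentence boundary, while the sentence-based variant does not. A comparison of this kind on the actual articles would show how sensitive the results are to the choice. Likewise, stating explicitly which cells enter the correlation (all pairs, or only pairs over shared nodes as in `shared_cells`) would resolve my question about the two networks having different nodes.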
The discussion of the results is interesting and makes sense. I like the fact that the authors refer back to individual sentences from the Wikipedia pages to illustrate/prove their point. However, as mentioned above, the discussion does not contain any new insights related to the Semantic Web. The research question is not discussed at all.
Specific comments:
p5 the top 30 people in each language version -> ranked based on what? Nr. of edits? Length of the page? Nr. of views? Centrality of the node?
p4 one-tenth of one percent is comprised of common concepts -> Can you explain what "common concepts" are in this case? And why not just write 0.1%?
p5 Although this research was significant in using advancing algorithm models -> I don't agree that the use of advanced algorithms is a reason to call a study significant.
p6 from a particular semantic context -> Could you please explain what you mean by "from a particular semantic context?"
p7 clusters that composing the semantic networks -> please rephrase
p9 The axis labels on Fig 1 can be improved (e.g. mentioning the contributors and number of edits). Also, I don't see the advantage of normalization here. Why not just a log scale?
p12 In the caption of table 4, should this be "ordered by Greatest Normalized Eigenvector Centralities", instead of "with Great...."?