Review Comment:
This paper proposes an unsupervised approach for Arabic sentiment analysis at the word and sentence levels. The proposed approach relies on the co-occurrence of word bigrams in tweets to extract the contextual semantics and sentiment similarity between these words. Evaluation is conducted on a set of 4.5 million Arabic tweets. A qualitative analysis is carried out on very few samples of the words and sentences in the dataset to compare the performance of the proposed approach against two context-based sentiment detection baselines.
Strengths: An unsupervised approach for sentiment analysis of informal Arabic texts.
Weaknesses
Contribution and Novelty:
The use of word co-occurrences for sentiment analysis has been extensively studied in previous work (some of it cited in Section 2 of the paper). The proposed approach does not introduce any major addition to the current state of the art, which makes the contribution and novelty of this work very limited.
Methodology:
- Although the proposed approach is simple and straightforward, several of its parameters and design choices look ad hoc. For example, it is unclear why the authors chose to remove words that have a co-occurrence frequency of less than 400. More importantly, removing these words shrinks the vocabulary size by 84%, as described in Section 3.4. In my opinion, such an arbitrary and unconstrained reduction method would remove many contextual and/or opinionated words from the vocabulary, which in turn would hurt the sentiment analysis performance.
- Section 3.7: What do the components in the vector representation of words represent? e.g., what do 1,1 and 5 in the vector of the word “love” refer to?
- I understand that you tried several numbers of clusters for K-means, but it is unclear how you decided that 200 is the best choice.
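To make the concern about the frequency cutoff concrete, a sensitivity check along the following lines would help justify the chosen value. This is only a minimal sketch under my own assumptions: the function name `retention_by_threshold`, the toy counts, and the small opinion lexicon are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def retention_by_threshold(cooc_counts, lexicon, thresholds):
    """For each cutoff, return (threshold, fraction of vocabulary kept,
    fraction of opinion-lexicon words kept) after frequency filtering."""
    vocab_size = len(cooc_counts)
    # Only lexicon words that actually occur in the data can be lost.
    lexicon_in_vocab = [w for w in lexicon if w in cooc_counts]
    rows = []
    for t in thresholds:
        kept = {w for w, c in cooc_counts.items() if c >= t}
        vocab_kept = len(kept) / vocab_size if vocab_size else 0.0
        if lexicon_in_vocab:
            lex_kept = sum(1 for w in lexicon_in_vocab if w in kept) / len(lexicon_in_vocab)
        else:
            lex_kept = 0.0
        rows.append((t, vocab_kept, lex_kept))
    return rows


# Toy, made-up co-occurrence counts purely for illustration.
counts = Counter({"good": 900, "bad": 450, "awful": 120, "the": 5000, "table": 30})
lexicon = {"good", "bad", "awful"}  # hypothetical opinion lexicon

report = retention_by_threshold(counts, lexicon, [100, 400, 1000])
for t, vocab_kept, lex_kept in report:
    print(f"threshold={t:5d}  vocab kept={vocab_kept:.2f}  lexicon kept={lex_kept:.2f}")
```

Reporting such a table for the real data, with an established Arabic sentiment lexicon in place of the toy one, would show how many opinionated words are lost at a cutoff of 400.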
Evaluation:
No proper evaluation of the proposed approach is conducted in this paper. The current evaluation is done by manually comparing the output of the proposed approach against the baselines. To this end, very few samples of the output were analysed, which does not give a clear idea about the performance of the proposed approach.
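A quantitative evaluation against a manually annotated gold standard would be more convincing. As a minimal sketch of what I have in mind, assuming a three-way labelling (the label names, the toy data, and the helper names `per_class_prf` and `macro_f1` are my own illustration, not the authors' setup), per-class precision, recall, and F1 could be computed as follows:

```python
def per_class_prf(gold, predicted):
    """Return {label: (precision, recall, f1)} for parallel label lists."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(scores):
    """Unweighted mean of the per-class F1 values."""
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Toy annotations purely for illustration.
gold = ["pos", "pos", "neg", "neg", "neu", "pos"]
pred = ["pos", "neg", "neg", "neg", "neu", "pos"]

scores = per_class_prf(gold, pred)
for label, (p, r, f1) in scores.items():
    print(f"{label}: P={p:.2f} R={r:.2f} F1={f1:.2f}")
print(f"macro-F1={macro_f1(scores):.2f}")
```

Reporting these figures for the proposed approach and both baselines on the same annotated sample would allow a direct comparison.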
Scalability:
Several reduction methods are applied to the data and matrices in this paper in order to reduce the computational cost of the proposed approach. Regardless of whether this kind of reduction is sound, it suggests that the scalability of your approach is very limited.
Presentation
The paper is not very well structured and contains many typos. Also, equations are presented in the paper as images. I suggest that the authors typeset them using standard LaTeX syntax to make the presentation clearer.
Overall, although this paper addresses an important problem in the sentiment analysis area, the proposed approach lacks novelty. Also, the evaluation conducted in this work is very limited.