Wan2Vec: Embeddings Learned on Word Association Norms

Tracking #: 1963-3176

Authors: 
Gemma Bel-Enguix
Helena Gomez-Adorno
Jorge Reyes-Magaña
Gerardo Sierra

Responsible editor: 
Guest Editors Knowledge Graphs 2018

Submission type: 
Full Paper
Abstract: 
Word embeddings are powerful for many tasks in natural language processing. In this work, we learn word embeddings using weighted graphs from Word Association Norms with the node2vec algorithm. The computational resources used by this technique are reasonable and affordable, which allows us to obtain good quality word embeddings even from a small corpus. We evaluate our word vectors on seven word similarity benchmarks, WordSim-353, MC30, MTurk-287, MEN-TR-3k, SimLex-999, MTurk-771 and RG-65, achieving better results than those obtained with word2vec, GloVe, and FastText trained on huge corpora.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 08/Aug/2018
Suggestion:
Minor Revision
Review Comment:

The paper introduces Wan2Vec, an application of node2vec to learning word representations from word association norms. Two weight functions are tested. Results show that the WAN embeddings obtain competitive or better performance than classic pre-trained embeddings.

The paper is well written and easy to follow. Some details, though, are not explicitly mentioned and should be clarified. For example, I have not noticed any mention of which similarity measure is used (vector cosine?). Numerous studies have demonstrated that vector cosine is biased and that other measures might improve performance. I suggest introducing at least one such experiment in the paper, with some comments about why certain embeddings may work better than others with a given measure. My intuition is that, the vectors being graph-based, rank-based measures (such as APSyn or Spearman's rho) might help to further increase the performance.
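To make the cosine versus rank-based distinction concrete, here is a minimal sketch that scores the same pair of vectors with vector cosine and with Spearman's rho computed over the vector dimensions. The function names and the toy vectors are assumptions made for illustration; they do not come from the paper, and APSyn is not implemented here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(u, v):
    """Spearman's rho: Pearson correlation of the rank-transformed vectors."""
    ru, rv = ranks(u), ranks(v)
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    cov = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    var_u = sum((a - mu) ** 2 for a in ru)
    var_v = sum((b - mv) ** 2 for b in rv)
    return cov / math.sqrt(var_u * var_v)


# Toy 4-dimensional vectors, invented for illustration only.
vec_a = [0.9, 0.1, 0.4, 0.2]
vec_b = [0.8, 0.2, 0.5, 0.1]
print("cosine:", cosine(vec_a, vec_b))
print("spearman:", spearman(vec_a, vec_b))
```

The two measures can disagree: cosine is sensitive to the magnitude of each dimension, whereas the rank-based score depends only on how the dimensions are ordered, which is the kind of difference the requested experiment would probe.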

Except for minor performance improvements, I am not very sure about the originality of the paper. It is somehow expected that embeddings trained on word association norms and tested on similarity and relatedness perform well. I don’t even think this is the first time this was tested. If other similar approaches were different, this must be mentioned in the related work.

I believe the authors have to clearly identify the contribution and the limitations of this approach. For example, they may add some extrinsic evaluation, such as sentiment analysis or other tasks. I also suggest going deeper into the intrinsic evaluation, with further comments on the relatedness/similarity and distance/rank distinctions. It is also fundamental to better discuss the behavior of words that are not in the norms: I am not convinced by the argument that human beings use few words (in fact, their competence is much larger than what they actually use in daily conversation).

The authors claim in the abstract that word association norms are “reasonable and affordable”. This might be relatively true for English (and a few other languages), but I think such a statement cannot be generalized to less studied languages. I strongly suggest reformulating the statement.

The paper is a bit poor in references. There have been a lot of studies on similarity, relatedness and measures in the last few years that should be mentioned in a journal paper.

Add your model (maybe with 300 dims) to Table 4 for comparison. The reader should not have to go up and down to find the scores in Table 3.

Typos:
- NAP: what is it?

Review #2
By Thamme Gowda submitted on 09/Sep/2018
Suggestion:
Major Revision
Review Comment:

Originality and Significance:
- Using Word Association Norms is an innovative approach to constructing word embeddings.
- The proposed method of constructing word embeddings outperforms other widely used methods such as Word2Vec, GloVe, and FastText.
- The proposed method is computationally efficient. It is easier to train.

Needed Corrections:
- Page 2, right column, end of 2nd paragraph, says: “In Section 4, we present the evaluation of the generated vectors using standard data sets for word similarity in Spanish.” 
However, there is no connection to “Spanish”. The dataset used is EAT, which is English, and the evaluations are also in English, so the mention of Spanish is confusing.
- Page 2, right column, last paragraph, says: “the Small World of Words deals with nine different languages.” However, the footnote link https://smallworldofwords.org/en/project lists 14 languages. Maybe: “the Small World of Words contained datasets in 14 languages at the time of writing”.
- Page 3, left column, footnote 3: http://www.eat.rl.ac.uk/ is a broken link. You may want to remove it or use a link from the Internet Archive: https://web.archive.org/web/20161030032628/http://www.eat.rl.ac.uk/
- Page 3, right column, top paragraph says: “Their age ranged from 17 to 22.64% of the participants were males and 36% females”. It is ambiguous because a white space is missing between 22 and 64%. Maybe this rephrasing would fix it: “The participants were aged between 17 and 22, among which 64% were males and 36% were females”.
- Page 3, right column, bottom section. Edge weight is defined as φ : E → R, i.e. φ : (v_i , v_j) → R . However, the notation used for defining Frequency and Association Strength is not the same, which makes it hard for the readers.
- Page 4, left column, section 3.1: The authors refer to four parameters of node2vec: p, q, d, and l. They do a thorough analysis of d and l (later in the results section); however, it is not clear what values were chosen for p and q. Please state the values used for p and q.
- Page 5, Figure 1: The headers are stated as “weight=Association” and “weight=Frequency”. However, according to the text (section 4, paragraph 2), these weights are “Inverse association strength” and “Inverse Frequency”. Please make this consistent.
- Page 5, left column: The authors mention the overlap between the Wan2Vec vocabulary and the test datasets. However, it is not clear what exactly is done with words that are not in the vocabulary (out-of-vocabulary, OOV). Are those pairs excluded from the test scores or assigned a score of 0?
- Table 1 and Table 2: Note that both tables share the same caption, “The WAN graph was built using the inverted frequency as weighting function.” From the text, I can see that Table 1 = IF and Table 2 = IAS. (It would help readers if the authors matched the order of IAS and IF between figures and tables; currently IAS comes before IF in the figures and the order is reversed in the tables.)
- Table 3: Where is the footnote text for 8, 9, 10 and 11?
- Table 4: What is the meaning of the n(overlap) column here? Is it still the overlap with Wan2Vec (the numbers look the same as in Tables 1 and 2)? What is the n(overlap) for GloVe, Word2Vec and FastText?
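
The OOV and n(overlap) questions above matter because the two policies can give very different scores. The sketch below is a hypothetical evaluation routine, not the authors' procedure: the `evaluate` function, the dictionary of model scores, and the toy word pairs and ratings are all assumptions invented for illustration. It reports Spearman's rho against human ratings together with the number of pairs actually scored, under an "exclude" policy and a "zero" policy.

```python
import math


def ranks(values):
    """1-based average ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


def evaluate(benchmark, model_scores, oov_policy):
    """Score a benchmark of (word1, word2, human_rating) triples.

    model_scores maps (word1, word2) to a model similarity; a pair is
    missing from the dict when at least one of its words is OOV.
    oov_policy is "exclude" (drop the pair) or "zero" (score it as 0.0).
    Returns (rho, number_of_pairs_scored).
    """
    gold, pred = [], []
    for w1, w2, rating in benchmark:
        score = model_scores.get((w1, w2))
        if score is None:
            if oov_policy == "exclude":
                continue
            score = 0.0
        gold.append(rating)
        pred.append(score)
    return spearman(gold, pred), len(gold)


# Toy benchmark and model scores, invented for illustration only.
benchmark = [("cup", "mug", 9.0), ("road", "street", 8.5),
             ("tree", "leaf", 6.0), ("lamp", "river", 0.5),
             ("gem", "jewel", 9.2)]
model_scores = {("cup", "mug"): 0.9, ("road", "street"): 0.8,
                ("tree", "leaf"): 0.5, ("lamp", "river"): 0.1}  # gem/jewel is OOV

for policy in ("exclude", "zero"):
    rho, n = evaluate(benchmark, model_scores, policy)
    print(policy, "rho =", rho, "n(overlap) =", n)
```

On this toy data a single OOV pair is enough to change the correlation substantially between the two policies, which is why the tables should state which policy was used and report n(overlap) separately for each compared model.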

Review #3
Anonymous submitted on 10/Sep/2018
Suggestion:
Major Revision
Review Comment:

Originality (4/5): The representation of Word Association Networks (WANs) using the node2vec algorithm appears to be novel, and WANs intuitively carry additional semantic information not easily inferred by algorithms focused on co-occurrence. This has potential implications for the amount of training data needed to train embeddings and for the richness of embeddings learned from WANs when they are available.

Significance of Results (2/5): The comparison with pre-trained embeddings is incomplete. Results for embeddings learned from WANs are selected from experiments with varying vector dimensions, whereas the pre-trained embeddings are evaluated at a single dimension only. A fair comparison would fix the dimension parameter (to 50 or 100) and compare those results; it isn't clear whether there would be performance gains over other approaches, which don't require labor-intensive WANs and can be used off-the-shelf for various languages. It is also unclear how out-of-vocabulary words affect experimental results or downstream tasks. This seems like a weakness of using WANs that isn't adequately addressed or compared to other approaches. The training time and efficiency of wan2vec are also cited as a major advantage, but the comparisons are made with pre-trained, readily available embeddings.

Quality of Writing (3/5): The paper is well-organized and fairly easy to follow. There are a few minor typos and grammatical errors. The comparison between results from pre-trained embeddings and wan2vec is difficult and requires going back and forth between tables. The Table 2 caption is the same as that of Table 1. The n(overlap) column has the same values in all tables.