Review Comment:
Thank you for addressing all previous comments. The proposed approach to joint word and concept embeddings provides an interesting training method that integrates corpora and knowledge graphs. The results are promising and appear to outperform word embeddings on word similarity and, partially, also on a relation prediction task. The latter is somewhat surprising, since the inclusion of concept information would presumably improve performance consistently on exactly such tasks as relation prediction. While the method is not extremely innovative, the extensive evaluation of factors and of similar approaches yields highly interesting results. Regarding presentation, the manuscript is of high quality, although compliance with the style guide, consistency of notation, style, and terminology, and support for some of the claims raised can be improved; details are given below.
Comments by section:
Section 2.1:
- the training methods are not described correctly: word2vec comprises two different tasks - one where, as described, the center word is predicted from its context words (CBOW), and a second one that predicts the context words given the center word (the more common skip-gram method); see the first sketch after this list
- GloVe is motivated by ratios of co-occurrence probabilities rather than the simple raw counts commonly used in the co-occurrence matrix - please specify this (see the second sketch after this list)
- NASARI is described as producing embeddings only for nouns; however, inspecting the released files shows that it also covers noun phrases, verbs, and even phrasal verbs - please correct this
- "inter-agreement between different vector spaces" becomes only clear later on in the paper - maybe make this statement more clear already on page 4?
Section 3.1:
- why is word2vec suddenly written in code font? what you call the "focus" word is established in the literature as the "center" word (i.e., conventional methods train center and context word embeddings); for consistency with the literature, I encourage you to adopt this terminology
Section 3.2:
- which distance metric is meant by the provided distance? (see the note after this list)
- the window size should be quantified somewhere
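Regarding the distance metric question above: if, for instance, cosine distance is intended, stating it explicitly as $d(u, v) = 1 - \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$ would remove the ambiguity, since Euclidean distance behaves quite differently on unnormalized embeddings (this formula is my suggestion of what to state, not a claim about the manuscript).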
Section 3.3:
- "glossa" and "syncons" have not been introduced - these terms should be explained somewhere
Section 4:
- to cater to different reader groups, could you please clarify in writing what an ablation study is (preferably already in the introduction)
- 4.1. "but various other lexical and semantic relations are also included" => this also holds for WordNet, even if there are not many - please rephrase
- 4.2. what are "supernomen" and "superverbum"? please introduce this Sensigrafo vocabulary somewhere
- "All the alternative disambiguation methods studied..." - could you explain which methods you studied and provide their results (maybe in an appendix) to support this claim or a reference if published elsewhere
- "UMBC, since these perform consistently worse than the remaining embeddings" - quantify to support this claim or provide a reference where this is shown?
- Table 7: how come there are such huge differences in the number of epochs? what if this difference were reduced - would that affect performance? please comment on this substantial difference somewhere (this is currently only done for GloVe later in the paper)
- Table 10: could you order the results by corpus, or in a way that aligns with the discussion of the results
Style guide:
Please ensure that your submission follows the preparation guidelines for camera-ready papers of the Semantic Web Journal, which it currently does not. For instance, authors should be listed with their full names; references to specific sections, figures, and tables in running text need to be capitalized (Table 8, not table 8); page numbers should be omitted; etc. We also recommend consulting a native English speaker proficient in reviewing scientific writing for a final editing pass to avoid mistakes such as the ones below.
Minor comments in order of appearance:
understanding rely on expressive => relies
e.g. [5] are particularly => e.g. word2vec [5]
Word2Vec or word2vec in code font => should always be plain word2vec
among the firsts => first few
as [24] points out => as Camacho-Collados et al. [24] point out (please correct this everywhere, e.g. page 3 [33] and [34])
the size of the corpus greatly affect => affects
pre-calculated embeddings => usually called pretrained
char n-grams => character n-grams
is the focus words and x_j => word
how big this can impact results => "big" should be "much"
"I.e." should never be used at the beginning of a sentence (nor should "e.g.") - please change this everywhere in the paper
an increase performance => increased
First is lemma-concept => The first is lemma-concept
Please use a thousands separator, e.g. 1,519 instead of 1519, to ease reading of numbers in the paper
co-trained lemmas and concepts produces => produce
hyper parameters => hyperparameters
for word-prediction task => for the word-prediction task
quality as those derived => quality than those derived