Vecsigrafo: Corpus-based Word-Concept Embeddings - Bridging the Statistic-Symbolic Representational Gap in Natural Language Processing

Tracking #: 2074-3287

José Manuel Gómez-Pérez
Ronald Denaux

Responsible editor: Guest Editors Semantic Deep Learning 2018

Submission type: Full Paper
The proliferation of knowledge graphs and recent advances in Artificial Intelligence have raised great expectations related to the combination of symbolic and distributional semantics in cognitive tasks. This is particularly the case of knowledge-based approaches to natural language processing, as near-human symbolic understanding relies on expressive, structured knowledge representations. Engineered by humans, such knowledge graphs are frequently well curated and of high quality, but at the same time can be labor-intensive, brittle or biased. The work reported in this paper aims to address such limitations, bringing together bottom-up, corpus-based knowledge and top-down, structured knowledge graphs by capturing as embeddings in a joint space the semantics of both words and concepts from large document corpora. To evaluate our results, we perform the largest and most comprehensive empirical study around this topic that we are aware of, analyzing and comparing the quality of the resulting embeddings over competing approaches. We include a detailed ablation study on the different strategies and components our approach comprises and show that our method outperforms the previous state of the art according to standard benchmarks.

Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 27/Dec/2018
Review Comment:

In my view, the current version of the paper can be accepted.

Review #2
Anonymous submitted on 08/Jan/2019
Review Comment:

Thank you to the authors for their answers. The paper has been improving since the first version, and I think the quality is certainly enough to have it accepted. There are a few small suggestions I would like to make to improve the final version if it is still possible (not essential but I think it could help).

The evaluation still looks a bit messy to me and is not easy to follow (although this version has certainly improved); I think the presentation of the results can be improved. In any case, the Discussion section does a good job of highlighting important points in a clear way.

Also, as a nice touch to complement the evaluation, given the comments at the end of Section 5.1.1, it could be interesting to create the similarity dataset with concept embeddings that the authors mention (even if it is relatively small). It could strengthen the argument that the evaluation is not focused on senses or concepts, but rather on words. It would then be interesting to include a small analysis of the results on this dataset and to see whether they correlate with those on the rest of the datasets.

Review #3
By Dagmar Gromann submitted on 22/Jan/2019
Minor Revision
Review Comment:

Thanks a lot for addressing all previous comments. The proposed approach for joint word and concept embeddings provides an interesting training method integrating corpora and knowledge graphs. Results are promising and seem to outperform word embeddings on the word similarity task and partially also on a relation prediction task. The latter comes as somewhat of a surprise, since the inclusion of concept information would presumably improve performance consistently on exactly such tasks as relation prediction. While the method is not extremely innovative, the extensive evaluation of factors and similar approaches provides highly interesting results. As regards mode of presentation, the manuscript is of high quality, even though compliance with the style guide, consistency of notation and style, clarification of terminology, and support for some of the claims raised can be improved upon, as detailed below.

Comments by section:
Section 2.1.:
- the training methods are not described correctly: word2vec consists of two different tasks - one where the center word is indeed predicted from its context words (CBOW), but the second one predicts the context given the center word (the more common skip-gram method)
- GloVe takes relational probabilities instead of simple raw counts as is common in the co-occurrence matrix - please specify this
- NASARI seems to produce embeddings only for nouns? Please check the released file: it also trains for noun phrases, verbs, and even phrasal verbs
- "inter-agreement between different vector spaces" only becomes clear later on in the paper - maybe make this statement clearer already on page 4?
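
For illustration of the first point, the following is a minimal sketch of how the two word2vec objectives differ in the training examples they build from the same window. It is not taken from the manuscript or from the word2vec code base; the function names, the toy sentence, and the window size of 1 are assumptions made only for this example.

```python
from typing import List, Tuple


def context_of(tokens: List[str], i: int, window: int) -> List[str]:
    """Return the tokens within `window` positions of index i, excluding i itself."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def cbow_examples(tokens: List[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: one example per position, predicting the center word from its context."""
    examples = []
    for i, center in enumerate(tokens):
        context = context_of(tokens, i, window)
        if context:
            examples.append((context, center))
    return examples


def skipgram_examples(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: one example per (center, context word) pair, predicting context from center."""
    examples = []
    for i, center in enumerate(tokens):
        for ctx in context_of(tokens, i, window):
            examples.append((center, ctx))
    return examples


# Hypothetical toy sentence, already tokenized.
tokens = ["knowledge", "graphs", "encode", "concepts"]
cbow = cbow_examples(tokens, window=1)
skipgram = skipgram_examples(tokens, window=1)

for context, center in cbow:
    print("CBOW      input:", context, "-> target:", center)
for center, ctx in skipgram:
    print("skip-gram input:", center, "-> target:", ctx)
```

The only difference between the two functions is the direction of prediction: CBOW pools the whole window into a single input per position, whereas skip-gram emits one (center, context) pair per neighbor.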

- why is word2vec suddenly written as code? What you call the "focus" word is established in the literature as the center word (so you train center and context word embeddings with conventional methods). For the sake of compliance with the literature, I encourage you to take up this terminology

- which distance metric is the provided distance?
- the window size should be quantified somewhere

- glossa and syncons have not been introduced - should be explained somewhere

- to cater to different reader groups, could you please clarify in writing what an ablation study is (preferably already in the introduction)
- 4.1. "but various other lexical and semantic relations are also included" => this is also the case with WordNet even though not many - rephrase
- 4.2. what are superverbum and supernomen - please introduce that Sensigrafo vocabulary somewhere
- "All the alternative disambiguation methods studied..." - could you explain which methods you studied and provide their results (maybe in an appendix) to support this claim or a reference if published elsewhere
- "UMBC, since these perform consistently worse than the remaining embeddings" - quantify to support this claim or provide a reference where this is shown?
- Table 7: how come there are such huge differences in number of epochs? What if this difference was reduced? Would that affect performance? Please comment on this massive difference somewhere (only done for GloVe somewhere later on)
- Table 10: could you order the results by corpus or in a way that is intuitive with the discussion of the results

Style guide:
Please ensure that your submission corresponds to the preparation guidelines for camera-ready papers of the Semantic Web Journal, which it currently does not. For instance, authors should be listed with their full names; sections, figures, and tables need to be capitalized in running text when referencing a specific instance (Table 8, not table 8); there should be no page numbers; etc. We also recommend consulting a native English speaker proficient in reviewing scientific writing for a final editing pass to avoid mistakes such as the ones below.

Minor comments in order of appearance:
understanding rely on expressive => relies
e.g. [5] are particularly => e.g. word2vec [5]
Word2Vec or encoded word2vec => should always be word2vec
among the firsts => first few
as [24] points out => as Camacho-Collados et al. [24] point out (please correct everywhere, e.g. page 3 [33] and [34])
the size of the corpus greatly affect => affects
pre-calculated embeddings => usually called pretrained
char n-grams => character n-grams
is the focus words and x_j => word
how big this can impact results => big should be much
I.e. can never be used at the beginning of a sentence (also e.g. could not) - please change everywhere in the paper
an increase performance => increased
First is lemma-concept => The first is lemma-concept
Please use 1,519 instead of 1519 as a separation marker to ease reading numbers in the paper
co-trained lemmas and concepts produces => produce
hyper parameters => hyperparameters
for word-prediction task => for the word-prediction task
quality as those derived => quality than those derived