Taxonomy Enrichment with Text and Graph Vector Representations

Tracking #: 2847-4061

Authors: 
Irina Nikishina
Mikhail Tikhomirov
Varvara Logacheva
Yuriy Nazarov
Alexander Panchenko
Natalia Loukachevitch

Responsible editor: 
Guest Editors DeepL4KGs 2021

Submission type: 
Full Paper
Abstract: 
Knowledge graphs such as DBpedia, Freebase or Wikidata always contain a taxonomic backbone that allows the arrangement and structuring of various concepts in accordance with the hypo-hypernym ("class-subclass") relationship. With the rapid growth of lexical resources for specific domains, the problem of automatically extending existing knowledge bases with new words is becoming more and more widespread. In this paper, we address the problem of taxonomy enrichment, which aims at adding new words to an existing taxonomy. We present a new method that achieves strong results on this task with little effort. It uses resources that exist for the majority of languages, making the method universal. We extend our method by incorporating deep representations of graph structures such as GCNs, Poincaré embeddings, and node2vec, which have recently demonstrated promising results on various NLP tasks. Furthermore, combining these representations with word embeddings allows us to beat the state of the art. We conduct a comprehensive study of the existing approaches to taxonomy enrichment based on word and graph vector representations and approaches to their fusion. We also create a number of datasets for taxonomy extension for English and Russian. We achieve state-of-the-art results across different datasets and provide an in-depth error analysis of mistakes.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 27/Aug/2021
Suggestion:
Accept
Review Comment:

The authors address the problem of taxonomy enrichment, which aims at adding new words to an existing taxonomy. Deep representations of graph structures such as GCN autoencoders, Poincaré embeddings, and node2vec have recently demonstrated very promising results on various NLP tasks. The authors conduct a comprehensive study of existing approaches to taxonomy enrichment based on word and graph vector representations. They also explore ways of using deep learning architectures to extend the taxonomic backbones of knowledge graphs. They achieve state-of-the-art results across different datasets.

(1) Originality

The work is fairly original. The authors present a computational study of various approaches to taxonomy enrichment, including recent state-of-the-art results, datasets for studying the diachronic evolution of wordnets for English and Russian, and efficient methods for taxonomy enrichment.

(2) Significance of the results

I am divided on the significance of the results and their potential impact. While the study shows improvements of a few percentage points, the combinations used are fairly simple and expected. That said, I still think this result deserves to be published.

(3) Quality of writing

The quality of writing has improved since my original review. I am satisfied with the paper as it now stands.

Review #2
Anonymous submitted on 13/Oct/2021
Suggestion:
Minor Revision
Review Comment:

The paper has been considerably revised, and the authors have done a very good job. The paper is much more understandable and easier to follow. However, it is still very long (now 31 pages), and some parts could be improved for better readability.
Regarding the organization of the paper, the results presented on pages 18, 19, 22, and 23 could clearly be presented in an appendix. At the moment, these results sit in the middle of the explanations and are poorly placed.
Figure 7 is not very useful either.
About the content, some improvements could be made in the introduction, and in some other parts which are discussed below.

In the introduction, the authors insist on knowledge graphs, yet they almost never work with knowledge graphs; they are instead interested in "taxonomy" enrichment. As a "taxonomy" is defined as a tree-based structure, what is the relation to knowledge graphs, which are based on graphs?
Moreover, the relation between the "taxonomy" model and a representation model such as SKOS could also be discussed. SKOS is probably the closest model to the authors' "taxonomy" model.
The authors never discuss the "semantics" of the word structures that they consider, such as WordNet. What exactly is the semantics of the subclass relation in such models? Is there a logic-based interpretation in terms of inclusion of class extensions, as in DL-based representation models (DL = description logics)?
This is probably one main problem, as the search for synsets remains quite difficult and does not seem to rely on a partial ordering that could be induced by the class-extension inclusion relation. A clarification on this point would be welcome.
The authors should also be aware that there is prior work on managing multiple inheritance hierarchies: a class may inherit from several other classes, and there are ways of computing the list of ancestors of a class with multiple parents (based on a linearization of the parent graph; see work on object-oriented programming, for example).
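The linearization the reviewer alludes to can be illustrated with Python's built-in C3 method resolution order, which computes a single deterministic list of ancestors under multiple inheritance. This is a generic illustration of the technique, not part of the authors' method:

```python
# Diamond inheritance: D inherits from both B and C, which share ancestor A.
class A: pass
class B(A): pass
class C(A): pass
class D(B, C): pass

# Python's C3 linearization of the parent graph yields one deterministic
# ancestor ordering, even though D has multiple parents.
ancestors = [cls.__name__ for cls in D.__mro__]
print(ancestors)  # ['D', 'B', 'C', 'A', 'object']
```

The same idea (a topological linearization that respects local parent order) could in principle order candidate hypernyms when a synset has multiple parents.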

The definition of a knowledge graph on page 3 is not correct, as it does not define what a knowledge base is. Roughly speaking, a knowledge base consists of a TBox (concepts and relations) and an ABox (individuals and instantiations of concepts and relations). The authors should be much clearer about this.
In some parts, the use of articles is odd, and the authors should pay more attention to the English (especially in Section 2.2 and some other places).

Explain exactly what is meant by "diachronic". The word is used but never properly introduced.

In Section 6, the authors introduce the DWRank-graph method in a very confusing way. They list many different methods, and the reader does not understand what is happening or why so many methods are described at once. Is this related to the state of the art? The authors should be much clearer and explain what they intend to do here and why they describe these methods.

Similarly, explain more precisely why the method introduced in Section 7 is called DWRank-Meta. Why "Meta"? This can be confusing and misleading.

In Section 8, on experiments, the authors should take the time to explain how their figures (2-6) should be read and interpreted, and what count as good and bad MAP values.
We already indicated above a possible reorganization of this section.
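For context on the metric the reviewer mentions, MAP (mean average precision) averages, over all query words, the average precision of the ranked hypernym candidates; 1.0 means every gold hypernym is ranked at the top. A minimal sketch with illustrative toy queries (not data from the paper):

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision at each rank where a relevant item appears."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    """MAP: mean of per-query AP over all (ranked candidates, gold set) pairs."""
    return sum(average_precision(r, g) for r, g in queries) / len(queries)

# Toy queries: ranked hypernym candidates vs. gold hypernym sets.
queries = [
    (["animal", "plant", "mammal"], {"animal", "mammal"}),  # AP = (1/1 + 2/3) / 2
    (["fruit", "food"], {"food"}),                          # AP = 1/2
]
print(round(mean_average_precision(queries), 3))  # prints 0.667
```

Low MAP values are common on this task because each query word has few gold hypernyms among thousands of candidate synsets, which is why the figures' absolute values need the interpretation the reviewer asks for.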

Typos: on page 20, "demoNstrate"; on page 21, "theY (do provide)".

Review #3
Anonymous submitted on 18/Oct/2021
Suggestion:
Accept
Review Comment:

The paper presents a comprehensive study of several approaches for taxonomy enrichment and an error analysis describing their typical mistakes. To this purpose, the authors introduce two new datasets for this task, extracted from WordNet (in English and Russian), and evaluate alternative solutions based either on word representations or on graph representations.

The authors did a good job of addressing the issues that I mentioned in the previous review and the new version of the paper is much more robust. I am happy for it to be accepted.