Large Language Models for Creation, Enrichment and Evaluation of Taxonomic Graphs

Submitted by Irina Nikishina on 07/06/2025 - 08:06

Tracking #: 3921-5135

Authors:

Viktor Moskvoretskii

Irina Nikishina

Ekaterina Neminova

Alina Lobanova

Alexander Panchenko

Chris Biemann

Responsible editor:

Guest Editors KG Construction 2024

Submission type:

Full Paper

Abstract:

Taxonomies play a crucial role in organizing knowledge for various natural language processing tasks. Recent advancements in LLMs have opened new avenues for automating taxonomy-related tasks with greater accuracy. In this paper, we explore the potential of contemporary LLMs in learning, evaluating and predicting taxonomic relations across multiple lexical semantic tasks. We propose a novel method for taxonomy-based instruction dataset creation, encompassing multiple graph relations. With the use of this dataset we build TaxoLLaMA, a unified model fine-tuned on datasets exclusively based on English WordNet 3.0, designed to handle a wide range of taxonomy-related tasks such as Taxonomy Construction, Hypernym Discovery, Taxonomy Enrichment, and Lexical Entailment. The experimental results demonstrate that TaxoLLaMA achieves state-of-the-art performance on 11 out of 16 tasks and ranks second on 4 other tasks. We also explore the ability of LLMs to refine the graphs of constructed taxonomies and present a comprehensive ablation study and a thorough error analysis supported by both manual and automated techniques.

Full PDF Version:

swj3921.pdf

Previous Version:

Large Language Models for Creation, Enrichment and Evaluation of Taxonomic Graphs

Tags:

Reviewed

Long-term Stable Link to Resources:

https://zenodo.org/records/15511071

Decision/Status:

Solicited Reviews:


Review #1

By John McCrae submitted on 25/Jul/2025

Suggestion:
Accept

Review Comment:

This paper presents the TaxoLLaMA model, a suite of instruction-tuned models based on LLaMA, trained on Princeton WordNet 3.0 to perform a wide range of taxonomy-related tasks. The authors introduce a novel dataset covering diverse graph relations and show that their model outperforms prior approaches, achieving state-of-the-art results on 11 of 16 tasks. This paper summarizes and extends two previous conference submissions by introducing new experiments with updated models (e.g., LLaMA 3.1), refining taxonomies using bidirectional and multi-relational strategies, resolving graph cycles, and adding deeper ablation studies and metrics like F&M.

This paper presents a comprehensive framework for taxonomy extraction as well as new datasets going beyond traditional hypernymy. The results are strong, and the rigorous evaluation through ablation studies and multiple metrics is highly commendable. There are still some challenges with ambiguous hypernyms, and some of the methods (such as using the NLTK ID) are questionable. Expanding the context and the prompt with examples from a WSD corpus could be a way to further improve the results.

"English WordNet" is not a resource and probably refers to Princeton WordNet, a lexical database for English developed at Princeton University. Another more recent project is the Open English Wordnet (note the lowercase 'n'), an open-source model that has released improved versions of the Princeton WordNet. The authors should explain why they chose to use outdated data.

Note that the IDs used in examples 2 and 4 are from NLTK and are not official IDs from any English-language wordnet project.

The authors use perplexity to assess hypernym relations, but, while this works, it may introduce bias against infrequent terms. Perplexity reflects probability scores so hypernyms that are less frequent (e.g., technical jargon, low-resource vocabulary) may be unfairly penalized, even if they are correct. Perhaps the authors could consider trying to correct for this by using the unigram frequency scores?
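
The concern above can be illustrated with a minimal sketch. It is not the authors' scoring code: the function names, the candidate words, and all probabilities below are hypothetical, chosen only to show how subtracting a unigram log-probability baseline (a PMI-style correction, one possible reading of the suggestion) can change which hypernym candidate ranks first.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities (lower is better)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def frequency_corrected_score(token_logprobs, unigram_logprobs):
    """Mean per-token log-ratio of model probability to unigram probability.

    Higher is better. A rare word is no longer penalized merely for being rare.
    """
    if len(token_logprobs) != len(unigram_logprobs):
        raise ValueError("need one unigram log-probability per token")
    diffs = [m - u for m, u in zip(token_logprobs, unigram_logprobs)]
    return sum(diffs) / len(diffs)


# Hypothetical candidates for the hypernym of "dog"; every number is made up.
candidates = {
    "animal": {"model": [math.log(0.20)], "unigram": [math.log(0.010)]},
    "canid": {"model": [math.log(0.05)], "unigram": [math.log(0.0001)]},
}

# Raw perplexity prefers the frequent word.
by_ppl = min(candidates, key=lambda c: perplexity(candidates[c]["model"]))

# The corrected score prefers the word the model boosts most over its base rate.
by_corrected = max(
    candidates,
    key=lambda c: frequency_corrected_score(
        candidates[c]["model"], candidates[c]["unigram"]
    ),
)

print(by_ppl, by_corrected)
```

In this toy setting the frequent candidate wins under raw perplexity, while the rarer candidate wins once its low base rate is taken into account. Whether such a correction helps on the paper's benchmarks would have to be tested empirically.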

Minor
-----
p1 l28 *a* novel method
The paper exhibits inconsistent capitalization of technical terms that are not proper nouns. Here are several examples:
* Entity Linking
* Relation Classification
* Named Entity Recognition
* Hypernym Discovery
* Taxonomy Construction
* Taxonomy Enrichment
* Lexical Entailment
* etc. etc.
p7 l36. "id" -> "ID"
p8. Don't use fixed-width fonts for emphasis
p8 l27. ",,"
p11 l23. "Table ??"
p11 l34. "VS" -> "versus"
In several places "SoTA" should be either "SotA" or "SOTA"
The references need to be checked with the same care as the rest of the manuscript (don't rely on BibTeX!). There are many oddities such as capitalization issues or duplicate DOI/URLs.

Review #2

Anonymous submitted on 11/Aug/2025

Suggestion:
Accept

Review Comment:

In this paper, the authors describe TaxoLLaMA, a large language model (LLM) fine-tuned on English WordNet 3.0, designed to handle taxonomy-related tasks like Taxonomy Construction, Hypernym Discovery, Taxonomy Enrichment, and Lexical Entailment. The paper is framed by the recent advancements in using LLMs to improve the automation of organising knowledge in natural language processing tasks. The results reported in the paper indicate that TaxoLLaMA achieves state-of-the-art performance in 11 out of 16 tasks and ranks second in 4 others. In this regard, the topic dealt with and the results obtained are of interest and relevant.
In this paper, the authors have significantly extended previous results presented in references [36, 37] by i) adding new experiments using different models in zero- and few-shot settings (Phi3, QWEN, etc.), ii) fine-tuning and updating TaxoLLaMA, iii) adding an ablation study on the consistency and performance of TaxoLLaMA across different numbers of generations, and iv) adding an additional metric, F&M, for Taxonomy Construction evaluation.
For the Hypernym Discovery task, TaxoLLaMA is tested using the three SemEval-2018 datasets and two additional general datasets for Italian and Spanish. Results indicate that the fine-tuned TaxoLLaMA outperforms the five other state-of-the-art models considered on all five datasets.
For the taxonomy enrichment, TaxoLLaMA was tested using the Taxonomy Enrichment benchmark. Results indicate here that it outperforms all previous approaches on the WordNet Noun and WordNet Verb datasets, but falls short of the current SoTA method on more specialised taxonomies (MAG-CS and MAG-PSY).
Finally, for testing the performance of the lexical entailment task, the Hyperlex benchmark and the ANT entailment subset were used. For ANT, the results differ for the two metrics considered, since for Average Precision the proposed approach ranks better than the other four SoTA proposals considered, whilst for AUC it ranks second. For Hyperlex, TaxoLLaMA outperforms the other 5 zero-shot proposals considered for the Lexical Dataset and ranks second for the Random dataset. In both cases, it falls short of the best fine-tuned proposal (RoBERTa best [40]).
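
To see how a system can rank first on Average Precision yet second on AUC, consider the following toy sketch. It is not the paper's evaluation code and uses no data from the ANT benchmark: the two label lists are invented, and the helper functions are plain textbook definitions written here for illustration.

```python
def average_precision(labels, scores):
    """Mean of precision@k taken at the rank of each positive item."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Both hypothetical systems assign the same descending scores; only the
# gold labels at each rank differ (1 = entailment holds, 0 = it does not).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
labels_a = [1, 1, 0, 0, 0, 1]  # two positives on top, one at the very bottom
labels_b = [1, 0, 1, 1, 0, 0]  # positives spread across the upper half

ap_a, auc_a = average_precision(labels_a, scores), roc_auc(labels_a, scores)
ap_b, auc_b = average_precision(labels_b, scores), roc_auc(labels_b, scores)

print(f"A: AP={ap_a:.3f} AUC={auc_a:.3f}")
print(f"B: AP={ap_b:.3f} AUC={auc_b:.3f}")
```

Average Precision rewards system A for placing positives at the very top, while AUC penalizes it for the positive ranked below every negative, so the two metrics order the systems differently.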
The experiments are methodologically sound and demonstrate that the proposed approach is well-suited for solving the considered taxonomy-related tasks and challenges.
Together with the paper, the authors provide links to GitHub and Zenodo repositories with the code for the paper and the dataset used. The repositories are well organised and allow the reader to assess the data, as well as reproduce the results. In this regard, the reproducibility of the reported results is guaranteed.

Review #3

By Pablo Calleja submitted on 16/Sep/2025

Suggestion:
Accept

Review Comment:

All the comments from the previous review have been handled. The overall result is a good paper with a good contribution. Moreover, the clarity of explanation has been improved and tables such as Table 2 have better presentation and details that are useful during the reading.

Review #4

Anonymous submitted on 17/Sep/2025

Suggestion:
Accept

Review Comment:

I have read the authors' comments and have seen the changes they made in the new manuscript with respect to (1) the readability of the text (formalisms) and the missing technical details, and (2) publishing the dataset on Zenodo and including more details for reproducibility in the GitHub repository. I agreed with Reviewer 3 on the 'overly broad predictions', but am satisfied with the answer that the authors give.

From my perspective, the article is of interest to the journal's special issue and would make a valuable contribution in its present state.
