Review Comment:
This paper presents the TaxoLLaMA model, a suite of instruction-tuned models based on LLaMA, trained on Princeton WordNet 3.0 to perform a wide range of taxonomy-related tasks. The authors introduce a novel dataset covering diverse graph relations and show that their model outperforms prior approaches, achieving state-of-the-art results on 11 of 16 tasks. The paper summarizes and extends two previous conference submissions by introducing new experiments with updated models (e.g., LLaMA 3.1), refining taxonomies using bidirectional and multi-relational strategies, resolving graph cycles, and adding deeper ablation studies and metrics such as F&M.
This paper presents a comprehensive framework for taxonomy extraction as well as new datasets going beyond traditional hypernymy. The results are strong, and the rigorous evaluation through ablation studies and multiple metrics is highly commendable. There are still some challenges with ambiguous hypernyms, and some of the methods (such as using the NLTK ID) are questionable. Expanding the context and the prompt with examples from a WSD corpus could be a way to further improve the results.
"English WordNet" is not a resource and probably refers to Princeton WordNet, a lexical database for English developed at Princeton University. A more recent project is the Open English Wordnet (note the lowercase 'n'), an open-source project that has released improved versions of the Princeton WordNet. The authors should explain why they chose to use outdated data.
Note that the IDs used in examples 2 and 4 are from NLTK and are not official IDs from any English-language wordnet project.
The authors use perplexity to assess hypernym relations. While this works, it may introduce a bias against infrequent terms. Perplexity reflects probability scores, so hypernyms that are less frequent (e.g., technical jargon, low-resource vocabulary) may be unfairly penalized even when they are correct. Perhaps the authors could try to correct for this by using unigram frequency scores?
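To make the suggestion concrete, here is a minimal sketch of one possible correction: a PMI-style score that subtracts a scaled log unigram probability from the candidate's log-likelihood. This is not the authors' method. The function names, the `alpha` weight, and the toy log-probabilities and frequencies are all assumptions chosen for illustration only.

```python
import math

def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities (lower is better)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))

def corrected_score(token_logprobs, unigram_prob, alpha=1.0):
    """PMI-style score: log P(candidate | prompt) - alpha * log P_unigram(candidate).

    Higher is better. alpha=0 gives the plain log-likelihood;
    alpha=1 fully discounts the candidate's corpus frequency.
    """
    return sum(token_logprobs) - alpha * math.log(unigram_prob)

def rank_candidates(candidates, alpha=1.0):
    """Return candidate names sorted from best to worst corrected score."""
    return sorted(
        candidates,
        key=lambda name: corrected_score(*candidates[name], alpha=alpha),
        reverse=True,
    )

# Hypothetical values: (per-token log-probs from the model, unigram probability).
candidates = {
    "animal": ([-1.0], 1e-3),        # frequent, generic hypernym
    "canid": ([-2.0, -2.0], 1e-6),   # rare, more specific hypernym
}

print(rank_candidates(candidates, alpha=0.0))  # ranking without correction
print(rank_candidates(candidates, alpha=1.0))  # ranking with full correction
```

With these toy numbers the frequent candidate wins under the uncorrected score, while the rare candidate wins once its low unigram probability is discounted. An intermediate `alpha` would need to be tuned on held-out data, since a full correction can over-reward rare terms.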
Minor
-----
p1 l28 *a* novel method
The paper exhibits inconsistent capitalization of technical terms that are not proper nouns. Here are several examples:
* Entity Linking
* Relation Classification
* Named Entity Recognition
* Hypernym Discovery
* Taxonomy Construction
* Taxonomy Enrichment
* Lexical Entailment
* etc. etc.
p7 l36. "id" -> "ID"
p8. Don't use fixed-width fonts for emphasis
p8 l27. ",,"
p11 l23. "Table ??"
p11 l34. "VS" -> "versus"
In several places "SoTA" should be either "SotA" or "SOTA"
The references need to be checked with the same care as the rest of the manuscript (don't rely on BibTeX!). There are many oddities such as capitalization issues and duplicate DOIs/URLs.