Spanish Triple-to-Text Benchmark on Low-Resource Large Language Models

Tracking #: 3993-5207

Authors: 
Virginia Ramon-Ferrer
Carlos Badenes-Olmedo
Oscar Corcho

Responsible editor: 
Blerina Spahiu

Submission type: 
Full Paper
Abstract: 
The verbalisation of structured data benefits several applications. In the context of knowledge graphs (KGs), transforming RDF triples into natural language facilitates tasks such as KG documentation and alternative exploration methods for different user needs. While significant progress has been made on the English verbalisation of KGs, Spanish remains an under-represented language for this task due to the lack of suitable resources, which hinders the development and evaluation of models capable of generating high-quality Spanish verbalisations. To tackle this problem, we create a Spanish adaptation of the WebNLG dataset, a benchmark consisting of over 45,000 verbalisations paired with DBpedia triple sets. To our knowledge, this is the first formal attempt to provide such a dataset in Spanish; beyond data verbalisation, it can also potentially support the automated generation of RDF triples from text. We leverage this dataset to conduct a comprehensive evaluation of resource-efficient models on the Spanish triple-to-text task using two learning approaches: in-context learning (zero-shot, one-shot, and few-shot settings) and supervised learning through partial fine-tuning. Our results highlight the challenges of generating fluent and accurate Spanish text and demonstrate that partial fine-tuning of the evaluated models significantly improves performance.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 25/Jan/2026
Suggestion:
Accept
Review Comment:

This paper presents a well-motivated and timely study on Spanish triple-to-text generation using resource-efficient large language models. The creation of the Spanish WebNLG dataset and the systematic evaluation of prompt-based and fine-tuned approaches constitute valuable contributions to multilingual data-to-text generation, particularly for under-represented languages such as Spanish.

Strengths:

One of the main strengths of the work lies in the development of the Spanish WebNLG dataset through a semi-supervised pipeline followed by manual revision. This effort addresses a significant resource gap and enables both generation and potential reverse (text-to-triples) applications. The authors clearly articulate their research questions and design experiments that directly address them, offering a structured and coherent narrative throughout the paper.

The experimental findings convincingly demonstrate that contextualisation (one-shot prompting) and parameter-efficient fine-tuning substantially improve performance over zero-shot settings. The analysis of different models provides useful practical insights for model selection in low-resource scenarios, particularly the strong performance of Qwen2.5-1.5B-Instruct and the contrasting behaviour of Llama-3.2-1B-Instruct across languages. The multilingual and error analyses further strengthen the study by showing that linguistic properties such as morphological richness and syntactic flexibility impact both model behaviour and metric reliability.

The discussion is thorough and well-aligned with the results, and the conclusions appropriately reflect the scope of the experiments. The authors’ emphasis on language-specific evaluation and adaptation strategies is particularly relevant for multilingual NLG research.

Weaknesses:
While the dataset creation process is carefully described, the paper would benefit from a more detailed quantitative and qualitative analysis of the manual revision stage (e.g., inter-annotator agreement or explicit error categories). This would improve transparency regarding the quality of the final corpus and help assess the reliability of the semi-supervised pipeline. The study, and the wider community, would also benefit from the "internal guidelines" for evaluation being made public, with examples and descriptions (e.g., on GitHub).

Overall, the paper makes a solid contribution to multilingual data-to-text generation and provides practical guidance for adapting resource-efficient models to underrepresented languages. With minor improvements in evaluation, the work would be even stronger.

Review #2
By Gennaro Nolano submitted on 03/Feb/2026
Suggestion:
Accept
Review Comment:

I have checked the new version against my previous review, and the authors have addressed most of the points I raised.

As such, I have no more issues with the paper, and I think it can be published without further changes.

Review #3
By Barbara Heinisch submitted on 10/Feb/2026
Suggestion:
Accept
Review Comment:

As mentioned in my first review, the manuscript makes an original contribution in several respects, including the use of several (not only one) evaluation metrics commonly used in the NLP community and addressing the scarcity of high-quality Spanish resources for knowledge-graph verbalisation. Since the majority of the comments from the first review were taken into account in this version (including the limitations), the current manuscript provides a more comprehensive picture.