Review Comment:
This paper presents a well-motivated and timely study on Spanish triples-to-text generation using resource-efficient large language models. The creation of the Spanish WebNLG dataset and the systematic evaluation of prompt-based and fine-tuned approaches constitute valuable contributions to multilingual data-to-text generation, particularly for underrepresented languages such as Spanish.
Strengths:
One of the main strengths of the work lies in the development of the Spanish WebNLG dataset through a semi-supervised pipeline followed by manual revision. This effort addresses a significant resource gap and enables both generation and potential reverse (text-to-triples) applications. The authors clearly articulate their research questions and design experiments that directly address them, offering a structured and coherent narrative throughout the paper.
The experimental findings convincingly demonstrate that contextualisation (one-shot prompting) and parameter-efficient fine-tuning substantially improve performance over zero-shot settings. The analysis of different models provides useful practical insights for model selection in low-resource scenarios, particularly the strong performance of Qwen2.5-1.5B-Instruct and the contrasting behaviour of Llama-3.2-1B-Instruct across languages. The multilingual and error analyses further strengthen the study by showing that linguistic properties such as morphological richness and syntactic flexibility impact both model behaviour and metric reliability.
The discussion is thorough and well-aligned with the results, and the conclusions appropriately reflect the scope of the experiments. The authors’ emphasis on language-specific evaluation and adaptation strategies is particularly relevant for multilingual NLG research.
Weaknesses:
While the dataset creation process is carefully described, the paper would benefit from a more detailed quantitative and qualitative analysis of the manual revision stage (e.g., inter-annotator agreement or explicit error categories). This would improve transparency regarding the final corpus quality and help assess the reliability of the semi-supervised pipeline. The study, and the community more broadly, would also benefit from the authors releasing their "internal guidelines" for evaluation, with examples and descriptions (e.g., on GitHub).
__________
Overall, the paper makes a solid contribution to multilingual data-to-text generation and provides practical guidance for adapting resource-efficient models to underrepresented languages. With minor improvements in evaluation, the work would be even stronger.