Using Wikidata Lexemes and Items to Generate Text from Abstract Representations

Tracking #: 3564-4778

Mahir Morshed

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Tool/System Report
Ninai/Udiron, a living function-based natural language generation system, uses knowledge in Wikidata lexemes and items to transform abstract representations of factual statements into human-readable text. The combined system first produces syntax trees based on those abstract representations (Ninai) and then yields sentences from those syntax trees (Udiron). The system relies on information about individual lexical units and links to the concepts those units represent, as well as rules encoded in various types of functions to which users may contribute, to make decisions about words, phrases, and other morphemes to use and how to arrange them. Various system design choices work toward using the information in Wikidata lexemes and items efficiently and effectively, making different components individually contributable and extensible, and making the overall resultant outputs from the system expectable and analyzable. These targets accompany the intentions for Ninai/Udiron to ultimately power the Abstract Wikipedia project as well as be hosted on the Wikifunctions project.

Solicited Reviews:
Review #1
Anonymous submitted on 06/Nov/2023
Review Comment:

The author has made efforts to revise the paper based on the reviewers' comments. The overall quality and clarity have improved. I would suggest accepting it.

Review #2
Anonymous submitted on 04/Jan/2024
Review Comment:

The latest version of the paper clarifies a number of issues that were not entirely clear in the previous version. It seems that Ninai/Udiron is an ongoing effort and will be for some time, so naturally some of the elements of the paper will not be fully accurate in the future.
A couple of comments that might make the paper a bit stronger:
* It would be great if the proposed approach could be positioned within the natural language generation literature; specifically, a section in the introduction or background that would describe why the approach taken is interesting, and why other approaches, including but not limited to end-to-end translation and/or NLP, may not be appropriate for this task.
* I would suggest adding literals to Figures 1 and 2, since they are not easily decipherable as they stand.