Toward a Semantic Framework for Han Poetry: Multilingual and Decentralized Integration of East Asian Literary Heritage

Tracking #: 3901-5115

Authors: 
Huang Yanjie
Jiang Hui
Chen Tao
Sha Tianming

Responsible editor: 
Guest Editors 2025 OD+CH

Submission type: 
Full Paper
Abstract: 
Han poetry, a classical form composed in literary Chinese, originated in China and spread across East Asia, including Japan, Korea, and Vietnam. It has played a key role in elite education, cultural diplomacy, and political communication. Despite its cultural significance, research on its semantic modeling remains limited, and digital representations often face issues like fragmentation and structural inconsistencies. This paper proposes a decentralized semantic infrastructure for Han poetry as a multilingual resource for cultural heritage. The system is organized into three layers: (1) multilingual SKOS vocabularies to represent poetic concepts; (2) OWL ontologies that reuse BIBFRAME for bibliographic structure and FOAF for authorship representation; and (3) a distributed RDF resource layer supporting SPARQL 1.1 federated queries across poetic corpora. The infrastructure implements TEI standards for structuring poetic materials, including metadata and digitized text, while reserving interfaces for potential IIIF integration. Future work will focus on expanding multilingual vocabularies and integrating with cultural heritage platforms.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 12/Nov/2025
Suggestion:
Minor Revision
Review Comment:

The work addresses a timely issue in cross-cultural, multilingual classical Chinese poetry databases by designing an ontology that accounts for two languages and the nebulous concepts linked to the poetic use of language and the references to historical events or figures, and literary examples.

Although the paper shows a certain level of originality, it would benefit from revision in three respects. First, the term 'Han poetry' is confusing, especially in a multilingual context. Although it is a viable term in a Japanese context, it does not carry over to other languages. For instance, in Chinese, 'Han poetry' can mean poetry from the Han Dynasty, such as Yue Fu poetry, or poetry of the Han ethnic group. Either way, it distracts the reader from the authors' intended meaning: Classical Chinese poetry. Understandably, the authors wanted to avoid using 'Chinese' in this term, but 'Han poetry' adds even more confusion. The database mostly draws from Tang and Heian poetry, so an alternative term should be used to clarify its scope. Second, it is unclear how the proposed framework accounts for conceptual changes across languages. A given concept (the moon, to take an imperfect example) may evoke different feelings in Chinese and Japanese readers. These 'lost in translation' cases should also be accounted for in the ontology design. Third, the authors should be clearer about how the proposed framework differs from or improves on earlier work, such as references 34 and 35.

The supplementary resources are published on GitHub with a clear directory structure and descriptions, guaranteeing long-term use.

Review #2
By Cristiano Longo submitted on 05/Feb/2026
Suggestion:
Minor Revision
Review Comment:

This paper describes an effort to catalogue Han Poetry resources and envisages a layered, semantically enhanced architecture for this task. The contribution is also relevant for modeling the thematic and physical aspects of Han Poetry works. The paper is well written and well structured.

My concerns are chiefly about the distribution of the produced datasets. They are accessible through a dedicated online repository. The OWL ontology http://example.org/HanpoetryOWL is provided at two different paths: Heian_db/Ontology/HanpoertyOntology.ttl and Tang_db/Ontology/HanpoetryOntology.ttl. The same holds for the SKOS vocabularies, which are divided into core and local (what is the difference?). All the items are in the http://example.org namespace. Following the Linked Data principles, every IRI should be dereferenceable. So, for example, accessing http://example.org/HanpoetryOWL, one should download THE ontology. Or, accessing http://example.org/HanpoetryConcept#, one should download an RDF file containing, among others, a resource with IRI http://example.org/HanpoetryConcept#CollectionConcept/TangPoetryCollection. To satisfy this principle, the repository should be restructured to avoid repetition. In addition, real IRIs should be used so that the corresponding files are accessible. I suggest, after appropriately restructuring the repository, acquiring persistent IRIs using the WebID service (https://www.w3.org/wiki/WebID), in such a way that, when accessed, they provide the corresponding files on GitHub.
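
The dereferencing behaviour requested here can be sketched in a few lines of Python. This is an illustration only, not the repository's actual setup: the redirect table is hypothetical, the vocabulary path `Vocabulary/HanpoetryConcept.ttl` is an invented placeholder, and the ontology path is simply one of the two duplicate paths named above, assumed to be kept as the canonical copy. It shows that a client strips the fragment from a hash IRI and requests the remaining document IRI, so one canonical file per namespace would suffice.

```python
from urllib.parse import urldefrag

# Hypothetical redirect table: document IRI -> the single canonical file
# that a persistent-identifier service would redirect to.
REDIRECTS = {
    # One of the two duplicate paths, assumed here to be kept as canonical.
    "http://example.org/HanpoetryOWL": "Tang_db/Ontology/HanpoetryOntology.ttl",
    # Invented placeholder path; the real vocabulary location is not known.
    "http://example.org/HanpoetryConcept": "Vocabulary/HanpoetryConcept.ttl",
}


def document_iri(iri: str) -> str:
    """Return the IRI a client actually requests: the IRI minus its fragment."""
    return urldefrag(iri).url


def resolve(iri: str):
    """Return the file to serve for `iri`, or None if nothing is registered."""
    return REDIRECTS.get(document_iri(iri))


concept = "http://example.org/HanpoetryConcept#CollectionConcept/TangPoetryCollection"
print(document_iri(concept))
print(resolve(concept))
```

Under this reading, every concept in the vocabulary resolves to the same RDF document, which is why duplicated copies of that document in the repository become a problem.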

Another concern is about the prefixes used in the paper. Prefix abbreviations are provided in Section 4, but some prefixes are used earlier (for example, in hpc:Moon in Section 3.1.2 or bf:Work in Section 3.2.1). Prefix abbreviations should be declared in advance.
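
A minimal Python sketch of why declaration order matters: a prefixed name can only be expanded once its prefix is known. The `bf:` and `foaf:` namespaces are the published BIBFRAME and FOAF namespaces, and `concept:` is the one cited from the paper. The `hpc:` prefix is deliberately left undeclared to mimic the situation described, and the `expand` helper is hypothetical.

```python
# Prefix table declared once, before any prefixed name is used.
PREFIXES = {
    "bf": "http://id.loc.gov/ontologies/bibframe/",     # BIBFRAME namespace
    "foaf": "http://xmlns.com/foaf/0.1/",               # FOAF namespace
    "concept": "http://example.org/HanpoetryConcept#",  # as cited from the paper
}


def expand(prefixed_name: str, prefixes: dict = PREFIXES) -> str:
    """Expand a prefixed name such as 'bf:Work' into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"prefix '{prefix}:' is used but has not been declared")
    return prefixes[prefix] + local


print(expand("bf:Work"))

try:
    expand("hpc:Moon")  # hpc: is intentionally missing from PREFIXES
except KeyError as err:
    print("cannot expand:", err)
```

A reader meeting hpc:Moon before the prefix table is in the same position as the `expand` call that fails.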

Other minor suggestions are reported in the following.

========
Q1. IIIF should be briefly introduced here as a "generic" reader probably doesn't know what it is.
Q2. In Section 1.2, on page 2 one can find "Based on an assessment", but under the FAIR principles this assessment should be accessible to the reader so that she can verify its coherence with the data you are reporting. Adding a bibliographic reference to this assessment should be enough.
Q3. on page 2, "A smaller but growing" shouldn't be capitalized.
Q4. Please provide bibliographic references for IIIF, CBDB and JBDB or say in what sections they will be recalled.
Q5. In Section 2.2 on page 3 I'm not sure that "Ontology Intelligence Service Center" is correct w.r.t. the cited work.
Q6. In Section 2.3 on page 3, change the font to mark the following as titles, and add a full stop after each: "Undigitized or Image-Only Resources", "Unstructured or Minimally Structured Text Resources", "Poetry Knowledge Graph Systems", "Experimental Semantic Modeling of Han Poetry Using RDF/OWL".
Q7. In Section 3 on page 5, "RDF instance data (e.g., poems, poets, poetry collections) are constructed and deployed across multiple triple stores" is reported. However, using "RDF instance data" may be misleading, as the notion of instances (or individuals) is typical of semantically "rich" formalisms (such as RDF Schema or OWL), whereas RDF provides just a notion of resource. Thus this sentence may be rephrased, or you could use just "instance data" instead of "RDF instance data".
Q8. In Section 3.1 on page 5, the notion of "semantic consistency" has a precise meaning in the field of formal logic and OWL (in particular, consistency checking is a reasoning task), so I suggest reformulating this with different words.
Q9. In Section 3.1.1 on page 6 "The vocabulary structure of the SKOS-based conceptual vocabulary system" is quite redundant, I suggest to remove the first occurrence of "vocabulary".
Q10. In Section 3.1.1 on page 6, why "Poem title concept"? Usually the title is metadata of a poem. Also, "knowledge" is a very broad notion; a more descriptive name would probably be more appropriate.
Q11. In Section 3.1.1 on page 6, replace "struct ured using properties" with "structured using properties".
Q12. In Section 3.1.1, where can the thesaurus be downloaded? Provide a persistent IRI.
Q13. In Section 3.1.1 on page 6, concerning the sentence "Each poem is modeled as an RDF instance (hp:Poem)", the same considerations as in Q7 hold. In addition, I can't find the hp:Poem class or SKOS concept in the thesaurus. There is some confusion between hp:Poem and Poem title concept.
Q14. In Section 3.1.2 on page 6, an "open data platform" is mentioned for the first time. Inspecting the whole paper, I found only a GitHub repository. I propose to reformulate "The SKOS Tree Viewer is available on the open data platform" as "The SKOS Tree Viewer is available open-source" or similar.
Q15. In Section 3.1.2 on page 6: "This concept represents loneliness, home-sickness, and purity, symbolizing the philosophical and emotional resonance [...]". This assertion is out of the context of this section and, so, it is a bit confusing. These "represent" relations are not contained in the SKOS files. In my understanding, they will be presented later. Thus I suggest to add a reference to where these representations will be presented.
Q16. In Section 3.2 on page 7: "the Person class reuses the FOAF (Friend of a Friend) model". But the FOAF ontology is not imported by HanpoetryOWL, and neither are the constraints it places on foaf:Person. Maybe there are reasons for this; if so, the motivations should be explained.
Q17. In Section 3.2.1 on page 7: "the Poem class reuses the Work concept from Bibframe". Analogously, the Bibframe ontology is not imported by HanpoetryOWL.
Q18. In Section 3.2.1 on page 7: " the Poem class is defined as a subclass of bf:Work and assigned a unique identifier to store metadata". I can't understand this sentence. Note that a "unique identifier(s) to store metadata" is assigned to Poem instances (individuals). Of course, the Poem class has itself a unique identifier, but does not contain metadata about poems.
Q19. The linking between Knowledge sub-classes, corresponding SKOS concepts (?) and Poem instances is really unclear. I can't deduce it from the provided example material. Figure 2 does not help. Probably, this is better explained in Section 4.2, and thus adding a reference to this section may be enough here.
Q20. In Section 3.3.1 on page 9, replace " database, , as illustrated in Figure 4" with " database, as illustrated in Figure 4".
Q21. In Figure 4 on page 9, replace "Collcetion" with "Collection".
Q22. In Section 3.3.1 on page 9, "These TEI files are linked to the corresponding RDF resources". How these connections are established should be explained in more detail here. Anticipating that this will be explained in detail in Section 4.4 would be enough.
Q23. In Section 3.3.2 on page 9, "the system uses Virtuoso triple store". Here it is unclear what "system" the authors are talking about. The reader could think that there is an online service with a corresponding SPARQL endpoint, but I understood that the database is just provided as RDF files in the GitHub repository. I suggest restructuring this section to clarify that there is no publicly available "system" actually running.
Q24. In Section 4 on page 10. In my understanding, concept:moon should abbreviate http://example.org/HanpoetryConcept#KnowledgeConcept/NaturalScene/Celest.... But in the prefix declarations provided in the same section, concept is the abbreviation for http://example.org/HanpoetryConcept#. Probably this "simplified notation" should be explained in more detail.
Q25. In Section 4.1 on page 11, "Figure 7 illustrates this modeling structure using Shu Dao Nan" but there is no Figure 7 in the paper. Authors probably are referring to the code snippet just before. Using the figure environment also for code snippets may be a good choice.
Q26. In Section 4.2 on page 12, the property hp:hasEmotion is mentioned. Perhaps Emotion deserves its own class.
Q27. In Section 4.5 on page 14, "key entities such as hp:Poem,hp:Person, and hp:Collection" should be rephrased as Poem, Person and Collection are the classes, whereas here I suppose the authors are talking about the individuals belonging to these classes.
Q28. In Section 5.2.1 on page 15, the results of a SPARQL query are reported, but the query has to be reported as well.
Q29. In Section 5.2.1 on page 16, replace "capabilities:First, " with "capabilities. First,".
Q30. In Section 5.2.2 on page 16, similarly to Q28, the query has not been reported.
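
Several of the points above (Q13, Q18, Q27) turn on the difference between a class and the individuals that belong to it. The following is a minimal sketch of that distinction, using plain Python tuples as triples rather than an RDF library. The identifiers `ex:ShuDaoNan` and `ex:LiBai` and the property `hp:title` are hypothetical and are not taken from the repository.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    # Statement about the class itself: it carries no metadata about any poem.
    ("hp:Poem", SUBCLASS_OF, "bf:Work"),
    # Statements about individuals: each has its own identifier and metadata.
    ("ex:ShuDaoNan", RDF_TYPE, "hp:Poem"),
    ("ex:ShuDaoNan", "hp:title", "Shu Dao Nan"),  # hp:title is a hypothetical property
    ("ex:LiBai", RDF_TYPE, "hp:Person"),
}


def individuals_of(cls, graph):
    """Return the identifiers of all resources typed as `cls`."""
    return sorted(s for s, p, o in graph if p == RDF_TYPE and o == cls)


def statements_about(subject, graph):
    """Return the non-typing statements whose subject is `subject`."""
    return {p: o for s, p, o in graph if s == subject and p != RDF_TYPE}


print(individuals_of("hp:Poem", triples))
print(statements_about("ex:ShuDaoNan", triples))
print(statements_about("hp:Poem", triples))
```

In this sketch the title is attached to the individual `ex:ShuDaoNan`, while the only statement about `hp:Poem` is its place in the class hierarchy.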

Review #3
Anonymous submitted on 05/Feb/2026
Suggestion:
Minor Revision
Review Comment:

Overall: (1) Originality: very good. The long-term stable link to resources, https://github.com/TASYU78/Hanpoetry, demonstrates the components and their distinctive expression in SKOS and OWL, available as RDF. (2) Significance of the results: because the resources are focused and the approach is distinctive and uses advanced semantic technologies, the results are impressive and significant. (3) Quality of writing: good in general, but the formatting of some references needs to be checked. [See below.]

Suggestions:
1. The article uses Li Bai as the example, but general English readers might not understand how the family name and given name are treated here, or which one is foaf:surname. You should refer to ISO 7098:2015, Information and documentation - Romanization of Chinese (published in 1982 and last revised in 2015; https://www.iso.org/standard/61420.html). This International Standard can be applied in documentation of bibliographies, catalogues, indices, toponymic lists, etc. More recently, it was adopted and released as a joint Australian/New Zealand Standard, AS/NZS ISO 7098:2025 (https://www.standards.govt.nz/shop/asnzs-iso-70982025).

2. It is good to see the Getty AAT's Linked Data efforts noted. I suggest you check its SPARQL endpoint, which provides many search templates that support more effective use across languages, regions, and cultures: https://vocab.getty.edu/queries#Finding_Subjects .
This will help you extend the work to vernacular usages, such as the Japanese kanbun kundoku, Korean idu notations, and Vietnamese script you mentioned.

3. Regarding the Chinese-character cultural sphere: if you can provide examples for the other languages when you mention them, in addition to Japanese (e.g., for "Moon"), it will greatly help the paper's explanation of its coverage.

4. Editorial corrections needed: your quotation marks are not in the correct format. E.g., you have ”Moon”.

5. The formats of some references should be double-checked, especially capitalization. The following are some examples.
5a. For the thesaurus AAT, you have: Art architecture thesaurus(aat).
It should be: Art & Architecture Thesaurus (AAT).
5b. For the article, the first letter of the title has an issue: E. Hyvönen. “sampo” model and semantic portals. It should be "Sampo".
5c. The lowercasing of proper names should be reconsidered, e.g., The jazz ontology; The cidoc conceptual reference model (crm), etc.

Review #4
Anonymous submitted on 26/Feb/2026
Suggestion:
Minor Revision
Review Comment:

The paper presents a well-structured three-layer architecture consisting of a Resource Layer, an Ontology Layer, and a Concept Layer, combined with IIIF, OWL, and SKOS, to address the interoperability and semantic linkage of classical Chinese poetry digital resources. The technical framework is sound; using URIs as Semantic Anchors to bridge textual manifestations with multilingual labels is a highly effective approach that aligns with the current standards of the Semantic Web and Linked Open Data (LOD).
A primary strength of this work lies in its originality regarding regional scope. Traditionally, the digital modeling of classical Chinese poetry has been highly localized and fragmented by modern national borders. This paper breaks these constraints by establishing a cross-regional semantic model. By shifting the focus from isolated datasets to a unified framework, the authors provide an innovative path toward the interoperability of classical poetry across different geographic and cultural domains.
Given that Classical Chinese served as the dominant literary language of pre-modern East Asia, this framework possesses significant research value. It not only aids in the preservation of cultural heritage but also provides a necessary infrastructure for the advancement of comparative literature and transcultural studies in the Digital Humanities.
The most significant contribution lies in the innovative introduction of the Knowledge Layer. This original layer provides a mapping pathway from vocabulary to literary entities (e.g., "moon," "melancholy"), enabling the representation of literary imagery and themes. This aspect demonstrates a valuable bridging function between raw data and conceptual understanding. It elevates Classical Chinese Poetry from simple "scanned images" or "text documents" to a "semantic knowledge base," enabling more sophisticated analysis and discovery.
However, the model requires enhancements to address the inherent complexities, spatial-temporal dynamics, and cross-cultural features of East Asian classical literature.
1. The paper currently uses FOAF (Friend of a Friend) in the Ontology Layer to describe authors. The FOAF model is overly simplistic and inadequate for representing the complex identity systems of classical Chinese literati (including given names, courtesy names, pen names, posthumous titles, official positions, and family lineages). It is recommended to extend the Ontology Layer with custom classes to establish the logical connections between social status and creative motivation. This is crucial for understanding the context behind poetry.
2. The current ontology lacks a mechanism or algorithmic design for the automated synchronization of divergent regnal year systems within the Sinosphere, for example, aligning the Japanese Tenpyō era with the Tang Dynasty’s Kaiyuan era. Without this, the "Time" dimension remains a static string rather than a relational historical coordinate.
3. The model does not offer a specialized ontological strategy for geographic evolution, specifically the phenomena of "same name, different location" (Tongming Yidi) or "one location, multiple names" (Yidi Duoming), such as the shifting administrative boundaries of Jiangxia across different dynasties. There is a lack of concrete integration with historical gazetteers or spatiotemporal GIS modeling.
4. This is the most significant omission in the Knowledge Layer. The model ignores Semantic Drift: the subtle shifts in meaning as imagery travels across Chinese, Japanese, Korean, and Vietnamese contexts. It assumes that a single URI (e.g., for "Plum Blossom") points to a universal concept across all languages. Consider the imagery of the "Chrysanthemum" (菊花). In the Chinese context, it primarily symbolizes reclusion (隐逸) and moral integrity (Zhi Ren Lun Shi), inspired by Tao Yuanming. In the Japanese context, while sharing these roots, it evolved into a symbol of Imperial Authority (皇权) and the aesthetic of Mono-no-aware (物哀). A shared URI for "Chrysanthemum" in the Knowledge Layer is therefore insufficient if it cannot handle these divergent cultural metaphors. The authors should implement a Context-Aware Semantic Model within the Knowledge Layer that maps the same URI to different conceptual nodes based on the origin of the poem (e.g., Tang Dynasty vs. Heian Period), and likewise in other cultural contexts.
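
One possible shape for such a context-aware mapping is sketched below in Python. It is an illustration under stated assumptions, not a proposal for the authors' actual data model: the concept IRI, the `Reading` record, and the context labels are all hypothetical, and the connotations are only those named in the Chrysanthemum example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    concept: str          # shared concept IRI
    context: str          # cultural origin of the poem
    connotations: tuple   # meanings the image carries in that context


# Hypothetical IRI for the shared concept.
CHRYSANTHEMUM = "http://example.org/HanpoetryConcept#Chrysanthemum"

READINGS = [
    Reading(CHRYSANTHEMUM, "Tang", ("reclusion", "moral integrity")),
    Reading(CHRYSANTHEMUM, "Heian", ("imperial authority", "mono no aware")),
]


def connotations(concept, context, readings=READINGS):
    """Return the context-specific connotations of a shared concept."""
    for reading in readings:
        if reading.concept == concept and reading.context == context:
            return reading.connotations
    return ()  # no reading recorded for this context


print(connotations(CHRYSANTHEMUM, "Tang"))
print(connotations(CHRYSANTHEMUM, "Heian"))
```

The shared IRI still supports cross-corpus retrieval, while the interpretation is looked up per context. A fuller model would also need to express that the Heian reading inherits part of the Chinese one.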
Suggestion: Accept with Minor Revision.
This paper excels in its technical execution. The authors are encouraged to expand the discussion section to include prospects for in-depth spatiotemporal modeling and Sinosphere-wide extensions. The goal should be to refine the Knowledge Layer, so it serves as a true bridge between raw data and a deep understanding of the cultural and historical significance of poetry. This "bridging role" constitutes the most significant academic contribution for future iterations of this work.
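
As a small illustration of the spatiotemporal direction, the regnal-year alignment raised in point 2 can be sketched in Python at year granularity. The sketch assumes the commonly cited year ranges for the two eras (Kaiyuan 713 to 741, Tenpyō 729 to 749) and ignores lunisolar calendar boundaries and mid-year era changes. The function names and the table layout are hypothetical.

```python
# Era name -> (first CE year, last CE year); year granularity only.
ERAS = {
    "Kaiyuan": (713, 741),  # Tang dynasty era
    "Tenpyo": (729, 749),   # Japanese era (Tenpyō)
}


def to_ce(era, regnal_year):
    """Convert a regnal year (1-based) of an era to a CE year."""
    start, end = ERAS[era]
    year = start + regnal_year - 1  # regnal year 1 is the era's first year
    if not start <= year <= end:
        raise ValueError(f"{era} has no year {regnal_year}")
    return year


def align(era, regnal_year, target_era):
    """Return the matching regnal year in target_era, or None if it has no such year."""
    year = to_ce(era, regnal_year)
    start, end = ERAS[target_era]
    if start <= year <= end:
        return year - start + 1
    return None


print(to_ce("Tenpyo", 1))             # CE year of Tenpyō 1
print(align("Tenpyo", 1, "Kaiyuan"))  # corresponding Kaiyuan year
```

Expressed this way, a date becomes a coordinate that can be compared across corpora instead of an opaque string.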