Revealing Medieval Manuscript Treasures: Semantic Web Integration through a Polymorphic Knowledge Graph and Linked Open Data

Tracking #: 3910-5124

Authors: 
Faria Ferooz
Monica Palmirani

Responsible editor: 
Guest Editors 2025 OD+CH

Submission type: 
Full Paper
Abstract: 
Medieval manuscripts represent rich and heterogeneous cultural heritage resources, offering interdisciplinary insights into historical context, textual content, physical features, and artistic elements. However, integrating such diverse data remains challenging due to inconsistencies in metadata schemas and variations in data quality. This article addresses these challenges through a semantic web-based approach applied to two distinctive medieval manuscript collections: Progetto Irnerio and Mosaico. It presents an extension of the existing Medieval Manuscript Data Integration Ontology (MMDIO), originally developed using the MeLOn methodology and evaluated with the FOCA framework. This extended ontology builds upon the previously published MeMO ontology, introducing new classes and relationships designed specifically for the integration of medieval manuscript data. A key contribution is the construction of a Polymorphic Knowledge Graph that semantically integrates heterogeneous datasets from both collections, enabling faceted search, semantic browsing, and advanced visualization. Additionally, the MELIORATE Linked Open Data (LOD) platform is developed to provide unified, interoperable online access to manuscript content, significantly enhancing data accessibility and supporting interdisciplinary collaboration. This integrated approach demonstrates the potential of semantic web technologies to bridge disciplinary gaps among Digital Humanities, Legal Studies, and Computer Science, offering new methodological opportunities for cultural heritage research and digital preservation.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Cristiano Longo submitted on 16/Oct/2025
Suggestion:
Major Revision
Review Comment:

In my understanding, the paper "Revealing Medieval Manuscript Treasures: Semantic Web Integration through a Polymorphic Knowledge Graph and Linked Open Data" builds on a previous work (I suppose the one listed in the bibliography as [19]), which described the RDF conversion of the data underlying the Irnerio platform and the Authenticum collection (which is part of the Mosaico portal) using the MMDIO vocabulary. The present effort extends the resulting RDF dataset (or datasets?) with the other collections provided by the Mosaico portal. To this end, MMDIO also has to be extended. Finally, the Meliorate portal, which provides unified access to the data, is described.

The originality of the work is not very high, as most of the methodologies used in the previous work have been reused here. However, the final result is relevant and worthwhile. The English is very fluent. However, in my opinion, there are several issues that prevent this work from being published in SWJ. In what follows, my comments are numbered from C1 to C67 and interleaved with free text.

First, there are severe typographical issues.
C1. Section 3.1.1 pag.11 "Several collections refer to specific folios within the manuscript," overflows page borders.
C2. pag.13 Figure 3 overflows page borders; several parts could probably be removed, keeping only those that represent the actual extensions.
C3. pag.14 Table 1 goes out of page borders.
C4. pag.19 Figure 4 overflows page borders. Removing the arcs and nodes that are not strictly related to the intent of the example would probably improve its readability.
C5. Figures 5, 6 and 7 overflow page borders. Tables would probably be more appropriate.
C6. In pag.22 and pag.25 and pag.30 and pag.34 line spacing seems different from other pages. Other font characteristics should be checked.
C7. Figures 8, 9, 11, 13, 14, 21, 22, 23, 24, 25, 26, 27, 29, 30 overflow page limits.
C8. pag.31 In Listing 10 there are unnecessary blank spaces at the end.

For enhancing the verifiability and reproducibility, as requested by the FAIR principles, all the tools and technologies should be made available to the reader.
C9. All the namespace abbreviations used as prefixes throughout the paper should be reported: dm2e, owl, fabio, doco, memo, mmdio, foaf, biro, cito, core, dcterms, c4o, tvc, arco, frbr.
C10. A bibliographic reference for each of the core semantic web technologies used should be reported: RDF, SPARQL, OWL, LOD and LOD Principles.
C11. pag.14 Section 3.14 "Sample data from the Mosaico collections was extracted and mapped into RDF using the expanded ontology. The datasets were uploaded to GraphDB and tested throug SPARQL queries"
One of the FAIR principles is reproducibility. As a consequence, one must be able to reproduce the mentioned tests. In practice, it would be sufficient to provide just one example of these sample data, show how it was converted, and report the SPARQL queries and query results used to test the conversion. Otherwise, if the authors consider it appropriate, it could be clarified that these tests and SPARQL queries are shown in the remainder of the paper. In this case, it is mandatory that the original data from Mosaico are publicly available.
C12. pag.17 Section 4.5 "Custom Python scripts were used to extract and assign unique identifiers" I can't find these scripts, not even in the repository.
C13. pag.17 Section 4.5 "All cleaned and enriched metadata were first loaded into structured pandas DataFrames from Excel sources." Excel sources should be available to the reader.
C14. pag.18 Section 4.6 "A series of SPARQL queries was implemented to validate the knowledge graph’s accuracy and flexibility" Where can one find such queries? Are the authors referring to the queries in the following paragraphs? If so, this should be clarified. Otherwise, such queries have to be made available to the reader.
C15. pag.22 Section 4.12 I can't find nov.6.pr in the datasets provided in the GitHub repository. In addition, running the query in Listing 4 against the Meliorate SPARQL endpoint returns an "Undefined array key 0" error.
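
A check of the kind implied by C9 can be automated. The following is a minimal sketch in Python, not the authors' tooling: the function name `undeclared_prefixes` and the regular expressions are assumptions made for illustration, and the parsing is deliberately simplified (it ignores the empty prefix and single-quoted literals). It lists the prefixes that a SPARQL query uses without declaring them.

```python
import re

# Matches declarations such as: PREFIX fabio: <http://purl.org/spar/fabio/>
PREFIX_DECL = re.compile(r"(?i)\bPREFIX\s+([A-Za-z][\w.-]*):\s*<[^>]*>")
# Matches the prefix part of a prefixed name such as fabio:Book
PREFIXED_NAME = re.compile(r"(?<![\w?$])([A-Za-z][\w.-]*):[A-Za-z_]")


def undeclared_prefixes(query: str) -> set[str]:
    """Return the prefixes used in a SPARQL query but never declared.

    Simplified sketch: full IRIs, double-quoted literals and comments are
    stripped before scanning, so colons inside them are not miscounted.
    """
    declared = set(PREFIX_DECL.findall(query))
    body = PREFIX_DECL.sub(" ", query)              # drop the declarations
    body = re.sub(r"<[^>\s]*>", " ", body)          # drop full IRIs
    body = re.sub(r'"(?:[^"\\]|\\.)*"', " ", body)  # drop string literals
    body = re.sub(r"#.*", " ", body)                # drop comments
    used = set(PREFIXED_NAME.findall(body))
    return used - declared
```

For a query that declares only `fabio` but also contains the pattern `dcterms:title "Olomuc"`, the function returns `{'dcterms'}`.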

My main concern is that the contribution of the effort described in this paper with respect to the "previous work" is not very evident. It should be clear from the beginning and throughout the paper. In addition, means to compare the previous work (ontology and data) with the current one have to be provided.

C16. pag.1 As reported in the abstract, one of the contributions of this work is the construction of a Polymorphic Knowledge Graph. It is unclear whether "Polymorphic Knowledge Graph" is a novel notion introduced in this field or a widely recognized notion of Linked Open Data. In the former case it should be defined in full detail; in the latter, a bibliographic reference is needed.
C17. Section 2 pages 4 to 9
Works recalled in Sections 2.9, 2.10, 2.11 differ from those recalled in the previous sections in that they represent the "starting point" of the current work.
In addition, the title of Section 2.12 may be misleading, as the word "Current" could suggest to the reader that the work is still in progress. Placing sections from 2.10 to 2.12 in a new section "Preliminaries" or similar, and choosing a different name for 2.12, would help the reader to clearly identify the "starting point" of the current work.
C18. pag.1 Section 1 "Building on prior work ..." Is it the one reported in the bibliographic reference [19]? If so, use [19] instead of "prior work", or just cite [19].
C19. pag.9 Section 2.12 "the current study extends the Medieval Manuscript Data Integration Ontology (MMDIO)" Again, use [19] instead of "previous work" (if correct). Both versions of MMDIO should be available to the reader, and the changes between them should be reported clearly. It may suffice to anticipate here that the extension will be presented in Section 3.
C20. pag.9 Section 2.12 "The core MMDIO structure, introduced in previous work" Again previous work.
C21. pag.10 Section 3 "The methodological framework for this study builds on previous work" use [19] instead of "previous work"
C22. pag.10 Section 3 "The previous work focused on harmonizing metadata related to the Irnerio platform and the Authenticum collection from the Mosaico portal." Again previous work.
In addition, the harmonizing process should be briefly expressed here: it was based on MMDIO?
C23. pag.12 Section 3.3 starts with Use Case 4 (Section 3.3.1) and this is confusing. It should be clarified where the previous three use cases can be found.
C24. Section 3 describes the extension of MMDIO. Where can I get the old and the new ontology versions? In addition, I wasn't able to find the new property hasSignature in MMDIO (https://drive.google.com/uc?export=download&id=1FD5vkcYS6ipn-hSf5VKdFIrd...). A summary table enumerating the new classes and properties may be helpful.
C25. pag.15 Section 4 "Building on the foundational MMDIO ontology and earlier work" cite the earlier work ([19]?).
C26. pag.15 Section 4 describes this specific Polymorphic Knowledge Graph. I suppose it is the RDF dataset behind the Meliorate portal. It is unclear to me whether the authors have extended an existing graph (i.e. the Mosaico one) or built a new one. If the former, the source graph must be explicitly stated and made available to the reader. If the latter, the building process must be reported, possibly by referring to other sections if appropriate.
C27. pag.18 Section 4.7 "each use case previously defined" Does "previously" refer to the previous work or to Section 4.6?
C28. pag.18 Section 4.7 "The Polymorphic Knowledge Graph developed in this phase transforms the earlier RDF-based pilot work ..." Is this earlier pilot a product of the previous work? Can it be accessed?
C29. pag.38 Section 5.2 "Each book or collection in the Mosaico portal is semantically described and interlinked using RDF." Are the authors referring to the Polymorphic Knowledge Graph described in Section 4? This should be cleared up.

Finally, some random suggestions are reported.

C30. pag.1 Section 1 "Building on prior work that developed the Medieval Manuscript Data Integration Ontology (MMDIO)" MMDIO will be reviewed in Section 2.1. Referring to that section here would be helpful for readers.
C31. pag.3 Section 1 "The goal is to support flexible SPARQL querying". Here I can't understand the intended meaning of "flexible".
C32. pag.4 Section 2.1 I can't find dm2e:hasAnnotableContents8 in the dm2e ontology (https://github.com/DM2E/dm2e-ontologies/blob/master/src/main/resources/d...)
C33. pag.4 Section 2.2 I can't understand "authority-linked open knowledge vocabulary". Perhaps using "authoritative" in place of "authority-linked" would be more appropriate.
C34. pag.5 Section 2.3 replace "entities were linked to external authority files" with "entities were linked to external authoritative datasets" or similar.
C35. pag.5 Section 2.4 CIDOC CRM was first mentioned in section 2.2, so it would be appropriate to move section 2.4 before section 2.2.
C36. pag.7 Sections 2.7 and 2.8 Replace "This case study" with "This study"
C37. pag.7 Section 2.8 Replace "CIDOC-CRM" with "CIDOC CRM" (two times)
C38. pag.10 Section 3.1 "Mosaico, while rich in content, lacks a formalized ontology and presents information inconsistently" The intended meaning of "inconsistently" here should be clarified, perhaps with some examples.
C39. pag.11 Section 3.1.1 "The present study builds upon the foundational MMDIO ontology" For ontologies, "foundational" has a precise meaning: it describes ontologies providing very general terms that are common across all domains. In my understanding, MMDIO is not foundational (in this sense), but domain specific, as it concerns medieval manuscripts.
C40. pag.12 Section 3.1.1 "extension of MMDIO for additional manuscript collections and more complex use cases [27]." referencing to [27] sounds inappropriate here.
C41. pag.12 Section 3.3.1 "to link fabio:Book to a foaf:Person" -> "to link a fabio:Book to a foaf:Person"
C42. pag.12 Section 3.3.1 "the class mmdio:SubSection was added, and related to doco:Section via the existing doco:contains property" should be reformulated, as it is the individuals belonging to these classes that are related, not the classes themselves.
C43. pag.12 Section 3.3.1 "mmdio:referstoManifestation" -> "mmdio:refersToManifestation"
C44. pag.12 Figure 2 ending arrows are missing in mmdio:refersToManifestation links.
C45. pag.17 Section 4.5 "subsections referred to specific folios. Custom Python scripts " -> "subsections referred to specific folios, custom Python scripts "
C46. pag.17 Section 4.5 "Descriptive metadata, including script style, initials, rubrication, and number of writing hands from the codex records. These elements were categorized" ->
"Descriptive metadata, including script style, initials, rubrication, and number of writing hands from the codex records were categorized"
C47. pag.18 Section 4.6 "These queries were tested:" This sentence needs to be reformulated, as what follows are not queries but perhaps query types.
C48. pag.23 in Figure 8 probably the central "Gloss" should be "nov.6.pr".
C49. pag.22 Section 4.12 "The gloss information from the ‘London, Ogden 5’ manuscript collection, particularly identified in ‘Authenticum’ manuscripts, is presented." It is unclear where this information is presented. Is it returned by the query in Listing 4?
C50. pag.23 Section 4.13 Provenance has a precise meaning in knowledge representation, i.e., object origins. Here it would be appropriate to be more specific and use, for example, "Geographical Provenance".
C51. pag.24 Section 4.13 The following two paragraphs say the same things, so one of them should be eliminated.
"The query retrieved data on the various locations where the manuscript has been housed over time, offering valuable insights into its custodial history. This information is crucial for examining the manuscript’s provenance and understanding the broader patterns of manuscript distribution."
"The query returned data on the manuscript’s past locations, providing valuable insight into its custodial history. This helps trace its provenance and highlights on broader patterns of manuscript circulation."
C52. pag.24 Section 4.13 The following paragraph should be removed
"The manuscript’s journey from its last to its current location offering insight into its historical journey and transitions"
C53. pag.27 Section 4.16 "The output (Figure 13)" but Figure 13 does not present the output of the query in Listing 8. Instead, it illustrates the logical structure of the Montecassino book (see pag.30)
C54. pag.26 Section 4.14 Figure 11 reports data about de-donatione (I can't find it in the online resource), but Listing 6 was about Tres-libri-cum-glossa-Accursii. This may be confusing for readers.
C55. pag.28 Section 4.16 "Figure 12 visually illustrates the interconnections among various platforms, mapping how specific books and manuscripts are linked. This graphical representation helps to clarify the complexity of these relationships". I can't see any complexity, as Figure 13 shows just two nodes and a single edge.
C56. pag.30 Section 4.16 "Olomouc Collection This query, as shown in Listing 10" but "Olomouc Collection" is not a query. It has to be reformulated, for example as "Olomouc Collection The query shown in Listing 10".
C57. pag.30 Listing 10 In the WHERE clause of the query in Listing 10, the following pattern is not relevant and should be omitted:
?Book a fabio:Book; dcterms:title "Olomuc".
C58. pag.33 Section 4.16 "This structured data helps scholars understand" -> "This structured data helps scholars to understand"
C59. pag.34 Section 4.16 The query in Listing 12 has syntax errors.
C60. pag.35 Section 4.16 "It also identifies the individual manuscripts within each manuscript collection using the mmdio:ManuscriptCollection property" -> "It also identifies the individual manuscripts within each manuscript collection using the frbr:part property"
C61. pag.37 Section 5.1 "MELIORATE offers a user-centric interface" The intended meaning of "user-centric" is not clear.
C62. pag.38 Section 5.3 "one of the most prominent digital collections available on the Mosacio Platform" Here "Mosacio Platform" is clearly a typo. I suggest replacing it with "Meliorate Platform".
C63. pag.38 Section 5.3 The following two paragraphs are redundant. One of them should be removed.
"Figure 21 shows the manuscript collection and sections linked to the digital Authenticum. The manuscript collection is entirely navigable, allowing the user to view information about the manuscripts held in the collections, the periods covered by the collections, where they are held, bibliographic references, further descriptions, and embodiment in a specific codex."
"Figure 21 presents the manuscript collections and sections associated with the digital version of Authenticum. Each manuscript collection is fully navigable, allowing users to view detailed information such as the manuscripts included in the collection, the period they cover, their location, bibliographic references, supplementary descriptions, and how they are embodied in a specific codex."
C64. pag.40 Section 5.3 "The identifiers provide direct links to the Mosacio portal" -> "The identifiers provide direct links to the Mosaico portal"
C65. pag.44 Figure 26 is the same as Figure 27
C66. pag.43 Section 5.3 "Figure 28 demonstrates how the Irnerio platform interfaces with the broader Mosaico system" Perhaps "Mosaico" should be changed to "Meliorate"?
C67. pag.46 "The MELIORATE" -> "MELIORATE"

Review #2
Anonymous submitted on 12/May/2026
Suggestion:
Minor Revision
Review Comment:

The paper extends the MMDIO ontology to integrate heterogeneous metadata from two medieval manuscript collections (Progetto Irnerio and Mosaico). It introduces new classes and properties to capture authorship, internal document structure, and physical-logical relationships. The authors construct a Polymorphic Knowledge Graph that allows the same entity to take on different roles across temporal and spatial contexts. The MELIORATE LOD platform provides public, SPARQL-queryable access to the integrated data. The GitHub repository (https://github.com/irnerio-mosaico-opendata/mmdio) contains the ontology, RDF dump, and queries, which greatly supports reproducibility. While I identify several areas for clarification and minor improvement, the work is substantial and its core claims are well-supported by the evidence. I therefore recommend Minor Revision.

Page 11, “The ontology extension was guided by the MeLON methodology… Evaluation criteria from the original phase, such as completeness, coherence, and usability, were reused…” – The MeLON methodology is first cited without a reference. The paper describes the MeLON steps at a high level (“goal definition, use case formulation, evaluation indicators, state-of-the-art analysis, ontology modeling, and iterative testing”), which seem to me to be the standard ontology development steps. For example, how does the MeLON methodology fit this ontology development scenario compared to other popular ontology development methods or best practices? Without this traceability and formal comparison, the claim that the extension follows MeLON remains abstract.

Page 14-15, Table 1 (FOCA Evaluation Results) – The percentage scores (e.g., 100%, 87.5%) are presented without explaining how they were derived from the goal-question metrics. Also, the beta regression coefficients (-0.44, 0.03, 0.02, etc.) on page 15 are stated but not well explained. Are these coefficients taken directly from Bandeira et al. (2016)?

Page 15, “The final FOCA quality score for MMDIO is: 0.962” – The evaluation claims excellent quality, but it focuses solely on the ontology’s formal structure and lacks benchmarks showing how this score compares with those of existing comparable ontologies. No validation is provided for whether the Polymorphic Knowledge Graph or the RDF transformation correctly reflects the original manuscript data (e.g., completeness of mapping, absence of data loss). There are also other important evaluation methods, such as OntoMetrics for structural metrics (e.g., class richness, inheritance depth) and FOOPS! for FAIR compliance.

Page 16, “In a PKG, each instance… is allowed to exist in multiple contexts with different relationships. This is particularly useful for medieval manuscripts like Authenticum…” – The term “Polymorphic Knowledge Graph” needs a more formal definition. From the example (Authenticum linked to different collections with atTime/hasLocation), this appears to be a standard use of reified temporal and spatial qualifiers. Please clarify how this differs from named graphs or n-ary relations.
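
For reference, the n-ary relation pattern mentioned above can be sketched as follows. This is a minimal illustration in Python, not the paper's model: the `ex:` prefix, the properties `ex:hasContext` and `ex:inCollection`, the blank-node labels and the sample values are all hypothetical; only the names atTime and hasLocation echo the qualifiers mentioned in the example.

```python
Triple = tuple[str, str, str]


def nary_triples(entity: str, contexts: list[dict[str, str]]) -> list[Triple]:
    """Expand one entity with several contexts into n-ary relation triples.

    Each context becomes an intermediate node (hypothetical label _:ctxN)
    that carries the collection, time and location qualifiers, so the same
    entity can take part in several contexts without conflicting statements.
    """
    triples: list[Triple] = []
    for i, ctx in enumerate(contexts, start=1):
        node = f"_:ctx{i}"
        triples.append((entity, "ex:hasContext", node))
        for prop in ("inCollection", "atTime", "hasLocation"):
            triples.append((node, f"ex:{prop}", ctx[prop]))
    return triples


# Hypothetical sample: one manuscript described in two contexts.
sample = nary_triples(
    "ex:Authenticum",
    [
        {"inCollection": "ex:CollectionA", "atTime": "period-1", "hasLocation": "ex:PlaceA"},
        {"inCollection": "ex:CollectionB", "atTime": "period-2", "hasLocation": "ex:PlaceB"},
    ],
)
```

A definition of "Polymorphic Knowledge Graph" would need to state what it adds beyond this pattern, in which every context is simply an intermediate node with its own qualifiers.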

Page 22-23, Listing 4 and Figure 8 (gloss annotations) – The query correctly retrieves gloss types and placements. However, the paper does not explain how glosses are originally encoded in the source platforms (TEI? relational tables?). Adding a short description of the gloss extraction and mapping process would help readers assess the fidelity of the RDF representation.

Page 37, Section 5.7 “Evaluation Against LOD Principles” – The authors mention “A preliminary usability study involving legal historians and medieval scholars indicated strong appreciation…” but provide no details (number of participants, tasks, metrics, or whether the study was qualitative/quantitative).

Page 47, “Manual metadata extraction remains time-consuming and limits scalability. … reliance on expert modeling makes full automation difficult.” – The Conclusion lists challenges and acknowledges the difficulty of automation. Later it offers only some vague future directions (“LLMs”, “automated pipelines”). This part needs to be strengthened.

General observation on evaluation – The paper uses three complementary evaluation methods (FOCA for ontology quality, SPARQL queries for competency questions, and a LOD principle checklist). This is a solid approach. To further strengthen the evaluation, please add basic statistics about the knowledge graph (e.g., total number of triples, number of manuscripts, folios, glosses) and provide approximate query response times for the shown queries. This is not mandatory but would improve completeness.
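
The basic statistics requested above could be gathered with very little code. The sketch below is an illustration in Python under stated assumptions, not the authors' pipeline: it assumes the RDF dump is available as N-Triples text, the function name `graph_statistics` and the `example.org` sample IRIs and title are hypothetical, and the line splitting is a simplification rather than a full N-Triples parser. It counts rdf:type statements per class, which equals the number of instances only if each instance is typed once per class.

```python
from collections import Counter

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def graph_statistics(ntriples: str) -> tuple[int, Counter]:
    """Return (total number of triples, rdf:type statements per class)."""
    total = 0
    per_class: Counter = Counter()
    for line in ntriples.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        total += 1
        parts = line.split(None, 2)  # subject, predicate, remainder
        if len(parts) == 3 and parts[1] == RDF_TYPE:
            # The remainder is the object followed by the closing " ."
            per_class[parts[2].rstrip(" .")] += 1
    return total, per_class


# Hypothetical sample data, for illustration only.
SAMPLE = """
<http://example.org/ms/1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/spar/fabio/Book> .
<http://example.org/ms/2> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/spar/fabio/Book> .
<http://example.org/ms/1> <http://purl.org/dc/terms/title> "Sample title" .
"""
total, per_class = graph_statistics(SAMPLE)
```

Query response times could be reported in the same spirit by wrapping each call to the SPARQL endpoint in a timer such as `time.perf_counter()` and averaging over several runs.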

Review #3
By Peter A. Stokes submitted on 19/May/2026
Suggestion:
Major Revision
Review Comment:

This is an interesting contribution to a field that has received relatively little attention given the importance of the subject. It is therefore timely and potentially of interest not only to scholars in manuscript studies, digital humanities and the semantic web, but also to librarians, archivists and others. It shows some originality insofar as it presents some additions to an existing ontology and then tests the result for integration of two datasets, including an evaluation of the new ontology and presentation of a new web interface.

The article is mostly clear and shows few typos or problems of expression. The new ontology is available on GitHub and the data artifacts there are complete, comprising documentation in HTML, OWL and RDF files, and sample SPARQL queries. The literature review of other comparable ontologies is relatively thorough, and the major relevant projects seem to be represented.

Nevertheless, there are some relatively substantial issues in the structure and presentation which should be improved to strengthen the argument, particularly by making the discussion more critical.

The most important issue is that the goals of the new ontology are not clearly presented, and without these the limitations of the previous projects and the strengths of the new ontology cannot be evaluated. All practical ontologies are selective by design – a truly complete ontology would be unusable – and so incompleteness in itself is not a criticism. For this reason, a clearer evaluation is needed of which needs are not met by the existing ontologies, as is a more critical evaluation of how the new ontology responds to these needs. Similarly, referring to the 'extensibility of MMDIO as a robust, interoperable semantic model designed to meet evolving research demands' seems overstated, insofar as it is not at all clear how MMDIO is any more extensible or robust than the other ontologies. Even the assertion that the new ontology is 'a more subtle and comprehensive semantic framework capable of modeling the various manuscript structures [and] facilitating scholars’ inquiry' is not really supported, since there is no clear definition of what structures or what scholarly inquiries are considered.

In terms of structure and writing, it would also be a significant help to the reader if some of the key terms from the existing ontologies were defined in the article, particularly fabio:Book, memo:Manuscript and memo:Codex, which are not necessarily intuitive. There seems also to be some slippage in the usage of terms: for instance, a memo:Manuscript is defined in the ontology as a text plus gloss and operating at the level of Expression, but it is also defined in the article and schema diagram (although not in the HTML documentation) as having a location, being associated with (apparently) only one manifestation and therefore having one or more folios, and it is not at all clear to this reader how an Expression can have a location. Indeed, the definition of memo:Codex as 'A book constructed of one or more folios and at most one binding' adds to the confusion, since if a Codex is defined as a type of Book then how can the former be a Manifestation and the latter an Expression? Similarly, it is very unclear what 'the physical version' means when referring to an Expression (§4.17).

The description of the FOCA evaluation is not helpful to this reader, especially as no details were given beyond summarising the method (how many evaluators? what variation was there in grading? why was FOCA chosen? and so on). Given the length of the article already, it would probably be more helpful simply to cite the publication and give the final FOCA score, rather than repeating the general points of the method without providing the specifics. It is also unclear why the first 2016 version of the FOCA preprint is cited and not the 2017 revised final version.

In addition to these conceptual points, there are also some minor details that could be addressed:

§2.4: The relevance of the COURAGE project is not at all clear (why not an example closer to the subject, since there are many?). I suggest simply deleting this example.

§2.5: The reference to Zhitomirsky-Geffet and Prebor should come at the start of the paragraph, otherwise it is unclear which project is being discussed.

§3.3.1: 'Use case 4': What is this (what are Use Cases 1-3)? This label presumably comes from internal project management and is not relevant here.

§3.2 and 3.4: The relevance of UML is unclear, since this is not used anywhere in the article that I can see.

Fig. 28: The browser has the wrong character encoding and so the screenshot must be fixed.

Finally, there are numerous positive terms applied to the MMDIO ontology that do not have any real meaning or are not supported by any evidence and should probably be deleted as they reinforce the impression that the authors are not critical of their own work. Examples are 'evolvable' (§2.11; the meaning of this word is extremely unclear), 'smooth user experience' (according to whom? by which criteria?), 'robust' (§5.5: again, by which measure? what have you done to test robustness? how is it more robust than any other comparable system?), 'enhances the accessibility and usability' (§5.4: how do you know? by what measure?), 'accurately captures' (§5.4), 'all relevant information' (§5.4: relevant to whom? how do you demonstrate 'all' relevant information?), 'entirely navigable' (§5.3: isn't this normal? what is 'partially navigable'?), 'particularly helpful to scholars' (§5.17: based on what evidence? which scholars?). Indeed, the screenshots in Fig. 26 and 27 look extremely user-unfriendly to this reader, and I have difficulty imagining colleagues in manuscript studies accepting a portal that shows all data as raw IRIs, so the assertion that this is a helpful enhancement to accessibility and usability needs more support.