Review Comment:
In my understanding, the paper "Revealing Medieval Manuscript Treasures: Semantic Web Integration through a Polymorphic Knowledge Graph and Linked Open Data" starts from a previous work (I suppose the one reported in the bibliography as [19]), which described the RDF conversion of the data underlying the Irnerio platform and the Authenticum collection (part of the Mosaico portal) using the MMDIO vocabulary, and extends the resulting RDF dataset (or datasets?) with the other collections provided by the Mosaico portal. To this end, MMDIO also has to be extended. Finally, the Meliorate portal, which provides unified access to the data, is described.
The originality of the work is not very high, as most of the methodologies used in the previous work have been reused here. However, the final result is relevant and worthwhile. The English is very fluent. However, in my opinion, there are several issues that prevent this work from being published in SWJ. In what follows my comments are reported from C1 to C67, interleaved with free text.
First, there are severe typographical issues.
C1. Section 3.1.1 pag.11 "Several collections refer to specific folios within the manuscript," overflows page borders.
C2. pag.13 Figure 3 overflows page borders. Several parts could probably be removed, keeping only those that actually represent extensions.
C3. pag.14 Table 1 goes out of page borders.
C4. pag.19 Figure 4 overflows page borders. Removing the arcs and nodes that are not strictly related to the intent of the example would probably improve its readability.
C5. Figures 5, 6 and 7 overflow page borders. Tables would probably be more appropriate.
C6. On pag.22, pag.25, pag.30 and pag.34 the line spacing seems different from that of other pages. Other font characteristics should be checked as well.
C7. Figures 8, 9, 11, 13, 14, 21, 22, 23, 24, 25, 26, 27, 29, 30 overflow page limits.
C8. pag.31 In Listing 10 there are unnecessary blank spaces at the end.
To enhance verifiability and reproducibility, as requested by the FAIR principles, all the tools and technologies should be made available to the reader.
C9. All the namespace abbreviations used as prefixes throughout the paper should be reported: dm2e, owl, fabio, doco, memo, mmdio, foaf, biro, cito, core, dcterms, c4o, tvc, arco, frbr.
C10. A bibliographic reference for each of the core semantic web technologies used should be reported: RDF, SPARQL, OWL, LOD and LOD Principles.
C11. pag.14 Section 3.14 "Sample data from the Mosaico collections was extracted and mapped into RDF using the expanded ontology. The datasets were uploaded to GraphDB and tested through SPARQL queries"
One of the FAIR principles is reproducibility. As a consequence, one must be able to reproduce the mentioned tests. In practice, it would be sufficient to provide just one example of these sample data, how they have been converted, and the SPARQL queries and query results used to test the conversion. Otherwise, if the authors consider it appropriate, it could be clarified that these tests and SPARQL queries will be shown in the remainder of the paper. In this case, it is mandatory that the original data from Mosaico are publicly available.
C12. pag.17 Section 4.5 "Custom Python scripts were used to extract and assign unique identifiers" I can't find these scripts, not even in the repository.
C13. pag.17 Section 4.5 "All cleaned and enriched metadata were first loaded into structured pandas DataFrames from Excel sources." Excel sources should be available to the reader.
C14. pag.18 Section 4.6 "A series of SPARQL queries was implemented to validate the knowledge graph’s accuracy and flexibility" Where can one find these queries? Are the authors referring to the queries in the following paragraphs? If so, this should be clarified. Otherwise, such queries have to be made available to the reader.
C15. pag.22 Section 4.12 I can't find nov.6.pr in the datasets provided in the github repository. In addition, running the query in Listing 4 against the Meliorate SPARQL endpoint returns an "Undefined array key 0" error.
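As a rough illustration of the check behind C9, the sketch below scans a query (or a Turtle snippet) for prefixed names whose prefix is never declared. It is only a minimal sketch based on regular expressions, not a full SPARQL parser. The function name undeclared_prefixes and the sample query are my own assumptions, built around the triple pattern quoted in C57. They are not taken from the paper's listings.

```python
import re

# Matches "PREFIX fabio: <...>" (SPARQL) or "@prefix fabio: <...>" (Turtle).
PREFIX_DECL = re.compile(
    r"^\s*(?:@prefix|PREFIX)\s+([A-Za-z][\w-]*):\s*<",
    re.IGNORECASE | re.MULTILINE,
)
# Matches the prefix part of a prefixed name such as fabio:Book.
PNAME = re.compile(r"(?<![\w:/<#])([A-Za-z][\w-]*):[A-Za-z_]\w*")


def undeclared_prefixes(text):
    """Return the sorted prefixes that are used in text but never declared."""
    declared = set(PREFIX_DECL.findall(text))
    # Remove IRIs and string literals so that "http:" and quoted text
    # are not mistaken for prefixed names.
    body = re.sub(r"<[^>]*>", " ", text)
    body = re.sub(r'"[^"]*"', " ", body)
    used = set(PNAME.findall(body))
    return sorted(used - declared)


# Hypothetical query: fabio is declared, dcterms is not.
SAMPLE_QUERY = """
PREFIX fabio: <http://purl.org/spar/fabio/>
SELECT ?Book WHERE {
  ?Book a fabio:Book ;
        dcterms:title "Olomuc" .
}
"""

print(undeclared_prefixes(SAMPLE_QUERY))
```

Running such a check over every listing would show which of the prefixes listed in C9 still lack a declaration.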
My main concern is that the contribution of this paper with respect to the "previous work" is not evident. It should be clear from the beginning and throughout the paper. In addition, means to compare the previous work (ontology and data) with the current one have to be provided.
C16. pag.1 As reported in the abstract, one of the contributions of this work is the construction of a Polymorphic Knowledge Graph. It is unclear whether "Polymorphic Knowledge Graph" is a novel notion introduced in this field or a widely recognized notion of Linked Open Data. In the former case it should be defined in full detail. In the latter case a bibliographic reference is needed.
C17. Section 2 pages 4 to 9
The works recalled in Sections 2.9, 2.10, 2.11 differ from those recalled in the previous sections in that they represent the "starting point" of the current work.
In addition, the title of Section 2.12 may be misleading, as the word "Current" could suggest to the reader that the work is still in progress. Placing Sections 2.10 to 2.12 in a new section called "Preliminaries" or similar, and choosing a different name for 2.12, would help the reader to clearly identify the "starting point" of the current work.
C18. pag.1 Section 1 "Building on prior work ..." Is it the one reported in the bibliographic reference [19]? If so, use [19] instead of "prior work", or just cite [19].
C19. pag.9 Section 2.12 " the current study extends the Medieval Manuscript Data Integration Ontology (MMDIO)" Again, use [19] instead of "previous work" (if correct). Both versions of MMDIO should be available to the reader, and the changes between them should be reported clearly. It may suffice to anticipate here that the extension will be presented in Section 3.
C20. pag.9 Section 2.12 "The core MMDIO structure, introduced in previous work" Again previous work.
C21. pag.10 Section 3 "The methodological framework for this study builds on previous work" use [19] instead of "previous work"
C22. pag.10 Section 3 "The previous work focused on harmonizing metadata related to the Irnerio platform and the Authenticum collection from the Mosaico portal." Again previous work.
In addition, the harmonizing process should be briefly described here: was it based on MMDIO?
C23. pag.12 Section 3.3 starts with Use Case 4 (Section 3.3.1) and this is confusing. It should be clarified where the previous three use cases can be found.
C24. Section 3 describes the extension of MMDIO. Where can I get the old and the new ontology versions? In addition, I wasn't able to find the new property hasSignature in MMDIO (https://drive.google.com/uc?export=download&id=1FD5vkcYS6ipn-hSf5VKdFIrd...). A summary table enumerating the new classes and properties may be helpful.
C25. pag.15 Section 4 "Building on the foundational MMDIO ontology and earlier work" cite the earlier work ([19]?).
C26. pag.15 Section 4 describes this specific Polymorphic Knowledge Graph. I suppose it is the RDF dataset behind the Meliorate portal. It is unclear to me whether the authors have extended an existing graph (i.e. the Mosaico one) or built a new one. If the former, the source graph must be explicitly stated and made available to the reader. If the latter, the building process must be reported, possibly by referring to other sections if appropriate.
C27. pag.18 Section 4.7 "each use case previously defined" Does "previously" refer to the previous work or to Section 4.6?
C28. pag.18 Section 4.7 "The Polymorphic Knowledge Graph developed in this phase transforms the earlier RDF-based pilot work ..." Is this earlier pilot a product of the previous work? Can it be accessed?
C29. pag.38 Section 5.2 "Each book or collection in the Mosaico portal is semantically described and interlinked using RDF." Are the authors referring to the polymorphic knowledge graph described in Section 4? This should be clarified.
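As a rough illustration of the comparison asked for in C19 and C24, the sketch below lists the terms declared in a new ontology version that are absent from the old one, which is the content a summary table of new classes and properties would need. It is a minimal sketch that only recognizes simple one-line Turtle declarations. A proper comparison would parse both files with an RDF library. The OLD and NEW snippets are hypothetical placeholders, not the actual content of either MMDIO version, and the kind assigned to mmdio:hasSignature is an assumption.

```python
import re

# Recognizes one-line declarations such as "mmdio:SubSection a owl:Class ."
DECL = re.compile(
    r"^\s*(\S+)\s+(?:a|rdf:type)\s+"
    r"(owl:(?:Class|ObjectProperty|DatatypeProperty))\b",
    re.MULTILINE,
)


def declared_terms(turtle_text):
    """Map each declared term to its OWL kind."""
    return {term: kind for term, kind in DECL.findall(turtle_text)}


def added_terms(old_text, new_text):
    """Return sorted (term, kind) pairs declared only in the new version."""
    old = declared_terms(old_text)
    new = declared_terms(new_text)
    return sorted((term, kind) for term, kind in new.items() if term not in old)


# Hypothetical placeholder snippets (assumptions, not the real ontology files).
OLD = """
mmdio:ManuscriptCollection a owl:Class .
"""
NEW = """
mmdio:ManuscriptCollection a owl:Class .
mmdio:SubSection a owl:Class .
mmdio:hasSignature a owl:DatatypeProperty .
"""

# One row per new term, ready to be turned into a summary table.
for term, kind in added_terms(OLD, NEW):
    print(f"{term}\t{kind}")
```

With both ontology files publicly available, a reader could run this kind of comparison directly and verify the extension described in Section 3.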
Finally, some miscellaneous suggestions follow.
C30. pag.1 Section 1 "Building on prior work that developed the Medieval Manuscript Data Integration Ontology (MMDIO)" MMDIO will be reviewed in Section 2.1. Referring to that section here would be helpful for readers.
C31. pag.3 Section 1 "The goal is to support flexible SPARQL querying". Here I can't understand the intended meaning of "flexible".
C32. pag.4 Section 2.1 I can't find dm2e:hasAnnotableContents8 in the dm2e ontology (https://github.com/DM2E/dm2e-ontologies/blob/master/src/main/resources/d...)
C33. pag.4 Section 2.2 I can't understand "authority-linked open knowledge vocabulary". Perhaps using "authoritative" in place of "authority-linked" would be more appropriate.
C34. pag.5 Section 2.3 replace "entities were linked to external authority files" with "entities were linked to external authoritative datasets" or similar.
C35. pag.5 Section 2.4 CIDOC CRM was first mentioned in section 2.2, so it would be appropriate to move section 2.4 before section 2.2.
C36. pag.7 Sections 2.7 and 2.8 Replace "This case study" with "This study"
C37. pag.7 Section 2.8 Replace "CIDOC-CRM" with "CIDOC CRM" (two times)
C38. pag.10 Section 3.1 "Mosaico, while rich in content, lacks a formalized ontology and presents information inconsistently" The intended meaning of "inconsistently" here should be clarified, perhaps with some examples.
C39. pag.11 Section 3.1.1 "The present study builds upon the foundational MMDIO ontology" For ontologies, "foundational" has a precise meaning, as it describes ontologies providing very general terms that are common across all domains. In my understanding, MMDIO is not foundational (in this sense), but it is domain specific as it concerns medieval manuscripts.
C40. pag.12 Section 3.1.1 "extension of MMDIO for additional manuscript collections and more complex use cases [27]." The reference to [27] seems inappropriate here.
C41. pag.12 Section 3.3.1 "to link fabio:Book to a foaf:Person" -> "to link a fabio:Book to a foaf:Person"
C42. pag.12 Section 3.3.1 "the class mmdio:SubSection was added, and related to doco:Section via the existing doco:contains property" should be reformulated, since it is the individuals belonging to these classes that are related, not the classes themselves.
C43. pag.12 Section 3.3.1 "mmdio:referstoManifestation" -> "mmdio:refersToManifestation"
C44. pag.12 Figure 2 Arrowheads are missing on the mmdio:refersToManifestation links.
C45. pag.17 Section 4.5 "subsections referred to specific folios. Custom Python scripts " -> "subsections referred to specific folios, custom Python scripts "
C46. pag.17 Section 4.5 "Descriptive metadata, including script style, initials, rubrication, and number of writing hands from the codex records. These elements were categorized" ->
"Descriptive metadata, including script style, initials, rubrication, and number of writing hands from the codex records were categorized"
C47. pag.18 Section 4.6 "These queries were tested:" This sentence needs to be reformulated, as what follows are not queries but perhaps query types.
C48. pag.23 in Figure 8 probably the central "Gloss" should be "nov.6.pr".
C49. pag.22 Section 4.12 "The gloss information from the ‘London, Ogden 5’ manuscript collection, particularly identified in ‘Authenticum’ manuscripts, is presented." It is unclear where this information is presented. Is it returned by the query in Listing 4?
C50. pag.23 Section 4.13 Provenance has a precise meaning in knowledge representation, i.e., the origin of an object. Here it would be appropriate to be more specific and use, for example, "Geographical Provenance".
C51. pag.24 Section 4.13 The following two paragraphs say the same thing, so one of them should be removed.
"The query retrieved data on the various locations where the manuscript has been housed over time, offering valuable insights into its custodial history. This information is crucial for examining the manuscript’s provenance and understanding the broader patterns of manuscript distribution."
"The query returned data on the manuscript’s past locations, providing valuable insight into its custodial history. This helps trace its provenance and highlights on broader patterns of manuscript circulation."
C52. pag.24 Section 4.13 The following paragraph should be removed
"The manuscript’s journey from its last to its current location offering insight into its historical journey and transitions"
C53. pag.27 Section 4.16 "The output (Figure 13)" but Figure 13 does not present the output of the query in Listing 8. Instead, it illustrates the logical structure of the Montecassino book (see pag.30)
C54. pag.26 Section 4.14 Figure 11 reports data about de-donatione (I can't find it in the online resource), but Listing 6 was about Tres-libri-cum-glossa-Accursii. This may be confusing for readers.
C55. pag.28 Section 4.16 "Figure 12 visually illustrates the interconnections among various platforms, mapping how specific books and manuscripts are linked. This graphical representation helps to clarify the complexity of these relationships". I can't see any complexity, as Figure 12 shows just two nodes and a single edge.
C56. pag.30 Section 4.16 "Olomouc Collection This query, as shown in Listing 10" but "Olomouc Collection" is not a query. The sentence has to be reformulated, for example as "Olomouc Collection The query shown in Listing 10".
C57. pag.30 Listing 10 In the WHERE clause of the query in Listing 10 the following pattern is not relevant and should be omitted:
?Book a fabio:Book; dcterms:title "Olomuc".
C58. pag.33 Section 4.16 "This structured data helps scholars understand" -> "This structured data helps scholars to understand"
C59. pag.34 Section 4.16 The query in Listing 12 has syntax errors.
C60. pag.35 Section 4.16 "It also identifies the individual manuscripts within each manuscript collection using the mmdio:ManuscriptCollection property" -> "It also identifies the individual manuscripts within each manuscript collection using the frbr:part property"
C61. pag.37 Section 5.1 "MELIORATE offers a user-centric interface" The intended meaning of "user-centric" is not clear.
C62. pag.38 Section 5.3 "one of the most prominent digital collections available on the Mosacio Platform" Here "Mosacio Platform" is clearly a typo. I suggest replacing it with "Meliorate Platform".
C63. pag.38 Section 5.3 The following two paragraphs are redundant. One of them should be removed.
"Figure 21 shows the manuscript collection and sections linked to the digital Authenticum. The manuscript collection is entirely navigable, allowing the user to view information about the manuscripts held in the collections, the periods covered by the collections, where they are held, bibliographic references, further descriptions, and embodiment in a specific codex."
"Figure 21 presents the manuscript collections and sections associated with the digital version of Authenticum. Each manuscript collection is fully navigable, allowing users to view detailed information such as the manuscripts included in the collection, the period they cover, their location, bibliographic references, supplementary descriptions, and how they are embodied in a specific codex."
C64. pag.40 Section 5.3 "The identifiers provide direct links to the Mosacio portal" -> "The identifiers provide direct links to the Mosaico portal"
C65. pag.44 Figure 26 is the same as Figure 27
C66. pag.43 Section 5.3 "Figure 28 demonstrates how the Irnerio platform interfaces with the broader Mosaico system" Perhaps "Mosaico" should be changed to "Meliorate"?
C67. pag.46 "The MELIORATE" -> "MELIORATE"