Review Comment:
The paper describes the ongoing work being carried out at the Virtual Record Treasury of Ireland to develop a Knowledge Graph. It continues earlier workshop papers on the topic, adding new content about the evolution of the knowledge graph, mainly the inclusion of geospatial information.
Overall, I found the paper well written, easy to follow and interesting. Although it may not contain advanced theoretical concepts, I think it presents a practical use case for knowledge graphs in digital humanities, including geospatial aspects.
Regarding the main review dimensions for a research contribution:
- Originality: As far as I can tell, the work presented is original. The authors have published related prior work in two workshops, but they cite that work properly and I think it is fine.
- Significance of the results: This paper describes a practical application of semantic technologies within the digital humanities domain rather than a new research contribution. Its significance lies mainly in the lessons learnt and the potential benefits for the users of that information. The authors conducted some usability studies, which are fine, although it would be great to see other kinds of significant results, for example historians who had been able to discover something new using the knowledge graph. I understand that such results can be difficult to provide. The authors could perhaps include some details about the usage of the system, analyzing the system logs or the user interactions and trying to capture patterns that could be used to measure user satisfaction. In any case, I think the authors provide a good description of a real system which is currently deployed, and I think that can be enough.
- Quality of writing: I think the paper is well written and the contents are clearly explained including some examples.
- Long-term stable URIs of the resources: The authors include URIs to the deployed system, which is currently at https://kg.virtualtreasury.ie/. Although that URI can be considered stable, the authors could also include other stable URIs that redirect to it. I think the authors don’t include references to other resources, such as a GitHub repository with the knowledge graph source code, or other data.
Some minor comments or suggestions:
- I think the paper would benefit from a description of the general architecture of the system. For example, the authors use Morph-KGC to transform external data, OpenStreetMap, and Virtuoso as a triple store. A diagram of the architecture and a paragraph discussing the design decisions and the alternatives considered would be relevant for readers interested in applying similar solutions to other domains. Although the authors included figure 7, which explains the production and development servers, I think a cleaner picture could be presented as well.
- The authors indicate that the ontology is built upon E55_Type to classify persons, places, etc. and later indicate that two classifications have been added (era and place types). Are those classifications available in the ontology or in the user interface?
- The use of SKOS can be interesting for searching and navigating. Is it available in the user interface?
- I was looking at the ontology available at https://ont.virtualtreasury.ie/ontology/index-en.html and I found that some of the descriptions end with a number. For example, the label for Barony ends with the number 281. In other cases they end with a URI, like Ballyboe, which contains the link to www.oed.com/view/Entry/269953. Maybe the authors would like to update those descriptions?
- I also noticed the discussion about the design of the URIs, which mix descriptive labels with opaque ones. I would suggest that the authors relate those decisions to some of the multilingual linked data patterns discussed, for example, here: https://journals.sagepub.com/doi/10.3233/SW-140136. I noticed that the ontology contains language-tagged descriptions in English together with plain strings. Maybe the authors want to justify the reason for that?
- Page 7, “For example, Early Modern Places is a dataset includes a…”
- Page 8, after (Figure 4) and footnote 7, there is an extra whitespace.
- Page 11: I think the following sentences are not grammatically correct: “The participants were asked to completed 8 edit…”, “It is planned to conducted another evaluation on…”
- After reading section 5.1, it is not clear to me how the Knowledge Graph is updated, or whether there is some mechanism to keep the content up to date when more information is found about, for example, some person, or when some information is found to be incorrect. Are there any update policies or mechanisms? This question arose when I read this statement: “...providing access to a write-protected to the Virtuoso triplestore.” If it is write-protected, how can those contents be updated?
- In my opinion, the combination of descriptive URIs with opaque ones, and especially the use of alphanumeric identifiers, makes those URIs a bit difficult to read. I would suggest using all-numeric IDs like those from Wikidata, or more descriptive ones. The alphanumeric identifiers seem a bit difficult to memorize (I noticed that the authors removed the “l” because it could be confused with a 1, and the vowels to avoid generating words), and handling those URIs can be a nightmare. Although I understand that it will be difficult to change that decision, maybe the authors could provide some justification?
- Subsection “Redirect policy” starts with “An updated redirect policy…”. Updated with regard to what?
- Page 14: “...depending on whether the URI is…”
- I think the authors don’t provide a public/open SPARQL endpoint which could be used by potential developers and other applications that want to reuse the data portal contents programmatically. Did the authors consider that possibility? Could you discuss the pros and cons of that decision?
- In lessons learnt, the authors indicate that SHACL validation helped maintain data integrity when modifying or adding resources. I would suggest that the authors be more ambitious with the use of shapes and offer those shapes as part of the technical documentation of the knowledge graph. This could also be useful for consumers of the SPARQL endpoint, although I think the SPARQL endpoint is not public.
- I am not sure whether the URIs employed in the Knowledge Graph follow the linked data principles. I tried to obtain an RDF Turtle representation for Richard Talbot using curl and I didn’t receive any response. I used this command:
curl -H "Accept: text/turtle" -L "https://kg.virtualtreasury.ie/person/Talbot_Richard_c17/v1xf6p1"
In my opinion, a proper knowledge graph in a semantic web context should follow the linked data principles and offer RDF. If it doesn’t, the authors should at least explain why in the paper.
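For repeatability, the curl check above could also be scripted. The following is only a minimal Python sketch of that check, not a description of how the system is implemented: the function names and the list of RDF media types are my own assumptions, and it uses only the Python standard library.

```python
from urllib.request import Request, urlopen

# Media types commonly used for RDF serializations (an illustrative,
# non-exhaustive list chosen for this sketch).
RDF_MEDIA_TYPES = {
    "text/turtle",
    "application/rdf+xml",
    "application/ld+json",
    "application/n-triples",
}


def is_rdf_content_type(header_value):
    """Return True if a Content-Type header value names an RDF serialization."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in RDF_MEDIA_TYPES


def build_turtle_request(uri):
    """Build a GET request that asks for Turtle, like the curl command above."""
    return Request(uri, headers={"Accept": "text/turtle"})


def check_uri(uri, timeout=10):
    """Dereference the URI and report whether an RDF media type came back.

    urlopen follows redirects by default (similar to curl -L) and raises
    an exception on HTTP errors or timeouts.
    """
    with urlopen(build_turtle_request(uri), timeout=timeout) as response:
        return is_rdf_content_type(response.headers.get("Content-Type"))
```

Calling `check_uri("https://kg.virtualtreasury.ie/person/Talbot_Richard_c17/v1xf6p1")` would then indicate whether the server answers the Turtle request with an RDF media type. A `False` result, or an exception, would be consistent with what I observed using curl.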
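On the SPARQL endpoint comment above: to illustrate the kind of programmatic reuse I have in mind, here is a small Python sketch of how a client could prepare a query following the SPARQL 1.1 Protocol ("query via GET"). The endpoint URL is a placeholder on example.org, not a service of the project, and the function name and the query are hypothetical; I am not claiming the project exposes such an endpoint.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint, used only for illustration (an assumption,
# not a real service of the Virtual Record Treasury).
EXAMPLE_ENDPOINT = "https://example.org/sparql"

# A generic query that lists a few triples; it uses no project-specific terms.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' request.

    The query text is URL-encoded into the 'query' parameter, and the
    Accept header asks for the JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Prepare (but do not send) a request against the placeholder endpoint.
request = build_sparql_request(EXAMPLE_ENDPOINT, EXAMPLE_QUERY)
```

With a public endpoint, passing such a request to `urllib.request.urlopen` would be enough for third-party applications to retrieve results, which is the reuse scenario the authors could weigh against the operational costs of opening the endpoint.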