Review Comment:
This article presents the InTaVia Knowledge Graph (KG), which integrates heterogeneous cultural heritage data produced by different organizations. The integration is done through Semantic Web languages and technologies, resulting in a KG with CRM as its core ontology.
IDM-RDF, the ontology used to integrate the data, is an OWL taxonomy that declares the domain and range of object and data properties and includes some axioms of equivalence between classes. As this paper is a 'dataset description', the emphasis is on data publication rather than the ontology itself.
However, I think some aspects of the ontology deserve to be presented or discussed more thoroughly in the paper. In particular:
-- In the introduction, the authors state that knowledge graphs can be used to reveal hidden patterns and relationships in data. This is indeed interesting. Does automated reasoning play any role in this project? If so, what type of reasoning does the ontology support?
-- Figure 1: the diagram deserves explanation, in particular the use of roles to represent participants in events. For example, consider the following RDF triples, based on the model in Figure 1. They describe two instances of E12_Production, :event1 and :event2, each with participant :Printer (which I treat as an instance of :Event_Role, following the authors’ proposal). I assume that the same role can be borne by multiple actors; in the example, :John and :Mary both bear the role :Printer.
:event1 :had_participant_in_role :Printer.
:event2 :had_participant_in_role :Printer.
:Mary :bearerOf :Printer.
:John :bearerOf :Printer.
In this representation, how can one determine which agent participates in which event? For example, does Mary participate in :event1 or in :event2? Perhaps I have misunderstood some part of the authors’ proposal that would resolve this concern.
As a suggestion, capturing that an agent participates in an event in a given role might require a reification pattern for relations of arity higher than 2 (https://www.w3.org/TR/swbp-n-aryRelations/).
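To illustrate the suggestion, a minimal sketch of such an n-ary pattern follows. The property names :had_participation, :participation_of and :in_role, the participation nodes, the example namespace, and the assignment of Mary to :event1 and John to :event2 are all hypothetical choices made here for illustration; none of them is taken from IDM-RDF or CRM.

```turtle
# Hypothetical namespace and terms, for illustration only.
@prefix : <http://example.org/> .

# One participation node per (event, actor, role) combination.
:event1 :had_participation :participation1 .
:participation1 :participation_of :Mary ;
                :in_role :Printer .

:event2 :had_participation :participation2 .
:participation2 :participation_of :John ;
                :in_role :Printer .
```

With one intermediate node per participation, the role stays shared between the two actors, while each event is linked unambiguously to the agent who bears that role in it.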
-- The authors emphasize the importance of representing conflicting information, yet they fail to provide any examples. Consequently, it is difficult to grasp what they mean or the nature of the conflicting data they require. Apart from the lack of examples, conflicting data seems to be represented through so-called 'proxies', but nothing is said about modelling them. It would be very interesting if the authors could explain this aspect of their proposal in more detail, as otherwise it remains unclear how it works.
Other comments:
-- Regarding the dataset description, the authors should specify the type of data contained in each dataset (e.g. name, surname, date of birth, date of death, relatives, artworks produced, etc.). This information would help users to understand the context of the research proposal and the type of data that can be retrieved through the SPARQL endpoint. Adding a table that schematically compares the integrated datasets would also help readers to understand the similarities and differences between the data sources.
-- I accessed the SPARQL endpoint given in footnote 9 and ran the queries provided in the Example tab. The queries return data, but the results cannot be explored further. For example, I clicked on the IRI for a person with id 53823, but the system returned a ‘page not found’ error.
-- In the GitHub repository (https://github.com/InTaVia/idm-rdf/tree/main/idm-OWL), the ontology folder contains three files. The paper says that InTaVia uses a modular structure, so it would be helpful to know how the modules relate to each other. More specifically, of the OWL files in the repository, which one is the main ontology file? I browsed the intavia_idm1.ttl file, but I’m not sure whether it is the main file or how it relates to the others.
-- From a research perspective, I believe that detailing the challenges faced by the authors while developing their project would enhance the quality of this proposal. For instance, did they encounter difficulties when integrating data from various sources or aligning it with CRM? Did they gain any specific advantages from a data modelling, conceptual, or other perspective by reusing CRM?
-- The paper lacks an evaluation of the materials presented. Setting aside the evaluation of the ontology, has the usability of the platform been assessed by users? Have the authors collected user feedback to ascertain whether the platform meets their research needs and expectations?