A challenge for historical research: making data FAIR using a collaborative ontology management environment (OntoME)

Tracking #: 2345-3558

Francesco Beretta

Responsible editor: 
Special Issue Cultural Heritage 2019

Submission type: 
Full Paper
This paper addresses the issue of interoperability of data generated by historical research and heritage institutions in order to make them re-usable for new research agendas according to the FAIR principles. It outlines a methodological approach that allows the integration of data stemming from different lines of inquiry and belonging to different epistemological levels. After introducing the symogih.org project’s ontology, designed to cope with this issue, it compares it with the factoid model conceived at King’s College London and the CIDOC CRM conceptual model, highlighting the specificities of each and their contribution to data interoperability. Finally, it shows how collaborative data modelling carried out in the ontology management environment OntoME makes it possible to elaborate a common fine-grained and adaptive understanding of information, applying domain knowledge to data production. The condition for a positive outcome of this process is that the research community actively engages in the elaboration of a communal ontology, which the dataforhistory.org consortium is currently seeking to promote.

Major Revision

Solicited Reviews:
Review #1
By Franco Niccolucci submitted on 27/Dec/2019
Review Comment:

The paper deals with the harmonisation of ontologies for history. It presents in a very clear way the requirements of this undertaking and how to deal with the issues arising from different perspectives underlying the various approaches to the topic so far adopted. It describes in clear terms a sort of conceptual mapping - rather than a technical one - between them, a necessary step for such harmonisation. Examples help to clarify this process. In the last section, a tool to carry out the job is introduced, but since it is still work in progress, fewer details are provided. However, this does not reduce the paper's interest, because, as already mentioned, it is necessary first to understand and then to process the correspondence between different concepts.
References to the development of CRMsoc and to other CRM extensions appear only as links in footnotes; for this, however, the author cannot be blamed, as at present they are the only available ones.
The paper is well written with a good style.
In conclusion, I think that the paper may be published as is. It is an important and timely snapshot of the current state of the art. I add below some minor comments that the author may wish - or not - to take into account.
1) the FAIR principles are sometimes mentioned as "the FAIR principles" (page 1) and sometimes as "FAIR principles" (page 2) without the article "the", e.g. in the quotation on top of page 2 and also elsewhere on the same page. To my ear as a non-native English speaker, this sounds incorrect: the article should always be present to indicate "those" FAIR principles and not any "fair" principles. Actually, the source quoted in the footnote also uses (for I2) the version with the article: "I2: (Meta)data use vocabularies that follow the FAIR principles". I suggest making this small correction throughout the paper.
2) I would have appreciated a more in-depth analysis of why adaptation to the CRM is so cumbersome and leads to what may appear as a quick fix: I am not sure that modelling a factoid as an E89 or E73 is the perfect solution to represent the richness of the factoid concept. This is perhaps due to an "original sin" of the CRM. But this would introduce another topic into a self-contained paper with clear argumentation, so the author may wish to consider it in future contributions.

Review #2
Anonymous submitted on 14/Feb/2020
Minor Revision
Review Comment:

This article presents a methodological approach for the integration of data stemming from historical research and heritage institutions in a way that complies with the FAIR principles. The motive of this research is to ensure the interoperability of data, which is addressed via the OntoME ontology environment. The OntoME application enables users to align information data models with that of the CIDOC CRM. The presented research has emerged in the course of the symogih.org project, launched in 2008, which aims at historical data re-usability, access quality and preservation.
The ideas and methods described in the article are clear and within the scope of the journal.

The author gives a thorough description of the symogih.org project, which serves as the backbone of the presented research, and through several examples elucidates the project's contribution towards data interoperability. Moreover, the author is well aware of related works in the field and provides a comparative examination between the proposed method and other well-known relevant methods, such as the CIDOC CRM.

However, this comparison reveals a number of weaknesses that the presented work entails. For example, the model of the symogih.org project does not enable the exact specification of the veracity of each distinct source participating in the process of data integration. This is basically because the 'knowledge unit' of the model encapsulates both changing/developing entities and the historians' assertions as a whole. Therefore, the reliability of the produced descriptions for historical facts cannot be evaluated. The author is aware of this model limitation, which showed up after the alignment (started in 2014) between the ontology of the symogih.org project and the CIDOC CRM ontology. Still, the article lacks any specific solution about how current limitations can be surpassed. The suggestion that there is an active partnership between symogih.org and the CIDOC CRM SIG is too generic to be validated given the lack of any evidence about the practical outcomes of this collaboration.

In general, the presented research lacks measurable evidence/validation with respect to the benefits of the proposed model over existing ones, especially the CIDOC CRM. An evaluation attempt would strengthen the article's value and would help towards providing concrete examples of the model's contribution. Another weak point of the article is that it does not provide any information with respect to the amount of data already integrated in the model. Section 5 of the article reads more like ongoing work than completed research. Therefore, its value cannot be readily assessed.

As an overall comment, I would say that the idea presented here is valuable and could have a significant impact on data integration models for cultural heritage information. Still, its implementation being in early stages hinders the realization of its true value. A first step towards adding value to the present work would be to give quantifiable indications with respect to the data already processed, integrated and interlinked, as well as to provide a validation section in which the evaluation of the proposed model's performance would be presented.

Review #3
Anonymous submitted on 04/Mar/2020
Review Comment:

In the first part, the authors present the Symogih conceptual model for history, with objects and knowledge units (KU).
Objects are "objective facts" and KU enable users to describe knowledge about such objects according to specific points of view. In particular, KU can be used to describe information extracted from sources such as documents.
Hence, knowledge units can be seen as viewpoints on objects.
The authors compare their approach with the Factoid model, which is another model used in history. They also compare their approach with the CIDOC CRM model for museums and the top-level DOLCE ontology.

In a second part, the authors present the OntoME software environment to manage ontologies in the history domain.
The tool includes the CIDOC CRM model as well as other models such as ABES IdRef and Opentheso.
The tool provides features to perform alignment among models. It also provides features to extend existing models to adapt them for history.
This is part of ongoing work in the history domain to provide an ecosystem for ontologies and semantic data.