A Strategy for Archives Metadata Representation on CIDOC-CRM and Knowledge Discovery

Dora Melo
Irene Pimenta Rodrigues
Davide Varagnolo

Eero Hyvonen

This paper presents a strategy for the semantic migration of Portuguese National Archives records into the CIDOC-CRM standard, an ontology developed for museums, within the context of the EPISA project. The approach to automatically populating CIDOC-CRM is based on Mapping Description Rules that semantically translate the archives' descriptive information into a CIDOC-CRM representation. Compliance with the CIDOC-CRM model recommendations guarantees that the populated CIDOC-CRM ontology of archives descriptive information is interoperable and can be linked and integrated with other populated CIDOC-CRM ontologies. The information modelling takes into account requirements on the mapping representation that arise from the intent to interpret natural language text, both to automatically extract information from metadata text fields and to interpret natural language queries. To automatically interpret the Mapping Description Rules, the OWL API was used to obtain the set of assertions that represents the information in the target ontology; two datasets with some migration examples are available. The knowledge representation is explored through Description Logic queries that highlight the advantages of this new representation of the National Archives. The resulting representation can be evaluated automatically, proving its correctness for the metadata that has a direct representation in CIDOC-CRM.
Review #1
By Carlo Meghini submitted on 22/Nov/2021
The authors have responded in a satisfactory way to the observations made in the first review. I still believe the paper is worth publishing as it addresses an important topic and presents valid results.

Review #2
Anonymous submitted on 06/Dec/2021
In my previous review I suggested a major revision of this paper, and I believe that the authors have sufficiently addressed my comments. Thank you for your efforts!

Although I think the linguistic readability could still be improved a bit, I would recommend publication, since the paper presents very relevant and important work in its area.

Some observations from reading the edited passages:
Review #3
Anonymous submitted on 28/Dec/2021
This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions, which include (1) originality, (2) significance of the results, and (3) quality of writing. Please also assess the data file provided by the authors under “Long-term stable URL for resources”. In particular, assess (A) whether the data file is well organized and contains a README file which makes it easy for you to assess the data, (B) whether the provided resources appear to be complete for replication of experiments, and if not, why, (C) whether the chosen repository, if it is not GitHub, Figshare or Zenodo, is appropriate for long-term repository discoverability, and (D) whether the provided data artifacts are complete. Please refer to the reviewer instructions and the FAQ for further information.

The authors have extended and clarified the paper substantially in this new version and provided a detailed account of how the paper has been modified. The proposed corrections and clarifications that I suggested in my earlier review have been addressed.