Transdisciplinary approach to archaeological investigations in a Semantic Web perspective

Vincenzo Lombardo

Special Issue Cultural Heritage 2021

Full Paper
In recent years, the transdisciplinarity of archaeological studies has greatly increased because of the mature interactions between archaeologists and scientists from different disciplines (called "archaeometers"). A number of diverse scientific disciplines collaborate to get an objective account of the archaeological records. A large amount of digital data supports the whole process, and there is great value in keeping the information and knowledge contributed by each intervening discipline coherent. Over the years, a number of representation models have been developed to account for the recording of the archaeological process in databases. Lately, a semantic model, compliant with the CRMarchaeo reference model, has been developed to link the institutional forms with the formal knowledge concerning the archaeological excavations and the related findings. In contrast, the archaeometric processes have not yet been addressed in the Semantic Web community, and only an upper reference model, called CRMsci, accounts for the representation of scientific investigations in general. This paper presents a modular computational ontology for the interlinked representation of all the facts related to the archaeological and archaeometric analyses and interpretations, also connected to the recording catalogues. The computational ontology is compliant with the CIDOC-CRM reference models CRMarchaeo and CRMsci and introduces a number of novel classes and properties to merge the two worlds in a joint representation. The ontology is in use in "Beyond Archaeology", a methodological project for establishing a transdisciplinary approach to archaeology and archaeometry, interlinked through a semantic model of processes and objects.
Review #1
By Xander Wilcke submitted on 10/Nov/2021
With the revised version, I feel that the authors have improved the quality of their paper significantly. Moreover, all my original points are addressed either completely or close enough, except for the lack of a proper evaluation.

While I greatly appreciate that the authors have added a whole section on preliminary evaluation, I still find it worrisome that, for a research paper, no proper evaluation with metrics and significance testing has yet been performed. While it is comforting to read that there have been discussions with some domain experts, and that these discussions have already led to new insights and possible improvements, this still tells us little about how well the archaeometric domain has been modelled by the authors or how effective the model is for use by its target users, let alone whether any of these results is actually significant. With the seemingly close ties between the authors and the archaeological community, one would assume that doing a workshop with 15 to 20 participants, who would then test and evaluate the model, would not be too difficult to set up in a short amount of time. That the authors decided not to pursue this direction remains a sore spot of the paper in my opinion. That the authors also do not mention the number of experts who partook in these preliminary evaluations, whether they are part of the BeArchaeo project, and what form these discussions took, only increases my scepticism.

As for the PURL link, despite the authors' confidence that this link is working correctly, I'm still welcomed by the same 404 page as during my original review: Since PURL is clearly not at fault, I would suggest that the authors check with their own IT department why this might be the case. Also, I suggest that the authors move their ontology from their personal webpage to some more durable repository, like GitHub or Zenodo, to avoid further issues. Since the future of PURL has become a bit uncertain in the last few years, the authors might also want to consider using one of the more recent alternatives, like

Finally, I want to stress that I think that the work done by the authors is very relevant to the archaeological domain, and that it deserves a publication, but that, as mentioned, it is the lack of proper evaluation that prevents me from giving it the green light. However, if not here, then I encourage the authors to publish it elsewhere after having done such an evaluation.

Review #2
By Alessandro Adamou submitted on 27/Nov/2021
The paper presents the output of the beArchaeo project, which includes the formalization of an ontological structure that extends the capabilities of cultural heritage models associated with CIDOC-CRM, especially CRMSci and CRMArchaeo, to encompass other aspects of the praxis of archaeological investigation such as archaeometrics, and by extension their reach into other related disciplines such as chemistry. The model is enjoying adoption within the beArchaeo community and alignments with well-known controlled vocabularies like AAT.

There is a prior version of this paper that was reviewed earlier. Compared to that version, it appears that the authors have put a lot of effort into heeding the comments of the reviewers, and I agree that the paper is in much better shape right now. Therefore, I will only highlight the last few changes required and elaborate on the new content.

The ontology description has been given adequate space and structure and now elaborates on the characteristics that a reader of this journal, even on a domain-specific special issue, will find themselves comfortable with. The structure of this section is, in fact, so detailed that it makes me wish other sections were similarly structured: particularly, Section 3 is very long and, since there is a perceived sequencing of the phases of the data curation model, perhaps numbered subsections reflecting these phases would be more in line with the ontology section structure.

The authors have provided the methodological underpinnings of their development process: the choice of the NeOn methodology makes perfect sense. However, in the interest of having the paper as self-describing as possible, it would be useful to spend a few words reminding the reader what the scenarios are about. It's enough to just quote the scenario title from the NeOn book (e.g. "Scenario 2: Reusing and re-engineering non-ontological resources") and let the reader look it up for more details.

I also observe that it is now much clearer in which ways the beArchaeo ontologies relate to other models like ArCo and thesauri like AAT. What I find myself less comfortable with is the choice of how to represent alignments. The authors have opted for bespoke terms that name the vocabulary being referred to (e.g. hasGettyAATMaterial): this is one of the possible ways to do it and is used for e.g. authority control in Wikidata, but there are concerns about the flexibility of this model, as it fuels the expectation that the ontology will need to be extended should another vocabulary, say GND, be incorporated. An alternative way would be to use a single hasMaterial property that may reference values in AAT, beArchaeo and others indistinctly, and attach provenance information to these alignments as needed (though that might be best achieved using more recent methods of annotating statements like RDF*).
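
To make the contrast between the two alignment styles concrete, here is a minimal sketch in plain Python rather than RDF. It is only an illustration under stated assumptions: the `Statement` tuple, the helper functions, the `find1` subject, the derived name `hasGNDMaterial` and the placeholder values are all hypothetical and are not taken from the beArchaeo ontology; only `hasGettyAATMaterial` and `hasMaterial` appear in the discussion above.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One alignment statement; 'source' records which vocabulary the value comes from."""
    subject: str
    prop: str
    value: str
    source: str


def bespoke(subject: str, vocab: str, value: str) -> Statement:
    # Style 1: the vocabulary name is baked into the property name,
    # so every new vocabulary requires a new property in the ontology.
    return Statement(subject, f"has{vocab}Material", value, vocab)


def generic(subject: str, vocab: str, value: str) -> Statement:
    # Style 2: a single property; the vocabulary is kept as provenance
    # attached to the statement instead of being encoded in the property.
    return Statement(subject, "hasMaterial", value, vocab)


# Hypothetical alignments of one find to two vocabularies (placeholder values).
links = [
    ("find1", "GettyAAT", "aat:PLACEHOLDER"),
    ("find1", "GND", "gnd:PLACEHOLDER"),
]

bespoke_props = {bespoke(*link).prop for link in links}
generic_props = {generic(*link).prop for link in links}

print(sorted(bespoke_props))  # one property per vocabulary
print(sorted(generic_props))  # a single property regardless of vocabulary
```

In this sketch the set of properties grows with each vocabulary in the bespoke style, while the generic style keeps one property and moves the vocabulary into per-statement provenance, which is the role statement-level annotation would play in an RDF setting.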

I am also interested to know if the authors have looked into how basic knowledge patterns of what they modelled in beArchaeo, if any, have been represented in foundational ontologies (e.g. how measurements, regions and parameters are modelled in DOLCE/DUL) and if subsuming concepts from these ontologies could be considered.

What seems to be the most important addition to this new version is a preliminary evaluation section. This is largely based on a discourse that is part qualitative, part describing how beArchaeo is situated in its research network. Obviously, it would have been unrealistic to expect that a full-fledged user evaluation be scheduled and carried out between revisions of the paper, so on that basis the provided content may even suffice. I still wonder if some more context could be provided to substantiate important statements like "The archaeologists have found the model accurate": for example, with a few quantitative data about the size and nature of the group that gave such feedback, or if the process to reach such accuracy (did they also argue about completeness by the way?) was iterative and based on multiple feedback rounds or not. Within the boundaries of what is sensible to expect, the more concrete this section can be made the better.

For a stable resource link, the authors have supplied the URL of an OWL/XML document that imports the entire ontology structure. This is acceptable as the authors have also generated LODE documentation as requested and provided its URL in the paper: perhaps the authors could clarify if the entire parent directory is a relevant reference for the paper content as well?

Several footnotes (those linking to CRMsci, CRMarchaeo, ArCo and others) have corresponding publications, journal/conference papers as well as white/technical papers: please turn as many of these as possible into bibliographical references.

More detailed things to look into (notation in page:line range)
* 3:14-15 - including representations of specific domains is not a job of the Semantic Web paradigm, please rephrase (one possible way is to replace "paradigm" with "schema coverage" but there might be a better formulation)
* 10:44-46 - "the beArchaeo ontology comprises three modules: [...]"
* 10:48 - "non-ontological sources"
* 10:17 - "NeOn" is capitalized like this
* 11:10 - rather than providing a human-readable version of the ontology, LODE automatically produces ontology documentation.
* 12:32 - I understand the reason for an isEqualTo property separated from ontological equivalence, but perhaps the property could be renamed to reflect that the equivalence relation is that of belonging to the same stratum
* 14:32 - capitalized as PROV-O
* 15:37 - shouldn't it be "thermoluminescence" with an 'h'?
* 17:29-30 - "[...] that are being used for interpretation and will be the basis for the final exhibition."
* 17:44 - "Some interesting issues also rose" (or "were also raised")
* 18:46 - "The conceptual model _is_ the outcome" (or "was")
* 18:32-35 - that this is the first born-Semantic archaeological approach is a rather bold statement: would one argue that prior projects like Pelagios do not apply? I would advise a scrutiny of the state of the art in e.g. the Digital Classicist wiki at (by the way, please make sure to be there).

Review #3
By Franco Niccolucci submitted on 29/Nov/2021
The paper can be accepted.
I have however some recommendations for future work on the subject.
1) Obviously it is unlikely that the first version of an ontology is perfect. It will be necessary to refine what at present seems a very complex structure, with some possible redundancy in the class & property definitions. One example: why is there a new class FormationProcess, subclass of A4 Stratigraphic Genesis? Was the latter insufficient? Was this new class strictly necessary and a solution with "P2 has type" not suitable? Thus I would consider the paper a first attempt, requiring further work to be implemented but still worth publishing. In the meantime, implement Occam's razor!
2) The repository part is still very preliminary. This is acceptable at the present level of development, where the focus is on the theoretical aspects of the ontology and not on its implementation, but it will be necessary to go beyond Google Drive (as the authors also state). I would not consider this a secondary aspect in the future, nor an easily solvable one.
3) Same comment for the use of Omeka-S. Is it the appropriate tool with all the required functionalities? I am not sure it is powerful enough. The choice of the package should go hand-in-hand with the repository solution.
However, such data management aspects (points 2 & 3) can be addressed in future work and it is useful to publish the present one as is, not least to allow a discussion on the matter and compare different solutions.