Transdisciplinary approach to archaeological investigations in a Semantic Web perspective

Tracking #: 2775-3989

Authors: 
Vincenzo Lombardo
Tugce Karatas
Monica Gulmini
Laura Guidorzi
Debora Angelici

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Full Paper
Abstract: 
In recent years, the transdisciplinarity of archaeological studies has greatly increased because of the mature interactions between archaeologists and scientists from different disciplines (called ``archaeometers''), where a number of diverse scientific disciplines collaborate to get an objective account of the archaeologic records. A large amount of digital data support the whole process, and there is a high value of keeping the coherence of information and knowledge, as contributed by each intervening discipline. During the years a number of representation models have been developed to account for the recording of the archaeological process in data bases and lately, some semantic model, compliant with the CRMarchaeo reference model, has been developed to account for linking the institutional forms with the formal knowledge concerning the archaeological excavations and the related findings. On the contrary, the archaeometric processes have not been addressed yet in the Semantic Web community and only an upper reference model, called CRMsci, accounts for the representation of the scientific investigations in general. This paper presents a modular computational ontology for the interlinked representation of all the facts related to archaeological and archaeometric analyses and interpretations. The computational ontology is compliant with CIDOC-CRM reference models CRMarchaeo and CRMsci and introduces a number of novel properties and classes to link the two world in a joint representation. The ontology is in use in the BeArchaeo, which is a methodological project for the establishing of a transdisciplinary approach to archaeology and archaeometric disciplines, interlinked through a semantic model of processes and objects.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Franco Niccolucci submitted on 10/Apr/2021
Suggestion:
Major Revision
Review Comment:

The paper is clear and well-written. It deals with an important topic, i.e. the integration of data resulting from scientific analyses with other archaeological documentation. Overall, the argumentation is clear and well explained.
That said, I do have some comments, including a general one, which in my opinion deserve the authors’ attention. If the authors wish to address them, this may require a major revision of the paper.

I am a bit uncomfortable quoting my own research or work done by research teams in which I am directly involved. Please note that I am not looking for more quotations: my comments below are only aimed at improving an already very good paper such as the present one, and at avoiding criticism once it is published. The authors should feel free to decide if and how they wish to take my suggestions into account.

In the introduction, authors list a number of initiatives that make archaeological datasets “increasingly available online” and list some very important ones. But, in this survey, they ignore the aggregating function of the ARIADNEplus project, which they mention only as regards a study on the archaeological users’ needs, a side topic of the ARIADNE research.
Actually, ARIADNEplus makes about 1.7 million archaeological datasets findable, accessible (within the limits of the original access license), interoperable and retrievable, using a CIDOC CRM compliant ontology. The original data are stored with institutions across Europe (not only ADS) and can be retrieved via the ARIADNE catalogue at portal.ariadne-infrastructure.eu. The outcomes of ARIADNE until 2019 and of its follow-up ARIADNEplus have been published in the edited volume “The ARIADNE Impact”. A more technical description of the (first) ARIADNE project has appeared in JOCCH, see below. Item-level integration with archaeological science is work in progress, so very few reports about archaeological sciences are included in the catalogue, and in general few are available online.

A humorous aspect of the above forgetfulness is that the institution to which one of the authors belongs, INFN CHNet, is an active ARIADNE partner and is currently developing the extension of the CRM ontology to archaeological scientific data and promoting their inclusion in the catalogue: gnothi seauton, Know Thyself, as Socrates would say.

In conclusion, not mentioning the ARIADNE contribution may derive from deliberate disregard or from distraction: the former is fine but should be explained; the latter is not appropriate in a scientific paper for a project which has thousands of users.

However, a major issue, in my opinion, is the lack of consideration of what is called “data provenance”. This term usually indicates the circumstances of data acquisition and processing, which produce the so-called raw (numeric) data, and any further transformation of them. All digital instrumentation internally converts analog measurements into numbers, the raw data. The pipeline from physical or chemical results to digital raw data is relevant to assessing the reliability of the latter. This pipeline often includes “black boxes”, for example proprietary devices or software included in the processing and not disclosed to users. The environmental conditions under which the experiment is made, its operational protocol, and even the research question for which it was originally carried out are all relevant to the reliability of the results. For example, stating that an XRF analysis showed that a pigment is Egyptian blue means nothing if it is not stated which XRF device was used, whether a cheap handheld one or a laboratory-grade one. The instrument settings and calibration are also important. Since the authors mention, among others, photogrammetry and 2D/3D acquisition (page 12), documenting the environmental conditions of the data acquisition is of paramount importance. In this case, the goal of the original investigation also matters, as it may influence the chosen precision, the level of detail and so on. I add below some relevant references on the digital provenance issue.
As a matter of fact, all this information is usually recorded in the archaeometry report, which includes statements such as “we analyzed a sample taken from X, with the device Y and settings Z following protocol W”. All of this disappears, however, when it is collapsed into S21 Measurement. Thus, potential re-users of the archaeometric analyses are left unsure whether the data are suitable and reliable enough for their research.
Sometimes this is the consequence of a dismissive attitude (“I’m the scientist and don’t bother me”). For sure this is not the case with this paper, so show that it isn’t.
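
To make the provenance point concrete, here is a minimal sketch of a check that flags which acquisition details are missing from a measurement record. The record layout, field names and sample values are hypothetical illustrations; they are not taken from the paper, from CRMsci or from CRMdig.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class MeasurementRecord:
    """Hypothetical flat record for one archaeometric measurement."""
    sample_id: str
    result: str
    device: Optional[str] = None             # which instrument was used
    settings: Optional[str] = None           # instrument settings / calibration
    protocol: Optional[str] = None           # operational protocol followed
    research_question: Optional[str] = None  # why the analysis was run


def missing_provenance(record: MeasurementRecord) -> List[str]:
    """Return the names of the fields that were left empty."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# A record reduced to the bare result, and one that keeps the report details.
bare = MeasurementRecord(sample_id="X", result="Egyptian blue")
full = MeasurementRecord("X", "Egyptian blue", device="Y", settings="Z",
                         protocol="W", research_question="pigment identification")

print(missing_provenance(bare))
print(missing_provenance(full))
```

A re-user could run such a check before trusting a result: the first record reports four missing fields, the second none.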

I assume, perhaps a bit overoptimistically, that the authors are aware of these considerations and would like to mention the above-mentioned issue as forthcoming work, improving and detailing their ontology. This is by no means mandatory for publication. They should consider my comments as a friendly warning about possible criticism of their paper and as suggestions to improve the in-depth semantic integration of all contributions to archaeological research, of which archaeometry is in my opinion a key one, and their work a valid contribution.

Thus, I will not object to publication if such comments are disregarded. The decision, and responsibility, is up to the authors.

Here are some references (for reading, not necessarily for quoting!).

About ARIADNE/ARIADNEplus

C. Meghini et al. (2017) “ARIADNE: A Research Infrastructure for Archaeology”, Journal on Computing and Cultural Heritage, Volume 10, Issue 3 - August 2017, Article No.: 18, pp 1–27. https://doi.org/10.1145/3064527

J. Richards and F. Niccolucci (eds.) (2019) The ARIADNE Impact. Archaeolingua, Budapest.
https://zenodo.org/badge/DOI/10.5281/zenodo.4319058.svg

About data provenance (mainly about digital provenance)

M. Doerr and M. Theodoridou (2011) “CRMdig: A Generic Digital Provenance Model for Scientific Observation” 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP 11). Available at https://www.usenix.org/conference/tapp11/crmdig-generic-digital-provenan...

N. Amico, P. Ronzino, A. Felicetti, F. Niccolucci (2013) “Quality management of 3D cultural heritage replicas with CIDOC-CRM” Proceedings of CRMEX@ TPDL http://ceur-ws.org/Vol-1117/

K. Tzompanaki, M. Doerr, M. Theodoridou, I. Fundulaki (2014) “Reasoning based on property propagation on CIDOC-CRM and CRMdig based repositories” available here: https://www.semanticscholar.org/paper/Reasoning-based-on-property-propag...

C. Strubulis, G. Flouris, Y. Tzitzikas, and M. Doerr (2014) “A case study on propagating and updating provenance information using the CIDOC CRM” International J. on Digital Libraries 15, 27–51. doi: 10.1007/s00799-014-0125-z

N. Carboni et al. (2016) “Data provenance in photogrammetry through documentation protocols” ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume III-5, 2016 XXIII ISPRS Congress, 12–19 July 2016, Prague, Czech Republic, 57-64. Available at https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/III-5/5...

Ontology extension to heritage science

F. Niccolucci and A. Felicetti (2018) "A CIDOC CRM-based Model for the Documentation of Heritage Sciences," 2018 3rd Digital Heritage International Congress (DigitalHERITAGE) held jointly with 2018 24th International Conference on Virtual Systems & Multimedia (VSMM 2018), San Francisco, USA, pp. 1-6, doi: 10.1109/DigitalHeritage.2018.8810109

L. Castelli, A. Felicetti and F. Proietti (2019) “Heritage Science and Cultural Heritage: standards and tools for establishing cross-domain data interoperability” International J. on Digital Libraries, doi: 10.1007/s00799-019-00275-2

Other papers are forthcoming from this INFN-based team

Review #2
By Xander Wilcke submitted on 21/Jul/2021
Suggestion:
Major Revision
Review Comment:

In this paper, the authors present a conceptual model and ontology for digital archaeometric knowledge generated during or after an archaeological excavation. The model and ontology, called BeArchaeo, are designed to fit within the CIDOC conceptual reference model, especially CRMarchaeo and CRMsci, and incorporate concepts and relations from well-known standards, including OGC's geospatial ontologies and the Getty Art & Architecture Thesaurus. The authors demonstrate their model and ontology with several use cases, most of which involve knowledge from their own Japan-based excavation project. Evaluating the model and ontology, and promoting their use in the community, are deferred to future work.

With this work, the authors present a missing piece of the puzzle to model archaeological knowledge in RDF. In that sense, the work proposed here may be very relevant to the archaeological domain. I say 'may' because, unfortunately, the authors did not evaluate their model with the archaeological community, or in any other way, which hurts the scientific contribution significantly. Since the authors pose this as a full paper, rather than a resource paper, I feel that at the very least a minimal evaluation must be present, especially given that the authors thoroughly discuss the community's need for an archaeometric model. A plan for how the authors intend to encourage adoption of their model would be welcome as well.

Since the main contribution of the paper is the model/ontology, I would have expected some sort of structured overview of all concepts and properties, either as a diagram or as, e.g., a list with descriptions. Instead, the authors explain their model/ontology in-text while going through several elaborate use cases. While this provides a nice in-depth view into the archaeological world, it makes it quite difficult to get an overall idea of the model/ontology and what concepts and properties it contains. I am not even sure whether all components of the ontology were discussed, or merely a subset. Since the purl URL points to a 404, I was unable to check.

There is very little rationale for the choices made in the model's design. Why, for example, do the authors introduce 'isEqualTo' to state that two entities are the same instead of using the universally used 'owl:sameAs'? Why do they introduce 'earlierThan' and 'laterThan' when other ontologies have already standardized 'before' and 'after' properties? And why is elevation specified as integers rather than floats or similar? The authors might have good reasons, such as the ambiguity of 'sameAs', but no such reasons are given. Also, on page 10, the authors explain that chronological information is given as free text, with the idea to use a time vocabulary in the future. Was the use of PeriodO (https://perio.do) ever considered?
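
One possible reason for avoiding 'owl:sameAs', offered here only as an illustration and not as the authors' stated rationale, is that it is an equivalence relation (reflexive, symmetric and transitive), so a single doubtful link merges whole groups of identifiers. The sketch below shows this with a small union-find over identity links; the identifiers are hypothetical and the code is not part of the BeArchaeo ontology or of any OWL reasoner.

```python
def same_as_clusters(links):
    """Group identifiers into clusters, treating each link as an equivalence."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters

    clusters = {}
    for node in list(parent):
        clusters.setdefault(find(node), set()).add(node)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identity links between finds and sherds.
links = [
    ("ex:find_12", "ex:sherd_A"),
    ("ex:sherd_A", "ex:sherd_B"),
    ("ex:find_40", "ex:sherd_C"),
]
print(same_as_clusters(links))

# One additional, possibly mistaken, link collapses everything into one entity.
print(same_as_clusters(links + [("ex:sherd_B", "ex:sherd_C")]))
```

A custom property such as 'isEqualTo' does not carry these semantics unless they are declared, which is exactly why the paper should state what the property is meant to entail.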

Several paragraphs, including a use case and diagram, are exclusively about CRMsci. Since CRMsci is a totally different model, and not a contribution of this paper, I wonder why the authors have given so much attention to it. I understand that BeArchaeo works well with CRMsci, but much of this can be left out of the paper without changing its message. Similarly, there are numerous paragraphs that seem to tell more about the excavation project and its findings than about the model/ontology. While interesting to read, it feels a bit elaborate for mere context.

I cannot find the 'practical workflow and form interfaces' that the authors claim in the conclusion to have designed. Are these part of the paper? If not, why mention them here? Also, please provide more details about the exploratory sessions with archaeologists which led to the model's design. The authors also mention the 'multi-lingual character' of their model; what is meant by this?

Finally, while the paper can be understood quite well, there are still many language errors throughout, such as missing commas, imprecise wording, and overly long sentences. The use of '...' instead of et cetera is also a bit informal, and several figures contain spell-checker markings (squiggly red lines). I've listed a few under 'minor points'.

# minor points

- pp.1; keywords missing
- pp.1-2; 'sense making' -> 'making sense of' (or 'interpreting' if you want to keep the sentence structure the same)
- pp.2; 'to the extreme consequences' -> 'to the extreme'
- pp.2; 'designates' -> 'involves'
- pp.2; 'cf.' means 'compare' (typically introducing a contrasting view), not 'see'
- pp.3; 'make available [...]' -> 'make a number of [...] processing available'
- pp.3; 'made available' -> 'are made available'
- pp.4; specify which OGC ontology you use
- pp.4; 'consists in' -> 'consists of'
- pp.4; 'value addings' -> 'added value'
- pp.8, fig.4; upside down pyramid?
- pp.12; 'has made possible [...]' -> 'has made [...] possible'
- pp.12; 'to the best of the present knowledge' -> 'to the best of our knowledge'
- pp.12; 'let emerge [...]' -> 'let [...] emerge'
- pp.12; 'beArchaeo' overflows right margin
- pp.13; purl URL wrongly formatted and links to 404

- combine references [X][Y] as [X, Y]
- many uses of ';' are incorrect; start a new sentence instead
- start sentences with articles (e.g. 'the') instead of 'Ontology has been [...]' (also occurs within sentences, e.g. 'between BeArchaeo ontology')
- end the last item in an in-text list 'X, Y, Z' with 'and': 'X, Y, and Z'

Review #3
Anonymous submitted on 05/Aug/2021
Suggestion:
Major Revision
Review Comment:

This paper highlights a semantic gap in the formal representation of archaeological studies, particularly the lack of a connection with related disciplines such as archaeometry. Through the beArchaeo project, it proposes a system of ontologies that extend the CIDOC-CRM galaxy and discusses its applications.

While the very need for representing interdisciplinarity in archaeological research is easy to understand, it is arguably difficult to delineate the intricacies of how the actual disciplines connect to each other and to archaeology. In this regard, the paper does good work of introducing and exemplifying the setting in the introductory section, though some more light should be shed on which reflexive methodologies apply to the field.

The authors then illustrate a proposed model for the digital curation of archaeological (in the aforementioned broadened sense) data, with examples. This is a part that I believe should be given a more solid structure. Looking at Figure 1, and the way Section 3 describes it, a few aspects appear unclear. First, the connection with reflexive aspects of archaeological investigation seems to be lost: the tasks are admittedly commonplace in data curation for the cultural heritage domain; one would have expected them to be either specialised or broken apart into tasks of e.g. archaeometrical nature. Then, the actual sequences and cycles that can realise a workflow out of this model are also unclear: it might be helpful to provide numbering and ordering to the tasks and phases in the figure, and reflect that onto the paragraphs in section 3, so that the reader understands what kind of content an investigation starts with, or to what a cycle applies (Figure 1 seems to suggest it all starts from OWL/RDF data, and only later does it become apparent that this is not the case).

After providing extensive exemplification, the paper proceeds to describe the ontology itself. The description mostly covers the conceptualisation point of view, which is why this section should be significantly expanded to cover at least the following aspects:

1. Methodology: were existing ontology development methodologies considered? Was one of them picked or was a hybrid approach adopted to develop it? What phases were foreseen and/or implemented?

2. Modularisation: by following the provided link, one comes across a set of rather under-described ontology modules. What was the rationale / process that warrants this kind of modularisation? Please document it both on the paper and in the ontologies themselves.

3. Alignment. There is a trace of this in the section regarding e.g. CRMsci, but this requires more structure. Please be specific with regard to how ontology/thesaurus alignments are carried out, e.g. whether the alignments with the CRM ontologies are performed through equivalence/subsumption of classes or properties, and whether there are alignments for thesaurus concepts (e.g. Clay in Archaeo_ontology.owl with Clay in AAT).

4. Logical profile. Was a specific description logic or tractable fragment of OWL (e.g. OWL 2 EL/RL/QL) considered for the ontology? Any specific consideration on how data formalised according to this ontology should be queried and reasoned upon? (note that this strictly depends on the rationale for alignments as well)

5. Documentation: there appears to be scarce documentation, online or on the paper, of the ontology network as a whole, the modules of which it is comprised, and the terms therein. At a bare minimum, consider having an rdfs:comment for each term and use LODE to automatically generate documentation.
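
As an illustration of the minimum documentation requirement in point 5, the following sketch lists declared terms that lack a non-empty rdfs:comment. It works on plain (subject, predicate, object) tuples with hypothetical ex: identifiers and a placeholder comment text; it is an assumption-laden stand-in for a check that would normally be run with an RDF library against the actual ontology files.

```python
RDF_TYPE = "rdf:type"
RDFS_COMMENT = "rdfs:comment"
TERM_TYPES = {"owl:Class", "owl:ObjectProperty", "owl:DatatypeProperty"}


def undocumented_terms(triples):
    """Return declared classes/properties without a non-empty rdfs:comment."""
    declared = {s for s, p, o in triples if p == RDF_TYPE and o in TERM_TYPES}
    commented = {s for s, p, o in triples if p == RDFS_COMMENT and o.strip()}
    return sorted(declared - commented)


# Hypothetical excerpt; the comment text is a placeholder, not the ontology's.
triples = [
    ("ex:Clay", "rdf:type", "owl:Class"),
    ("ex:Clay", "rdfs:comment", "Placeholder description of the Clay concept."),
    ("ex:isEqualTo", "rdf:type", "owl:ObjectProperty"),
    ("ex:earlierThan", "rdf:type", "owl:ObjectProperty"),
    ("ex:earlierThan", "rdfs:comment", "   "),  # whitespace only: still undocumented
]
print(undocumented_terms(triples))
```

Running such a check per module before release would also make the output of a documentation generator such as LODE more useful, since every term would have text to display.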

A sub-section structure would improve the readability of the section once expanded.

It is also advisable to provide a dedicated section (possibly even a subsection of 5) that offers pointers to and an outline of the tangible resources: the URL to the ontology download page is provided almost in passing, and it is not known if there is any other associated resource (sample data, documentation...). Also, be sure to add the http[s] scheme to the URL.

The examples that are presented throughout mention several use cases from different contexts; also the concluding section references the growth of the beArchaeo database: should we therefore expect that one or more linked datasets formalised according to these ontologies will be published by the project, as a concrete application of the presented ontology network? If so, please make that aspect stand out, possibly in connection with the evaluation that the conclusion already mentions.

A few other issues that should not be hard to address, for an otherwise well-written paper:

* The notation for entities e.g. CRMsci/S21 Measurement is not quite as commonly expected in ontology papers; please consider a QName-like syntax such as crmsci:S21_Measurement

* Some figures show the red underline that is usually added by spellcheckers: please see to it that they do not appear in the revised figures

* P1L31 (abstract): "there is a high value of keeping..." -> "there is great value in keeping..."

* P5L29: "make data available in electronic format or as paper publications" (btw by electronic I suppose you also mean as structured data?)

* P7L1: "Google Drive" in uppercase (btw, consider mentioning dedicated data repositories like Zenodo)

* Phrases such as "has made possible the emergence" or "let emerge" (e.g. P12) should be reformulated. Possible ways are "has brought to the surface" or "has allowed X to emerge".
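
Regarding the QName-like notation suggested above for entity names, the rewrite is mechanical and could be applied consistently across the manuscript. The sketch below assumes that every entity label has the form 'Prefix/Local name' and that prefixes are lower-cased, as in the crmsci:S21_Measurement example; the function name is hypothetical.

```python
import re


def to_qname(label: str) -> str:
    """Convert a 'Prefix/Local name' label to a 'prefix:Local_name' QName."""
    prefix, sep, local = label.partition("/")
    if not sep or not prefix.strip() or not local.strip():
        raise ValueError(f"expected 'Prefix/Local name', got {label!r}")
    # Runs of whitespace in the local name become single underscores.
    local_name = re.sub(r"\s+", "_", local.strip())
    return f"{prefix.strip().lower()}:{local_name}"


print(to_qname("CRMsci/S21 Measurement"))  # prints crmsci:S21_Measurement
```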