Typed properties and negative typed properties: dealing with type observations and negative statements in the CIDOC CRM

Tracking #: 2753-3967

Authors: 
Athanasios Velios
Martin Doerr
Carlo Meghini
Stephen Stead

Responsible editor: 
Special Issue Cultural Heritage 2021

Submission type: 
Full Paper
Abstract: 
During condition and collection surveys in memory organisations, surveyors observe the absence of features on collection items. They also observe types of multiple components as single instances, given that their large number makes them difficult to capture as separate instances. Such observations are significant to researchers and are documented in registration forms, but they are not easy to model in popular ontologies such as the CIDOC CRM. In this paper the nature of absence is explored from an ontology point of view alongside the role of the Open World and Closed World Assumptions in knowledge bases. A proposal is then formulated for the use of special properties within the CIDOC CRM ontology, namely ‘typed properties’ and ‘negative typed properties’, which allow modelling the typology of multiple instances and the absence of instances. The nature of these properties is then explored in relation to their correspondence to longer property paths, their hierarchical arrangement and relevance to thesauri. First order logic statements are used to describe these properties. Examples from bookbinding structures are used given the significance of such observations in the field of bookbinding history. The paper concludes with reference to ongoing implementation work and a summary of findings.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 12/May/2021
Suggestion:
Reject
Review Comment:

The paper proposes to introduce two special types of properties: typed properties and negative typed properties. Typed properties aim at expressing the fact that there exists an unknown individual of a certain type which is related to the subject of the triple. For example, given a property p having D and R as domain and range, the typed property tp has domain D and range T (where T is the (meta)class enclosing all the types in the domain of discourse). The triple s tp t is to be interpreted as follows: there exists an unknown individual i that belongs to t and is related to s by p. Conversely, negative typed properties aim at expressing the fact that there cannot exist an individual of the type t that is related to s by means of p. The work is mainly motivated by bookbinding activities, which need an agile vocabulary for annotating observations. The authors also mention that they are preparing a test dataset for assessing the proposal.
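
One possible first-order reading of the summary above, offered as a sketch rather than as the authors' own axioms. The symbols tp and ntp follow the paragraph; writing type membership as i ∈ t and stating both formulas as biconditionals are assumptions made here for illustration.

```latex
% Typed property: some (unnamed) individual of type t is related to s by p.
\[
  \mathit{tp}(s,t) \iff \exists i \,\bigl( p(s,i) \wedge i \in t \bigr)
\]
% Negative typed property: no individual of type t is related to s by p.
\[
  \mathit{ntp}(s,t) \iff \neg \exists i \,\bigl( p(s,i) \wedge i \in t \bigr)
\]
```

Under this reading, ntp(s,t) is exactly the negation of tp(s,t) for the same s and t, which is why asserting both about one item would be a contradiction.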

The problem at hand is clearly important. Languages and vocabularies that make it simple to differentiate what is not known from what cannot exist should be among the knowledge engineering tools. The work is relevant to the topic of the special issue. However, the work seems to me to be at a preliminary stage and should be substantially reworked before being accepted to the SWJ. My main concern is about the proposal of introducing the types of properties. If I understood the proposal correctly, both kinds of properties can be expressed as OWL axioms. For example, I can express the following facts: 1) individuals belonging to C must be related to at least one individual of D by means of p, i.e. - C subClassOf (p some D) -. 2) there cannot exist an individual of C related to an individual of E by means of r, i.e. - C and (r some E) subClassOf owl:Nothing -. Do these solutions cover your requirements? If not, why? Why do we need to extend the semantics of properties? What are the implications of the solution? By the way, I would have expected an extensive discussion of alternative solutions using only OWL class expressions.
The other concern is about the evaluation. The authors only mention a dataset (which is a list of markdown files, and therefore not in a standard format), but the evaluation is left to future work and they do not say how they intend to use this dataset.
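
For readers more used to description logic notation, the two class axioms mentioned above can be written as follows. This is only a transcription of the reviewer's examples; the second axiom is written with r, the property named in the sentence that introduces it.

```latex
% 1) Every instance of C is related by p to at least one instance of D.
\[
  C \sqsubseteq \exists p.D
\]
% 2) No instance of C is related by r to any instance of E.
\[
  C \sqcap \exists r.E \sqsubseteq \bot
\]
```

Both axioms constrain every member of the class C, whereas a statement such as s tp t concerns a single individual s.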

Review #2
By Stefano Borgo submitted on 15/May/2021
Suggestion:
Minor Revision
Review Comment:

The paper addresses two relevant problems in the context of CIDOC knowledge bases. The first is a classical philosophical problem: the existential implications of any statement about a non-existent entity. Given the language adopted by CIDOC, statements about the lack of something, like a feature or a component, cause an instance of the very entities which are declared not to exist to be included (implicitly or explicitly) in the domain of quantification. The second problem is of practical origin: when many instances of a feature are present, it may be practically impossible to report all of them. In this case the KB may report only the existence of entities of such a type without giving specifications about each.

The source of the problem is identified in the limitation of CIDOC to relationships whose domain and range are both particulars (individuals).
The proposed solution, which is original to my knowledge and marks an improvement with respect to the discussed problems even though there are limitations (see below), is to introduce a new set of relations that have the same domain as before and, as range, not a class of particulars but their type. In short, every statement “x has component y” turns into “x has some component of type Y” (with Y the type of the entity y).

This does not avoid contradictions due to limitations of what one may be able to observe, e.g., I guess bumps in the spine could mark the presence of sewing support for some and be only aesthetic features for others. Furthermore, badly characterized concepts still cause the presence of contradictions (is a sewing support necessarily a functional entity? How well should the function be performed? Is it enough that it was intentionally introduced for this reason even though it fails to perform?)

While the paper is generally well written and clear, some parts need to be revised. For instance, the introduction of the following problem (pg.5) is misleading since it is unaffected by the proposed approach and may confuse the reader about the purpose of the paper:
“book does not have gold-tooled decoration” (E89 Propositional Object) → P129 is about → book (E18 Physical Thing)
The problem raised by this E89 entity has different characteristics than those addressed in the paper.

The set of inferences that the new relations allow needs to be extended to the broadening/narrowing of the domain entity, not only of the target type. For instance:
From
book cover (E18 Physical Thing) → NTP46 is not composed of physical thing of type (is not type of physical thing which forms part of) → tooled decoration
One cannot infer
book (E18 Physical Thing) → NTP46 is not composed of physical thing of type (is not type of physical thing which forms part of) → tooled decoration
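
The asymmetry in this example can be sketched on a toy model. The part-whole data, the identifiers, and the helper functions below are hypothetical; the sketch evaluates absence over a small, fully listed model and treats "is composed of" as transitive, both of which are simplifying assumptions rather than the paper's formal semantics.

```python
# Toy part-whole model (hypothetical data): each thing maps to its direct parts.
PARTS = {
    "book": {"cover", "spine"},
    "cover": set(),
    "spine": {"decoration_1"},
    "decoration_1": set(),
}

# Types of the things that have one (hypothetical data).
TYPES = {"decoration_1": "tooled decoration"}


def all_parts(thing):
    """Return every direct and indirect part of `thing`."""
    found = set()
    stack = [thing]
    while stack:
        current = stack.pop()
        for part in PARTS.get(current, set()):
            if part not in found:
                found.add(part)
                stack.append(part)
    return found


def ntp46_holds(thing, type_name):
    """True when no part of `thing` in the toy model has the given type."""
    return all(TYPES.get(part) != type_name for part in all_parts(thing))


# The cover has no tooled decoration, but the book as a whole does (on the spine).
cover_lacks = ntp46_holds("cover", "tooled decoration")
book_lacks = ntp46_holds("book", "tooled decoration")
print(cover_lacks, book_lacks)
```

In this model the negative statement is true of the cover and false of the book, so it cannot be propagated from part to whole.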

Sect. 3 introduces the proposal stating “Step 1 involves changing the range of each property to ‘E55 Type’.” This is likely badly expressed as the intent, I surmise, is to enrich CIDOC by adding the new TP and NTP properties while keeping (i.e., not substituting) the existing properties.

Finally, since the effectiveness of this approach in discovering inconsistencies depends on the possibility of reasoning on types, and since type instances are obtained by connecting to external resources (which implies that the description, quality, coverage and organization of types are not controlled by CIDOC), CIDOC may claim that the technique solved the problem while not being able to take advantage of it.

Sect. 3.2 implies that it is possible to characterize when completeness is reached and that this is stored in CIDOC but nothing is said about how this should be done.

Minor points:
“Documenting this correspondence is called ‘instantiation’ (pg.3)”
Instantiation is not about documenting relationships between a particular and a universal; it is the ontological relationship holding between them because one satisfies the essential conditions imposed by the other (and is independent of possible documentation).

In sect. 1.2 the sentence “Within the discourse around the CIDOC CRM there has been no previous systematic attempt to address the problem of non-existent instances.” is stated in the context of the second problem but refers to the first.

In sect. 1.3 use “i)” and “ii)” instead of “a)” and “b)” to avoid confusion with the addressed individuals a and b.

pg.4: “Note however, that the counterfactual instance may need to be instantiated in its class in order to test whether it conforms with the applicable constraints, but this instantiation is a technical fact that occurs during the execution of an algorithm on the knowledge base and does not imply existence in the underlying reality”
This is not a valid argument, since the instantiation is a logical consequence of the semantics of the expression; the problem is conceptual and one should not blame the algorithm.

pg.5: E55 is confusing unless one is already familiar with CIDOC; introduce the class with an example first.

pg.6: the statement “Following the realization that it is more practical to describe a) non-existent things” is weird since you just discussed all the practical problems introduced in the DB from the explicit use of non-existent things.

While the paper is detailed in presenting CIDOC’s notions, it is too verbose on concepts which are common knowledge in this community, like the open/closed world distinction or the discussion on inferences in Sect. 3.1.3.

pg. 8: “The CIDOC CRM observes [adopts?] the Open World assumption”

Review #3
By Enrico Daga submitted on 12/Jul/2021
Suggestion:
Major Revision
Review Comment:

The article discusses the case of expressing CIDOC-CRM relations between instances and unspecified entities of certain types using direct "typed" properties, which link the resource to the class of the possible entity and make it possible to express a relation such as "This physical object has a feature of type X". Similarly, the authors discuss "negative typed" properties, where the relationship cannot hold, e.g.: "This physical object does not have a feature of type Y". The motivation is presented in the context of the Linked Conservation Data project and, specifically, bookbinding history, where researchers need to express complex statements involving numerous unspecified features or negative features that can be observed on the objects.

The article reads well. However, the topic is addressed exclusively in the specific case of CIDOC-CRM and does not discuss sufficiently how similar modelling issues can be addressed using techniques such as Reification, OWL features or how similar problems (e.g. conflicting statements, which are part of the motivation originating the design issue) are currently addressed by alternative models, such as Wikidata, for example.
In addition, the presentation lacks a discussion of the problem(s) beyond the very specific case study, and does not sufficiently explain the requirements that a solution should have from the knowledge representation point of view.
Therefore, the article does not evaluate the proposed solution, nor does it discuss the applicability and limitations of the approach (for example, what type of inferences can and cannot be derived and how the approach satisfies the requirements). Finally, experimental work (a dataset) is mentioned but not really discussed in light of the problem and proposed solution.

A revised manuscript should answer all the issues above. For example:
- Analyse the requirements beyond the specific case study, implementing a set of competency questions (CQ)
- Formalise the problem from the KR point of view and provide a thorough analysis of relevant literature, focusing particularly on properties of OWL and how they are not sufficient for the task at hand
- Include a theoretical and experimental evaluation demonstrating how the proposed solution is sufficient to address the requirements and is of practical value to users in the domain of reference.

Section 2 seems to attack the problem from too broad a perspective (the Pegasus example), only to conclude that the philosophical problem (logically referring to a non-existing entity) does not apply to the problem at hand, which is reduced to a negated existential quantifier ("there exists no individual such that …"). This introduction is not very useful and is somewhat misleading. The Pegasus problem is about referring to a named entity (for which you need a symbol in the KB) and asserting that such an entity does not exist (which is a contradiction because the KB contains an entity for it). This is known as the referential fallacy (assuming that the existence of a reference implies the physical existence of an entity). However, this issue seems unrelated to the problem of the paper.

In Section 3.2 there are some statements that are, at the least, controversial, for example about the relation between existentials, completeness of observation, and complement classes. OWL is known to be open-world, but it includes both existential quantifiers and complement classes. Reasons why CIDOC-CRM cannot be open world when including existential quantifiers are needed. Necessarily, the solution needs to be compared with equivalent structures in OWL.
The solution (creating related Typed and Negative Typed properties for each CIDOC-CRM property) should also be compared to the adoption of RDF blank nodes as existential quantifiers.
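
As a rough illustration of the blank-node alternative, a typed-property statement can be expanded into the longer path through an anonymous node. In the sketch below, the function name, the blank-node labels, and the property strings are hypothetical; triples are plain Python tuples rather than objects from an RDF library. The use of "P2_has_type" to link the anonymous thing to its type is an assumption about how the long path would be spelled.

```python
from itertools import count

# Counter used to mint fresh blank-node labels (hypothetical helper).
_blank_ids = count(1)


def expand_typed_property(subject, base_property, type_term):
    """Expand 'subject has some <base_property> of type <type_term>'
    into two triples that share a fresh blank node."""
    blank = f"_:b{next(_blank_ids)}"
    return [
        (subject, base_property, blank),      # subject is related to an unnamed thing
        (blank, "P2_has_type", type_term),    # the unnamed thing has the given type
    ]


triples = expand_typed_property("book", "P46_is_composed_of", "sewing support")
for triple in triples:
    print(triple)
```

The blank node plays the role of the existentially quantified individual. The negative case has no counterpart in this pattern, since plain RDF triples cannot state that no such node exists.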

Possible related work:
- Svátek, Vojtěch, Ján Kľuka, Miroslav Vacura, and Martin Homola. "Pattern Alternatives for Referring to Multiple Indirectly Specified Objects." In WOP@ISWC. 2017.
- Svátek, Vojtěch, Ján Kľuka, Miroslav Vacura, Martin Homola, and Marek Dudáš. "Patterns for Referring to Multiple Indirectly Specified Objects (MISO): Analysis and Guidelines." In Advances in Pattern-Based Ontology Engineering, pp. 1-24. IOS Press, 2021.
- Hernández, Daniel, Aidan Hogan, and Markus Krötzsch. "Reifying RDF: What works well with wikidata?." SSWS@ ISWC 1457 (2015): 32-47.
- Hogan, Aidan, Marcelo Arenas, Alejandro Mallea, and Axel Polleres. "Everything you always wanted to know about blank nodes." Journal of Web Semantics 27 (2014): 42-69.