Empirical ontology design patterns and shapes from Wikidata

Tracking #: 3542-4756

Authors: 
Valentina Anita Carriero
Paul Groth
Valentina Presutti

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Full Paper
Abstract: 
The ontology underlying the Wikidata knowledge graph (KG) has not been formalized. Instead, its semantics emerges bottom-up from the use of its classes and properties. Flexible guidelines and rules have been defined by the Wikidata project for the use of its ontology; however, it is still often difficult to reuse the ontology's constructs. Based on the assumption that identifying ontology design patterns from a knowledge graph contributes to making its (possibly) implicit ontology emerge, in this paper we present a method for extracting what we term empirical ontology design patterns (EODPs) from a knowledge graph. This method takes as input a knowledge graph and extracts EODPs as sets of axioms/constraints involving the classes instantiated in the KG. These EODPs include data about the probability of such axioms/constraints happening. We apply our method on two domain-specific portions of Wikidata, addressing the music and art, architecture, and archaeology domains, and we compare the empirical ontology design patterns we extract with the current support present in Wikidata. We show how these patterns can provide guidance for the use of the Wikidata ontology and its potential improvement, and can give insight into the content of (domain-specific portions of) the Wikidata knowledge graph.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
By Giorgos Stoilos submitted on 28/Nov/2023
Suggestion:
Minor Revision
Review Comment:

The authors have taken into account all my major comments. I believe that the readability of the paper has increased significantly. The work presented is quite useful for KG summarisation and exploration and for monitoring the evolution of a KG (as done with different Wikidata versions). Still, there might be some more opportunities for small improvements here and there.

In Figure 1, the threshold Tp and threshold Tdr boxes are meant to indicate (user) input to a processing block; hence they shouldn't be in between the actual data processing boxes. It should be something like this:
                          Tp
'Build subgraphs with      |    'return most used
 instance of each        --->    properties above Tp
 selected class'                 for each subgraph'

The description of the selection of relevant classes talks about distances ("if Tc is equal to 0 only the most instantiated class will be considered"). However, in Algorithm 1 the measure used is closeness and not distance; hence (in line 9), if Tc is 0, then all classes are selected and not only the most instantiated one. Could you please check?
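To make the discrepancy concrete, here is a minimal Python sketch. It is not the paper's Algorithm 1: the class counts are hypothetical, and it assumes closeness is a class's instance count divided by that of the most instantiated class, with distance defined as 1 minus closeness. Under these assumptions, a threshold of 0 behaves very differently depending on which measure is compared against Tc.

```python
# Hypothetical instance counts per class (illustrative, not from the paper).
instance_counts = {"human": 1000, "album": 400, "song": 100}


def closeness(counts):
    """Assumed measure: count relative to the most instantiated class (1.0 = top)."""
    top = max(counts.values())
    return {cls: n / top for cls, n in counts.items()}


def select_by_closeness(counts, tc):
    """Keep classes whose closeness is at least tc."""
    return sorted(cls for cls, v in closeness(counts).items() if v >= tc)


def select_by_distance(counts, tc):
    """Keep classes whose distance (1 - closeness) is at most tc."""
    return sorted(cls for cls, v in closeness(counts).items() if 1 - v <= tc)


# With Tc = 0, the closeness test keeps every class,
# while the distance test keeps only the most instantiated one.
by_closeness = select_by_closeness(instance_counts, 0)
by_distance = select_by_distance(instance_counts, 0)
print(by_closeness)
print(by_distance)
```

If the text is meant to describe distances, the comparison in the algorithm would need to be expressed on the same measure, or the text adjusted to speak of closeness.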

Algorithm 1:

It is better to use some other letter (e.g., D) for ObjectClasses since R is usually used for relations/properties.

The DL notation in line 16 is not correct. It should be C \sqcap \exists P.R (or, if the previous comment is taken into consideration, C \sqcap \exists P.D).

'For instance, you can create a ShEx shape stating that an instance of a book must have...'. It would be good to actually provide the ShEx shape.

'However, unlike OWL property restrictions on classes, they do not limit the applicable classes'. Since OWL has open-world semantics, I am not sure we can claim that OWL domain/range restrictions necessarily restrict the classes to be used (unless also combined with some disjointness axioms). Maybe this sentence needs to be revised or dropped.
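The open-world point can be illustrated with a small Python sketch. It is a toy model, not an OWL reasoner: the vocabulary (`author`, `Book`, `Person`) and the helper functions are hypothetical. It only mimics the standard reading of a domain axiom, where using a property entails a type for the subject instead of rejecting the statement, and where an inconsistency appears only once a disjointness axiom is added.

```python
# Hypothetical toy vocabulary; all names are illustrative only.
DOMAIN = {"author": "Book"}                 # domain(author) = Book
DISJOINT = {frozenset({"Book", "Person"})}  # Book and Person are disjoint


def infer_types(triples, asserted_types):
    """Apply domain axioms: the subject of p is inferred to be of type domain(p).

    Nothing is rejected; types are only added.
    """
    inferred = {s: set(ts) for s, ts in asserted_types.items()}
    for subject, prop, _obj in triples:
        if prop in DOMAIN:
            inferred.setdefault(subject, set()).add(DOMAIN[prop])
    return inferred


def is_consistent(types, disjoint_pairs):
    """Inconsistent only if some individual belongs to both classes of a disjoint pair."""
    return not any(pair <= ts for ts in types.values() for pair in disjoint_pairs)


triples = [("x", "author", "y")]
asserted = {"x": {"Person"}}

inferred = infer_types(triples, asserted)
without_disjointness = is_consistent(inferred, set())
with_disjointness = is_consistent(inferred, DISJOINT)
print(inferred["x"], without_disjointness, with_disjointness)
```

Here `x` ends up typed as both `Person` and `Book`; the data is only flagged once the disjointness axiom is taken into account.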

Properties for this type: 'the appropriate range(s) to be paired with that specific type () cannot be specified'. I think the word 'cannot' is a bit too strong. Could they not be specified in principle, with Wikidata simply not providing the mechanism to do so?

I appreciate the authors adding Figure 3 for type of Wikidata property. Still, from an intuitive point of view, I do not get the idea behind Wikidata's type of properties and how this differs from domains/ranges. It is stated that the property 'facet of' connects the metaclass and its topic, but then in the given example chessgames.com player ID is a property. So are we talking about (meta)classes or properties? Clearly this is a Wikidata matter and not the authors', but it would be good to understand since it is used in the paper.

'activity in progress'. I feel it is better to call it 'work in progress'.

Page 11: The axioms in 1 -> The axioms in Listing 1

Page 15: As in figure 7, IDs properties -> ID properties (no plural on ID?)

Review #2
Anonymous submitted on 12/Dec/2023
Suggestion:
Accept
Review Comment:

AFTER REVISION
--------------

I'd like to thank the authors for taking the time to improve their submission as per the comments they received.

All the remarks I sent are now covered, therefore I change my score to "Accept".

Minor typo §5.1 p8 L11: "10-07-2023 (july 2022 version)" -> "10-07-2023 (july 2023 version)"

Thanks!