On assessing weaker logical status claims in Wikidata cultural heritage records

Tracking #: 3374-4588

Authors: 
Alessio Di Pasquale
Valentina Pasqual
Francesca Tomasi
Fabio Vitali

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Full Paper
Abstract: 
This work presents an analysis of the use of different representation methods in Wikidata to encode information with weaker logical status (WLS, e.g. uncertain information, competing hypotheses, temporally evolving information, etc.). The study examines four main approaches: non-asserted statements, ranked statements, null-valued objects, and statements qualified with properties P5102 (reason of statement), P1480 (sourcing circumstances) and P2241 (reason for deprecated rank). We analyse their prevalence, success, and clarity in Wikidata. The analysis is performed over cultural heritage artefacts stored in Wikidata divided into three subsets (i.e. visual heritage, textual heritage and audio-visual heritage) and compared with astronomical data (stars and galaxies entities). Our findings indicate that (1) the representation of weaker logical status information is limited, with only a small proportion of items reporting such information, (2) the representation of WLS varies significantly between the two datasets, and (3) precise assessment of WLS statements is made complicated by the ambiguities and overlaps between WLS and non-WLS claims allowed by the chosen representations. Finally, we list a few proposals to simplify and standardize the representation of this type of information in Wikidata, with the hope of increasing its accuracy and richness.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Maximilian Marx submitted on 22/Mar/2023
Suggestion:
Major Revision
Review Comment:

The article introduces the concept of “weaker logical status” claims –
claims that are neither true nor false, but rather true in some
context, such as claims that are based on uncertain information or on
competing hypotheses, or claims that have changed over time. It
identifies several methods of representing such information on
Wikidata and investigates their usage on two subgraphs of Wikidata,
one being cultural heritage data and the other a comparably sized
sample of astronomical data. Lastly, the authors suggest several
improvements to the current state.

The article appears to be the first such study, and, indeed, one of
the first to introduce the concept of a “weaker logical
status”. However, in its current state, I cannot recommend the
article for acceptance.

The article identifies four approaches used to represent “weaker
logical status” claims in Wikidata. The first such approach is a
distinction between “asserted” and “non-asserted statements”. Out of
the four approaches, this is the only one not defined on the Wikibase
data model, but rather on the RDF representation: a statement is
considered “asserted” if it has a corresponding “truthy” triple
[1]. However, whether or not such a triple is included in the RDF
representation is determined solely by the statement's rank and the
absence of higher-ranked statements – it is not a conscious choice
made by editors. As ranking is discussed as the second approach, this
feels somewhat redundant, in particular since every deprecated
statement is also automatically non-asserted. I would rather suggest
distinguishing between deprecated and non-deprecated non-best-rank
statements (i.e., normal-ranked statements where a preferred statement
exists), as is, indeed, done in Figure 2.
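
For concreteness, this distinction can be drawn directly on the
Wikidata Query Service, where ranks are exposed via wikibase:rank. A
minimal sketch (standard WDQS prefixes assumed; P170, “creator”, is an
arbitrary example property, and one would restrict ?item to a subset
to stay within the public endpoint's limits):

    # Classify non-asserted statements for an example property (P170)
    # into the two sub-cases: deprecated vs. normal-rank but outranked.
    SELECT ?kind (COUNT(?st) AS ?n) WHERE {
      ?item p:P170 ?st .
      {
        ?st wikibase:rank wikibase:DeprecatedRank .
        BIND("deprecated" AS ?kind)
      } UNION {
        ?st wikibase:rank wikibase:NormalRank .
        # outranked: the same item also has a preferred-rank P170 statement
        FILTER EXISTS { ?item p:P170 [ wikibase:rank wikibase:PreferredRank ] . }
        BIND("outranked normal" AS ?kind)
      }
    }
    GROUP BY ?kind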

The third approach investigated is qualified statements. The authors
mention that, e.g., P2241 does not have a list of recommended terms on
its discussion page – while true, it should be noted that allowed
values for P2241 qualifiers are still restricted via the “value type”
property constraint, and a list can be retrieved, e.g., via a SPARQL
query [2]. Furthermore, a “one-of” constraint also exists which,
while deprecated, is still used by the UI to provide suggestions when
editing.
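
For concreteness, here is a sketch of such a query (the linked query
[2] may differ in detail; Q21510859 is the “one-of” constraint and
Q21510865 the “value type” constraint):

    # Retrieve the permitted values for P2241 (reason for deprecated rank)
    # from its property constraints (P2302): the one-of list and the
    # class required by the value-type constraint.
    SELECT ?kind ?value ?valueLabel WHERE {
      {
        wd:P2241 p:P2302 [ ps:P2302 wd:Q21510859 ;   # one-of constraint
                           pq:P2305 ?value ] .
        BIND("allowed value" AS ?kind)
      } UNION {
        wd:P2241 p:P2302 [ ps:P2302 wd:Q21510865 ;   # value-type constraint
                           pq:P2308 ?value ] .
        BIND("required class" AS ?kind)
      }
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    }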

The last approach discussed considers “null values”. From the
description, I take it to mean the “unknown value” (“somevalue”) of
the Wikibase data model (since these correspond to blank nodes in the
RDF representation) [3]. However, looking at the code, it seems the
analysis rather counts “no value” (“novalue”) statements [4]. While
both could be labeled “null values”, they are not interchangeable:
“unknown value” signifies that a value exists, but it is not known,
whereas “no value” asserts the absence of a value – indeed, this is
claimed as a problem (“For instance, null values are used in some
predicates to represent values that cannot exist, e.g. when signaling
the start (P155: follows + null value) […] in sequences”, “The
subtlety in the semantic differences between providing no value and
providing a null value for a property of a wikidata item, as well as
their other types of applications makes the use of null values
particularly complicated and ambiguous.”). Clearly, the two kinds of
“null values” need to be distinguished in the analysis; in any case,
the description should match what is actually being counted.
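
The two kinds are also easy to tell apart on the query service, where
“unknown value” objects can be detected with wikibase:isSomeValue()
and “no value” statements carry an rdf:type wdno:P… triple. A sketch,
again with P170 as an arbitrary example property (restrict ?item to
run within the endpoint's limits):

    # Count the two kinds of "null values" for P170 (creator).
    SELECT ?kind (COUNT(DISTINCT ?st) AS ?n) WHERE {
      ?item p:P170 ?st .
      {
        ?st ps:P170 ?v .
        FILTER(wikibase:isSomeValue(?v))   # a value exists but is unknown
        BIND("unknown value (somevalue)" AS ?kind)
      } UNION {
        ?st a wdno:P170 .                  # the absence of a value is asserted
        BIND("no value (novalue)" AS ?kind)
      }
    }
    GROUP BY ?kind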

As the count of “weaker logical status” claims in Wikidata seems to be
lower than what the authors expected, they compare the relative
portions of the cultural heritage datasets with those of the RKD
images collection, where the portion of “weaker logical status” data
is significantly higher. Since neither of the datasets is contained in
the other (the authors state that about 30000 images from the RKD set
are also present in Wikidata), I wonder whether this might be due to
peculiarities in the datasets (it is also not entirely clear to me
what the RKD dataset contains) – for example, it could simply be that
the RKD dataset contains proportionally more works with, e.g., unknown
creation dates because the proportion of more recent works in Wikidata
is higher (with the assumption being that it's more likely for recent
works to have a known time of creation).

I appreciate that the code used to perform the analysis and the
datasets are available. However, many of the scripts have hardcoded
paths to the data files in them (and thus cannot be run easily without
modifications) – I would suggest always using relative paths (and
documenting what those are). I also suggest providing a pip lockfile
(“requirements.txt”), or at least documenting the required versions of
the dependencies used.

Some minor comments:

p3, l3: of the named Knowledge Graphs, only Wikidata seems to be a
“collaborative public platform[s]”.

p3, l16: I am not too familiar with the CIDOC CRM, but how is
“crmn:E13_AttributeAssignment” an n-ary relation? It is my
understanding that it is a class used for reified n-ary relations.

p3, l27: “3% of the total of its visual
artworks”: 3% of the visual artworks in Wikidata, or 3% of the visual
artworks in the RKD?

p3, l36ff: “[18]” is about representing the Wikibase data model in RDF
– while this requires some form of reification, this is independent of
the “logical status” of the claims.

p5, l12: “Statements, independently of rank, […]”: claims can be
decorated, a statement encompasses a claim, references, a rank, and
qualifiers. Furthermore, there are many more qualifiers beyond the
four mentioned [5].

p5, l40: “statements can be associated with a blank node”: the blank
node is an implementation detail of the RDF dump export – indeed the
SPARQL endpoint uses skolemised nodes instead [6]. In terms of the
data model, the special value “unknown value” is used (which is not to
be confused with the other special value “no value”).

p6, l12: “follows + null value”: indeed, this is the other kind of
special value, “no value” – note that, in the RDF representation, this
does not correspond to any node, rather, the statement node becomes
the subject of an “rdf:type wdno:P…” triple.

p7, l31ff: I would suggest moving this paragraph in front of the
description of the individual datasets, as it explains how the numbers
of JSON files come to be (also, since the numbers of entities are
usually not divisible by 50, “exactly 50” should likely be “at most
50”).

p8, l21: “avoid assessing a claim”: how does deprecating a statement
avoid an assessment?

p8, l26: “A datasets” looks confusing. Maybe call the datasets “ANs”
and “ANg”, and refer to both as the “AN datasets”?

p10, l15: “Q5727902: circa qualifier”: circa is the value of the
qualifier, not the qualifier itself.

p10, l42: “This is probably the only true WLS use of null-valued
statements”: ironically, this is a modelling error, since a novalue is
used, although it should be a somevalue.

p10, l51: “shifted from the public domain to copyrighted”: rather
shifted from copyrighted into the public domain?

p11, l23ff: what is the distinction between uncertainty qualifiers
(such as “disputed”) and cautioning qualifiers (“attribution”)? This
seems arbitrary to me.

p13, l6f: “if an artwork A was supposedly moved […]”: (i) “artwork A”
is particularly confusing with “A” also being used for the “A
dataset”; (ii) why must both statements be ranked as deprecated? Since
Wikidata is not a primary source, but rather a secondary database,
unless the claim that the artwork was moved is stated elsewhere, the
claim should not be in Wikidata at all [0]. If there were such a
source, I would expect a normal-rank statement, possibly with a
“disputed” qualifier value?

p13, l32: “Provide a list of suggested values for P2241 and P7452”:
Such lists do exist, although not directly on the discussion page, but
in the form of “one-of” property constraints.

p13, l39f: “distinguish […] between […] WLS […] and non-WLS uses”:
This distinction is already present in the form of the
somevalue/novalue special values [3].

p14, l5f: “can be accessed in the Github folder of the project”:
where? I have not been able to locate it.

While the structure of the article is easy to follow, it contains a
number of spelling mistakes and inconsistencies, and some sentences
are hard to follow. A (likely incomplete) list follows; in particular,
the authors should standardise on British vs. American spelling (e.g.,
both “analysed” and “analyzed” are used throughout the article), on
whether or not to use the Oxford comma, and consistently capitalise
“Wikidata”.

p1, l41: “coming by different and disagreeing sources” ~> “coming from different and disagreeing sources”
p2, l1: “enunciates” ~> “statements”
p2, l11: “limits” ~> “limit”
p2, l43: “(2)” ~> “section 2”
p2, l46 “reserach objective” ~> “research objectives”
p3, l12: “manage” ~> “manages”
p3, l13: “models[5]” ~> “models [5]”
p3, l14: “CRM[4]” ~> “CRM [4]”
p3, l17: “Europeana [10],” ~> “Europeana [10]”
p3, l18: EDM needs to be defined
p3, l30: “data model” ~> “data models”
p3, l31: “(dumping)[14]” ~> “(dumping) [14]”
p3, l40: “according multiple” ~> “according to multiple”
p3, l40: “[19] survey” ~> “Piscopo and Simperl [19] survey”
p3, l41: “categorizes” ~> “categorize”
p3, l51: “See also the list […] can be found at” ~> “The list […] can be found at”
p4, l23: “vy” ~> “by”
p5, l29: “P1502” ~> “P5102”
p6, l35: “WLG” ~> “WLS”?
p7, l8ff: “json” ~> “JSON”
p8, l18: “Av” ~> “As”?
p8, l25: “q:P2241” ~> “P2241 qualifier”
p8, l44: “seem” ~> “seems”
p9, l38: “P155:follows” and “P155:followed by” ~> “P155: follows” and “P156: followed by”
p9, l38: “by In” ~> “by. In”
p9, l40: “alternative” ~> “alternatives”
p9, l42: “asserted)”: there is no opening parenthesis
p9, l42ff: three colons in a row, please rewrite this sentence.
p9, l45: “Ag)”: there is no opening parenthesis
p10, l10: “by,P162” ~> “by, P162”
p11, l36: “seems doing” ~> “seems to be doing”
p11, l43: “a particular attention” ~> “particular attention”
p11, l42: “e.g. authorship” ~> “e.g., authorship”
p13, l20: “WLG” again
p14, l15: “would assigned” ~> “would be assigned”
p14, l41: “a overabundance” ~> “an overabundance”
p14, l42: “seem to a large” ~> “seems to be a large”
p15, l4: “loose” ~> “lose”

[0] https://www.wikidata.org/wiki/Help:Statements#Add_only_verifiable_inform...
[1] https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Truthy_...
[2] https://w.wiki/6Tpt
[3] https://www.wikidata.org/wiki/Help:Statements#Values
[4] https://github.com/alessiodipasquale/Wikidata_WLS/blob/main/countBlank.p...
[5] https://w.wiki/6TrP
[6] https://www.mediawiki.org/wiki/Wikidata_Query_Service/Blank_Node_Skolemi...

Review #2
By Michael Piotrowski submitted on 12/Aug/2023
Suggestion:
Reject
Review Comment:

The topic of the paper is very interesting and relevant. In fact, the authors could be said to be breaking new ground with their study: there is widespread consensus that there is a need to make uncertainty explicit (the authors introduce the term “weaker logical status” as a generic term to encompass the wide variety of qualified statements; from a technical point of view, this seems like a good idea, although I’m not quite sure whether it is a good idea from a theoretical perspective and whether it’s the best designation), but it is unclear how facilities for expressing uncertainty are used when they are made available.

The authors present the results of a case study comparing cultural heritage records with astronomy records in Wikidata. The outcomes of this study are very interesting and raise a number of intriguing practical and theoretical questions.

Apart from some passages and some typos, the paper is generally well-written and easy to follow if one has a basic understanding of Wikidata; the raw P and Q numbers are however difficult to follow if one doesn’t know them by heart. The paper closes with a list of suggestions to make the expression of “weaker logical status” easier and more uniform.

Despite the importance of the work, I recommend rejecting this paper, because it (IMHO) doesn’t meet the requirements for a “full paper” in a journal. It is definitely original, and the results are interesting, but as its main part is a report on a case study using a sample of a snapshot of Wikidata, the results are inherently limited in their significance beyond the case study. While the suggestions are certainly useful, they lack a theoretical basis and are a starting point for further discussion rather than definitive recommendations.

I’d love to see this paper presented and discussed at a conference. This would certainly help to develop the theoretical aspects on the basis of the results of the case study. For me, two theoretical questions stand out in particular:

1. To what extent is it useful to group the different phenomena covered by “weaker logical status” together? It seems not unlikely to me that one of the causes for both the lack and the diversity of annotations is that very different phenomena (like uncertain and temporally evolving information) are thrown together because they are *technically* represented in the same way, and contributors thus have difficulty determining which case of “weaker logical status” they have at hand. Yes, there are deep epistemological questions, but the paper actually shows that many of the problems are due to the fact that there is no solid theoretical basis.
2. I find the choice of astronomy as a point of comparison very interesting. The authors don’t explain this choice and seem to consider astronomy simply as an example of a “hard science.” I assume the choice was in part pragmatic: astronomy deals with unique objects, like cultural heritage, for which there are corresponding records in Wikidata. What is more interesting, however, is that astronomy is not an experimental, but an “interpretive and historical science”: its research objects are not directly accessible and in most cases long gone. Thus, astronomy actually shares more with history than it may at first seem (the same goes for geology, see Comet, Paul A. (1996) Geological reasoning: Geology as an interpretive and historical science: Discussion, Geological Society of America Bulletin 108(11): 1508–1510. https://doi.org/10.1130/0016-7606(1996)108%3C1508:GRGAAI%3E2.3.CO;2). The comparison is thus much more interesting than just “humanities vs. natural sciences.” In some respects, the differences in annotation found by the authors are thus perhaps due to organizational differences, e.g., there is a commonly agreed “workflow” for updating the status of observations. In any case, I think it would be very valuable to explore the epistemological commonalities to gain deeper insights into the various types of “weaker logical status.”

Since I recommend rejecting the paper, I won’t go into details with respect to minor issues. There are a number of typos (e.g., trough, reserach) and some passages that should be revised, but they would be revised anyway in a future version.

Again, a very interesting topic, and my recommendation doesn’t concern the quality of the work, but rather its current state, which is not advanced enough to warrant a journal publication.

Review #3
By Daniel Hernandez submitted on 23/Aug/2023
Suggestion:
Major Revision
Review Comment:

This paper studies how Wikidata encodes statements that are not clearly true because they contain uncertain information, have incomplete information, involve competing hypotheses, or represent temporally evolving information, among other forms of information they group under the label of information with weaker logical status (WLS).

Originality

The problem of giving semantics to qualifiers has been discussed, and some ideas have been proposed. Some papers that are missing in the state-of-the-art section are:

1. P. F. Patel-Schneider, Contextualization via qualifiers, in: Contextualized Knowledge Graphs @ ISWC 2018, 2018.

2. A. Zimmermann, N. Lopes, A. Polleres and U. Straccia, A general framework for representing, reasoning and querying with annotated semantic web data, Web Semantics: Science, Services and Agents on the World Wide Web 11 (2012), 72–95.

Like these two papers, the authors of this paper address the problem of reasoning with qualifiers. However, this paper does not provide a semantics for them. I cannot evaluate the novelty of the proposals because they are not formalized.

Relevance

This is relevant because information in the world is incomplete and is many times subject to competing hypotheses and beliefs.

Quality

Major issues

1. The paper mentions several manifestations of what is considered information with weak logical status. However, it appears that this notion is informal. I understood that for the authors, information with a strong logical status is information consisting only of ground atoms, such as p(a, b). However, these manifestations of weak logical status require more expressive logical languages.

Unknown null values in Wikidata are intended to represent existing values that are missing. For example, a book has an author, but the author is unknown. In the RDF representation, this type of null value is represented with blank nodes:

D. Hernández, C. Gutierrez and A. Hogan, Certain Answers for SPARQL with Blank Nodes.

I don't consider this kind of incomplete data weak logical status, but rather a manifestation of incomplete information. The formulas ∃x p(a, x) and p(a, b) ∨ p(a, c) both represent incomplete information because they allow for multiple models (logical interpretations).

On the other hand, temporal data does not necessarily imply incomplete data, but rather data that is valid in a given context. Gabriel Boric is the president of Chile, but he was not some years ago. There are several proposals for logics that allow for reasoning with contexts, for example:

S. Klarman and V. Gutiérrez-Basulto, Two-Dimensional Description Logics for Context-Based Semantic Interoperability., in: AAAI, 2011.

Finally, encoding competing hypotheses also goes beyond ground atoms.

On the other hand, the paper describes several ways in which information with weaker logical status is encoded: nulls, ranks, asserted vs non-asserted statements, and qualifiers. I would expect that incomplete information in the form of existential variables is modeled with blank nodes, and that this is more common for works of art than for stars and galaxies. The authors describe that this is more common in the CHav dataset, for example, to indicate that the original language of a film or TV show exists but is unknown. The authors consider that having datasets using different methods is something negative, and call this a poor overlap (see page 9, lines 40 to 45). However, to me, it seems that this poor overlap is needed because different datasets are encoding things that require different types of formulas to be expressed. For example, if we assert φ₁ := ∃x hasOriginalLanguage(a, x) and then the evidence shows that φ₂ := hasOriginalLanguage(a, Spanish), then this does not mean that φ₁ is no longer true, but that it is redundant. This is different from the case where the first belief is ψ₁ := hasAuthor(a, b) and the second belief is ψ₂ := hasAuthor(a, c). In this case, the evidence can show that ψ₁ is no longer true, and should not be asserted.

Regarding the use of the existential null value for the attribute hasOriginalLanguage, I would expect that if a film is silent then a no value is used (as in the case of the child property for Elizabeth I of England). The authors report some examples of incorrect use of the existential null values for cases where the no value should be used (page 10). I agree that there are many cases where the existential null values are wrongly used, but this is not a problem of the existential null value but of the tools used to import knowledge bases into Wikidata.

Hence, I do not clearly see that the poor overlap is a problem; it is mostly a result of the different nature of the information in each dataset. The problem should be described more precisely. Currently, I am not convinced of the existence of a problem beyond mistakes introduced by tools or data editors.

2. The recommendations in Section 5 are too vague to be useful. For example, the first item recommends that the suggested values for properties P5102 and P7452 should be clearly differentiated and have no semantic overlap. This recommendation sounds reasonable, but it is not clear that it is possible to provide a fixed list of values with non-overlapping semantics. Defining such a list may be a hard task. Moreover, relations usually overlap, and depending on our knowledge we may choose the more informative one. For example, given two persons x and y, we may know that they are relatives, that x is the parent of y, or that x is an ancestor of y. These three relations overlap, but we need the less informative relations ancestor or relative if we don't know the actual relationship between x and y. Furthermore, verifying the "semantic overlap" is complicated when most of these values have no formal semantics.

Some of these recommendations seem to be modifications to the widgets of the Wikidata Web editor interface; however, some datasets are loaded using automated tools.

Some recommendations suggest that values must not be saved if some quality checks are not satisfied. It is not clear how these quality checks should be implemented. I would expect a formal definition of the validity checks, for example providing SHACL constraints for them.
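
To sketch what such a formal definition could look like (SHACL also
admits SPARQL-based constraints), a check could be phrased as a
violation query, here hypothetically treating the one-of list declared
on P1480 (sourcing circumstances) as the intended closed vocabulary:

    # Report statements whose P1480 qualifier value is outside the
    # one-of list (Q21510859) declared on P1480 via P2302.
    SELECT ?st ?value WHERE {
      ?st pq:P1480 ?value .
      FILTER NOT EXISTS {
        wd:P1480 p:P2302 [ ps:P2302 wd:Q21510859 ;
                           pq:P2305 ?value ] .
      }
    }
    LIMIT 100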

3. I am not convinced of the comprehensiveness of the proposal. This paper shows some cases in which some incorrect data is introduced, and proposes introducing some validity checks. However, Wikidata defines more than 9000 qualifiers. How can I know that the qualifiers described cover the qualifiers used for weaker logical status? How can we know that the issues described cover the most relevant issues? Given the huge number of qualifiers, it would not be difficult to find mistakes in the use of some of them and then propose a validation to avoid such a particular mistake. Notice that I am not saying that the work is not comprehensive, but that its comprehensiveness is not well described.

4. The use of Wikidata identifiers makes the text difficult to read. For example, I cannot remember what the predicate P2241 means without reading back to the place where it is explained. It would be better if the paper used the English label "reason for deprecated rank" instead of the Wikidata identifiers. Moreover, sometimes these identifiers are written with different typography in this paper (e.g., page 13, lines 32 and 34).

5. The primary data model of Wikidata is the Wikibase data model. The translation from the Wikibase data model to the RDF dumps is done automatically. To my knowledge, all statements are translated to a reified statement, and I guess that the translation automatically chooses which statements are also asserted. I miss, in Section 3.1, a discussion of the policy that defines which statements are asserted. Furthermore, most of the discussion in this paper is on the RDF dumps and not on the Wikibase model. I am not convinced that the discussion of statements with weaker logical status should be conducted on the RDF dumps rather than on the Wikibase model. The use of the RDF dump can lead to some semantic inaccuracies, in particular the use of the "no value" in conjunction with the "follows" qualifier to indicate the start of a sequence.
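
For reference, the policy implemented by the RDF mapping is that
exactly the non-deprecated best-rank statements receive a truthy
(wdt:) triple. This can be inspected on the query service for a single
item; a sketch using Q12418 (Mona Lisa) as an arbitrary example:

    # Contrast the truthy ("asserted") layer with the full statement
    # layer for P170 (creator) on a single item.
    SELECT ?truthyValue ?st ?rank ?stValue WHERE {
      OPTIONAL { wd:Q12418 wdt:P170 ?truthyValue . }
      wd:Q12418 p:P170 ?st .
      ?st ps:P170 ?stValue ;
          wikibase:rank ?rank .
    }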

6. Most of the analysis in Section 4.2 compares the use of the methods in the different datasets. The authors give numbers, but I do not see explanations of these numbers. For example, on page 8, lines 14 to 20, several numbers are given for the occurrence of non-asserted statements. I expect an explanation of what we can conclude from these numbers, or of why these datasets have different numbers. In particular, to my knowledge, asserted statements are generated automatically from the Wikibase model. Hence, it should be explained what patterns are generating non-asserted statements. My guess is that changing properties in astronomical data is very common, and this yields so many non-asserted statements. I feel that this difference between the datasets is considered as something negative, when it is just a consequence of the nature of the datasets.

7. The answers to the research questions are not clearly stated. For example, the answer to RQ1 in Section 4.3 is that the current state of WLS claims in Wikidata is poor. Why? Is it because only 1% of the claims provide this information, whereas for the RKD the number is 8.5%? What should be the expected number for the dataset? Is the higher number in the RKD because of a special emphasis of the curators of this dataset on WLS claims, or a particular property of the RKD dataset? This issue does not seem to be discussed clearly enough to support such a strong statement. I am not saying that the WLS claims do not have room for improvement, but that I cannot follow the arguments for several statements made in this paper.

Minor issues

1. On page 3, line 3, several "public" knowledge graphs are mentioned. What is meant by "public", and why is the Google knowledge graph considered public? The same page also mentions public platforms and open platforms; what is the difference between them?

2. A space is missing before a citation on page 3, line 13.

3. On page 3, line 14 an approach for adding metadata via n-ary relations is mentioned, and an example is provided. However, the example is not informative. A reader cannot be expected to know what crm:E13_AttributeAssignment means, and should not be asked to consult the citation to understand what this paragraph intends to say. Instead, the authors should provide a short explanation.

4. On page 3, line 26: what do "traditional relational query patterns" mean? Are they conjunctive queries?

5. On page 4, line 18, the reader who does not know the Wikidata model may find it difficult to understand what is intended by using a different prefix. It would be better to provide an example first. I think that the example in Section 2.1 should include the whole information of the triple: the relationship with the object is not included in the reified version of the statement.

6. Page 4, line 23: vy -> by.

7. Page 7, line 27: The text says "one of the methods" without specifying which method. Maybe it is intended to say "each method", since Table 1 shows all of them.

8. Page 8, line 18: Av -> As.

Conclusion

Although the topic is relevant, I would not recommend the paper in its current form, and I consider that it requires a major revision.