Handling Wikidata qualifiers in Reasoning

Tracking #: 3483-4697

Authors: 
Sahar Aljalbout
Gilles Falquet
Didier Buchs

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Full Paper
Abstract: 
Wikidata is a knowledge graph increasingly adopted by many communities for diverse applications. Wikidata statements are annotated with qualifier-value pairs that are used to express information such as the validity context of the statement, its cause, its provenance, etc. Handling qualifiers in reasoning is a challenging problem. When defining inference rules (in particular, rules on ontological properties (x subclass of y, z instance of x, etc.)), one must consider the qualifiers, as most of them participate in the semantics of the statements. This poses a complex problem because a) there is a massive number of qualifiers, and b) the qualifiers of an inferred statement are often a combination of the qualifiers in the rule condition. In this work, we propose to address this problem by a) defining a categorization of the qualifiers and b) formalizing the Wikidata model with a many-sorted logical language whose sorts are the qualifier categories. We couple this logic with an algebraic specification that provides a means for effectively handling qualifiers in inference rules. Using Wikidata ontological properties, we show how to use the many-sorted logic (MSL) and the specification to reason about qualifiers. Finally, we discuss the methodology for practically implementing the work and present a prototype implementation. Thanks to the extensibility of the many-sorted algebraic specification, the work can be naturally extended to cover more qualifiers, such as uncertain time, recurring events, geographic locations, and others.
Tags: 
Reviewed

Decision/Status: 
Reject (Two Strikes)

Solicited Reviews:
Review #1
By Maximilian Marx submitted on 21/Jul/2023
Suggestion:
Minor Revision
Review Comment:

I appreciate the revised article, which addresses most of my previous
concerns. Among the points still open, the use of date formats
throughout the article is still widely inconsistent, with at least
three different formats used: “DD Month YYYY” (e.g., p3, l6),
“YYYY-MM-DD” (e.g., p4, l51), and “DD/MM/YYYY” (e.g., p8, l36). This
should be standardised to at most two, better yet one format.

In Definition 3, t_1, …, t_k are the terms corresponding to the
qualifier sorts s_1, …, s_n. Confusingly, the qualifier–value pairs here
are q_1: v_1, …, q_n: v_n, and the k is not related to the number of sorts
(which is fixed as part of \Sigma). I suggest swapping the roles of k and n
here. Furthermore, the t_i depend not only on the _values_ of the
qualifiers, but also on the qualifiers themselves: what seems to be
missing is that they correspond exactly to the qualifier–value pairs
in the statement, grouped by sort, since a statement surely should
have a unique many-sorted representation.
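
Concretely, what I would expect is something along these lines (a
sketch only, in the notation of Definition 3, writing sort(q) for the
sort of qualifier q): for each sort s_i,

t_i = the term built from exactly the pairs { q_j : v_j | sort(q_j) = s_i },

so that the many-sorted representation of a statement is uniquely
determined by its set of qualifier–value pairs.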

I'm also not convinced by the handling of noValue and someValue in
qualifiers (using different atoms for statement objects works, but
effectively excludes such statements from reasoning). Surely someValue
cannot just be represented by a single constant: e.g., Lucius Salvius
Otho (Q432416) has two spouse statements and three child
statements. The child statements are all qualified with mother, where
the first two use someValue – treating someValue as a single constant
would make these two children siblings, while all we know is that they
were half-siblings. In the Wikidata dumps, therefore, someValue is
represented as an RDF blank node (on the query service, these are
skolemised for performance reasons; still, all instances of them are
treated as distinct objects). For noValue, Example 3 shows how to
extend the specification, but leaves out the definition of, e.g.,
addEndCause(se1, noValue). I would expect this to be noValue again,
since a noValue statement assures that _there is no cause_. However,
this seems to be incompatible with the existing axioms that
getEndCause(addEndCause({se1}, c1)) = add(e1, getEndCause(c1)) —
firstly, getEndCause(noValue) is not defined either, and would need to
be the empty set, but then getEndCause(addEndCause({se1}, noValue))
would need to be {se1} – so noValue would behave like emptyCause,
which doesn't match its semantics.
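
Spelling this out (a sketch, writing se1 uniformly for the element and
assuming the quoted axiom also applies when c1 = noValue):

getEndCause(addEndCause({se1}, noValue))
  = add(se1, getEndCause(noValue))
  = add(se1, ∅)
  = {se1},

i.e., exactly the behaviour of emptyCause.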

I also have some doubts about whether inverseCause is always a function: e.g.,
“death” has opposites “birth” and “procreation” (however, currently
there seems to be no usage of a property with two opposites as a
qualifier value for a symmetric property).

Hino Yasuko is probably not an ideal example in (7) to illustrate why
sequence qualifiers can't be kept, since the statement here is in
violation of an allowed qualifier constraint (spouse does not allow
sequence qualifiers). Perhaps a better example can be found?

For the sort value representation I suggest writing “JSON documents”
(or “JSON objects”) instead of “JSON strings”, since you are not
referring to string literals in the JSON grammar.

I noticed that the compiler generates string replacements on IRIs to
obtain, e.g., the statement property from a claim. These mappings are
actually included in the dump, and, e.g., `BIND(IRI(REPLACE(STR(?P),
"/entity/", "/prop/")) AS ?P_p)` could be replaced with a `?p
wikibase:claim ?P_p` triple pattern.
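
For illustration, a minimal sketch of the suggested pattern (wd:P39 is
only an example binding; any property entity bound to ?P works):

PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT ?item ?st WHERE {
  VALUES ?P { wd:P39 }       # example property entity
  ?P wikibase:claim ?P_p .   # mapping from wd:Pxx to the p:Pxx claim predicate, provided in the dump
  ?item ?P_p ?st .           # reach the statement node via the claim predicate
}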

For the rule evaluation, I am not quite sure how the “inferred graph”
relates to the original (input) graph. Surely the input graph and the
new inferences must be merged somehow, otherwise potential matches for
recursive rules (such as the subclass-of rule) would be missed – how
is this done (unfortunately, the rule engine itself
does not seem to be part of the provided code)? I also have some
doubts about how this could scale to something as large as Wikidata,
as the architecture really seems to enforce naive (vs. semi-naive)
evaluation here.
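
For reference, the semi-naive scheme I have in mind is roughly the
following (a sketch of textbook Datalog evaluation, not a description
of the authors' engine): start with G_0 = input graph and Δ_0 = G_0,
then repeat

Δ_{i+1} = T_rules(G_i, Δ_i) \ G_i   (each rule body matched against at least one fact in Δ_i)
G_{i+1} = G_i ∪ Δ_{i+1}

until Δ_{i+1} is empty. Naive evaluation instead re-applies the rules
to the whole of G_i in every round, which is what the described
architecture appears to do.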

My final concern is with the very loose coupling of the CASL
specification with the rest of the pipeline. Essentially, it can be
left out and the rest of the pipeline would continue to work as
described. In particular, there doesn't seem to be any way of
(i) deriving the JSON representation from the CASL specification, (ii)
deriving the JavaScript implementations from the CASL specification,
(iii) deriving actual unit tests from the CASL specification (this
might be covered under “skeletons for generating unit tests”, but I
would expect that one could directly, automatically, generate
property-based tests for, e.g., fast-check or JSVerify from the CASL
axioms), or (iv) deriving the sort values for statements using the
CASL specification (at some point, the sort values for the inferred
statements also need to be turned back into qualifier–value pairs,
which I also couldn't find in the current implementation).

Overall, I am confident that these concerns will be addressed in a
minor revision.

Lastly, some minor remarks:
p4, l1: earliest time, latest time: actually called “earliest date”, “latest date”
p5, l3: missing space after footnote
p6, l32: then ~> than
p7, l20: drop the space before the comma
p7, l29: “in order,” ~> “in order”
p7, l35, and various other places: “is comprised of” ~> “comprises”
p8, l15: “)[6]” ~> “) [6]”
p8, l17: another superfluous space before a comma
Definition 2: datavalue should be a subsort of value
Example 1: since Dewhurst (should also be spelled with a capital “D”) died in August 1991, the validity interval is certainly incorrect, and, in any case, not the one from statement (1).
p9, l22: “’noValue’”, “’someValue’” ~> “‘noValue’”, “‘someValue’”
p9, l32: “earliest start date”: there is no such qualifier
p12, l35: “For instance, in the Wikidata dump, we are working on, while […]”: this sentence is very hard to parse. I suggest rewording it, or, at the very least, removing the comma after dump.
p13, l21, and various other places: “inspired from” ~> “inspired by”
p13, l24: “annotations qualifiers” ~> “annotation qualifiers”
p13, l48: this footnote is essentially impossible to understand without looking at the appendix (in particular, “V2” should be “V_2”). It's also much more specific than it needs to be, since the same should be true if V_1 is not temporally constrained.
p15, l20: the example shown is not valid JSON, “wd:Q796919” needs to be quoted.

Review #2
By Daniel Hernandez submitted on 24/Jul/2023
Suggestion:
Major Revision
Review Comment:

Summary

I thank the authors for answering my questions. In this paper, the authors propose using a many-sorted logic to formalize the semantics of qualifiers in Wikidata. Section 1.1 exemplifies rules that apply to Wikidata statements without considering qualifiers, and Section 1.2 then shows how these rules can be modified to also take into account the information provided in statement qualifiers. Since Wikidata has so many qualifiers (more than 9,000 according to the authors), the authors propose to organize them into five categories, so that qualifiers in the same category can be treated similarly; the categories are modelled as separate sorts. They describe some inference rules over these five sorts, and some operations that can be applied for each sort. For example, the intersection of time ranges can be computed for the sort representing statement contexts.

Originality

It is not easy for me to identify the main contribution of this paper, so I cannot say that the paper is original. The general idea of using many-sorted relations to represent the semantics of statements and qualifiers does not seem novel. For instance, tracking provenance through rules is well known, so the provenance sort does not appear to be an original contribution of this paper. Similarly, the intersection used to combine values of the V sort is not new, since the conjunction of literals in a rule implies the intersection of their contexts.

Relevance

The topic is relevant because reasoning over qualifiers can improve our capacity to make inferences over Wikidata or to validate the current data.

Quality

After the revision, I still see some major issues:

Issue 1: No translation from qualifiers to sorts

The definitions in Section 3 have improved with respect to the previous submission. It is now clear that a graph G is a set of statements of the form (s, p, v, {q₁:v₁,...,qₙ:vₙ}), which are interpreted as terms of the form st(s, p, v, t₁,...,tₖ), where for each i ∈ {1,...,k}, tᵢ belongs to a sort Sᵢ. However, I do not see this translation specified for the concrete case of Wikidata, which includes so many qualifiers. The specification seems too abstract if we want to solve concrete problems that require inference. I would have been satisfied even if such a translation had been described for a small number of relevant qualifiers.

Issue 2: SomeValue

Even if this function from statements to terms is given, and we only consider terms, it is unclear how inference is performed because the sets defined by the sorts are not fully specified. To see this, let us consider only the sort V, restricted to the temporal qualifiers "start time" and "end time" in Table 1. The sort V then contains only values defined as follows:

V :=
emptyValidity() |
timeValidity(time) |
setTime(V, time) |
inter(V, V)

where time denotes the ordered set of time points together with additional values encoding NoValue and SomeValue, which can be given in qualifiers, as stated in lines 3-4 and 13-14, page 9. Since the interpretation function from statements to terms is not given, I wonder how NoValue and SomeValue are translated. I expect that the statements

S₁ := (a, p, b, {start time: 2022, end time: NoValue}),
S₂ := (b, p, c, {start time: 2021, end time: SomeValue}),

are respectively interpreted as the terms

T₁ := (a, p, b, I₁),
T₂ := (b, p, c, I₂),

where

I₁ := interval(2022, NoValue),
I₂ := interval(2021, SomeValue).

By this I mean that (a, p, b) is valid at every point in time since 2022, and (b, p, c) is valid from 2021 to some point that exists but is unknown. From this semantics for terms T₁ and T₂ we can conclude that intervals I₁ and I₂ are not mutually contained. That is, includes(I₁, I₂) is false because interval I₁ includes all points in time since 2022, whereas interval I₂ does not, since there exists an unknown point in time t such that times after t are not included in I₂.

The values NoValue and SomeValue do not appear in Table 1, in the rules in Appendix A, or in the code in Appendix B. Instead, the value undefined is presented in Appendix A and Appendix B. I can therefore imagine that statements S₁ and S₂ are encoded as the following terms

T₃: (a, p, b, I₃),
T₄: (b, p, c, I₄),

where

I₃ := interval(2022, undefined),
I₄ := interval(2021, undefined).

However, in lines 37-39, page 9, it is said that undefined means the statement is valid everywhere. Under this "everywhere" semantics for "undefined", the interval for S₁ (i.e., I₃) is included in the interval for S₂ (i.e., I₄), which contradicts the relationship established above between I₁ and I₂.

It is not clear that the proposed method supports the SomeValue semantics, as is claimed on page 9. Furthermore, consider the following two time intervals:

I₅ := interval(SomeValue, 2022),
I₆ := interval(2010, SomeValue).

What is the expected intersection of these two time contexts? We cannot know: it can be empty, or it can contain some time points. Hence, SomeValue introduces non-determinism that is not considered in this manuscript and that has been studied at length in the relational-database literature on existential null values.
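
To make the non-determinism concrete (witness values chosen only for illustration, under the usual reading of interval intersection): writing s for the unknown start of I₅ and e for the unknown end of I₆,

inter(I₅, I₆) = interval(max(s, 2010), min(2022, e)),

so with s = 2012 and e = 2020 the intersection is interval(2012, 2020), whereas with s = 2021 and e = 2015 it is empty.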

Section 3.2 says that SomeValue and NoValue are translated to terms with predicates sno and ssome. However, no rules are provided for these terms.

In the previous submission, I assumed a simpler semantics for contexts in which restrictions can always be determined. Introducing SomeValue leads to problems that, I think, are not properly considered. I would suggest not introducing SomeValue into the reasoning; otherwise, you should explain how you will handle these issues.

Issue 3: Many-sorted logic

The proposed method uses many-sorted logic to define representations of statements. There are two citations regarding the notion of many-sortedness: [4] (line 8, page 7) and [6] (line 14, page 8).

In [4], sorts determine the range of variables. Sorts can be Animal, Wolf, and Fox, where the sorts Wolf and Fox are subsumed by the sort Animal, and the sorts Wolf and Fox are disjoint. Sorts are useful in logic because they reduce the search space during inference. Indeed, if we know that x is a Wolf, y is an Animal, and z is a Fox, then variables x and y can be unified, but variables x and z cannot. In this manuscript, sorts do not subsume other sorts, so I do not understand how the formalism in this paper relates to the one in [4].

In [6], the notion of many-sortedness is used in several contexts (e.g., many-sorted set, many-sorted relation, and many-sorted function). I would have expected the operations intersection and union to be defined as many-sorted operations over each sort. However, the operation inter only appears in sort V, and the operation union only appears in sorts C and P. Hence, it is not clear to me whether this many-sorted notion is really used.

Hence, I do not see a clear connection between many-sorted logic and the proposal.

Issue 4: Comprehensiveness

The authors provide some rules, but their comprehensiveness is not shown.

From the theoretical perspective, I would expect to read a comparison with existing formalisms (e.g., [13]) regarding expressive power. I would also expect the rules to be defined with some rationale. For example, some rules arise from extending the rules called "ontological" so that they take qualifiers into account. These rules follow a rationale guided by the question: how can a known set of rules be extended to consider qualifiers? If this were the question, then I would be satisfied with the story: the rules are comprehensive because they cover a well-known set of rules. However, if you consider additional rules, then it is not clear why these particular rules are described and other possible rules are not; the set of rules is then no longer comprehensive.

I also miss a semantics for the data from which the rules could be inferred. Here, you present rules and examples that we can follow intuitively to check whether the rules are sensible.

From the practical perspective, I would expect to see an evaluation of how many inferences these rules enable, or of the support and confidence of the proposed rules in Wikidata. This evaluation is missing from the paper.

I would have been happy with either a theoretical or a practical approach to show the comprehensiveness of the proposed rules.

Conclusion

Although the topic is relevant, I would not recommend the paper in its current form. The main issue I see is the lack of comprehensiveness. This could be addressed by providing either a theoretical evaluation or a practical evaluation that shows how these rules can be used for a concrete task.

Review #3
Anonymous submitted on 10/Aug/2023
Suggestion:
Major Revision
Review Comment:

The paper is concerned with qualified statements in the Wikidata knowledge graph. Qualifiers add to typical knowledge-graph statements of the form (subject, relation, object) additional information, such as temporal validity, provenance, etc. The contribution of the paper is an analysis/classification of the qualifier types present in Wikidata. The authors then propose a formalisation of qualified statements in many-sorted logic (MSL) and provide a set of rules to capture reasoning with the identified qualifier types. Finally, the authors provide an implementation of an MSL reasoner supporting their qualifiers.

I believe the paper is relevant to SWJ, and the classification/analysis of qualifiers in Wikidata is an interesting contribution. At the same time, I believe the paper could be more systematic in addressing related work. Many of the presented qualifiers and their specific reasoning have been studied in the past. For example, knowledge graphs with temporal validity qualifiers have been considered here [https://www.inf.unibz.it/~calvanese/papers/guze-etal-AMCS-2019.pdf], and provenance and context qualifiers were considered here [https://link.springer.com/article/10.1007/s41019-020-00118-0].

On the other hand, the authors should make a better effort to describe the system they implemented. What is its structure, and what are the underlying technologies?

Finally, I believe an evaluation of the proposed methodology for dealing with qualifiers is largely missing from this paper. It remained unclear to me whether the methodology is efficient in terms of speed, accuracy, or worst-case complexity of reasoning. The authors mention in the future work a plan to conduct practical experiments on Wikidata sets. I believe this paper should already include such results, or some other convincing evaluation, in order to be a solid contribution.