Review Comment:
I thank the authors for addressing my comments on the earlier version
of this paper, in particular the refactored code and the separation
of “novalue” and “somevalue”. However, therein also lies my last
remaining gripe with the paper:
The authors claim that “the RDF representation of Wikidata uses blank
nodes for both unknown and non-existing values” (p8, l10f). In fact,
the RDF representation does not use blank nodes for non-existing
values. Only unknown (“somevalue”) values are turned into blank nodes;
non-existing (“novalue”) values do not create additional nodes, but
are represented by making the statement node (and, for asserted
claims, the entity itself) an instance of a class “wdno:P???”, where
the “???” corresponds to the relevant property id, cf. the RDF Dump
Format specification [0]. This needs to be clarified (there are
further references to “novalue blank node[s]” on p9, l16; on p15,
l31f; and on p16, l41). More egregiously, Listing 7 does not actually
show a “novalue” claim, but rather a “somevalue” claim, so another
example is needed (the modelling shown there is also not, as claimed,
incorrect, since it indeed uses an “unknown value”).
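To make the distinction concrete, here is a minimal sketch of the two
modellings as I read the specification [0]. Triples are plain Python
tuples of prefixed names; the item ids, statement ids, the property id
P123 and the helper classify_statement are all placeholders of my own
for illustration and are not taken from the paper or from Wikidata.

```python
RDF_TYPE = "rdf:type"


def classify_statement(triples, statement, pid):
    """Classify one statement node as 'novalue', 'somevalue' or 'value'.

    Returns None if the statement node carries no triple for `pid`.
    """
    for subj, pred, obj in triples:
        if subj != statement:
            continue
        # novalue: the statement node is typed as an instance of wdno:<pid>;
        # no object node (blank or otherwise) is created for the value.
        if pred == RDF_TYPE and obj == f"wdno:{pid}":
            return "novalue"
        # somevalue: the simple-value triple points to a blank node.
        if pred == f"ps:{pid}":
            return "somevalue" if obj.startswith("_:") else "value"
    return None


# Hypothetical data: all identifiers below are placeholders.
triples = [
    ("wds:Q1-stmt-a", "ps:P123", "_:b0"),        # unknown value
    ("wds:Q2-stmt-b", "rdf:type", "wdno:P123"),  # no value (statement node)
    ("wd:Q2", "rdf:type", "wdno:P123"),          # no value (asserted on entity)
    ("wds:Q3-stmt-c", "ps:P123", "wd:Q4"),       # ordinary value
]

for node in ("wds:Q1-stmt-a", "wds:Q2-stmt-b", "wds:Q3-stmt-c"):
    print(node, classify_statement(triples, node, "P123"))
```

A check of this kind, run over the statements used in the paper's
listings, would show which of them are actually “novalue” claims.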
Moreover, “Since these two methods should be employed alternatively,
this co-occurrence on the same properties might indicate that
annotators are using these two types of blank nodes imprecisely” (p15,
l31f): For at least some of the properties (such as “publisher”), it
might be legitimate to state that some work does not have a value for
this (i.e., a work that was not published does not have a publisher
and would warrant a “novalue” claim), whereas for other works the
publisher is merely not known (which could warrant a “somevalue”
claim). On the other hand, for, e.g., “creator”, it is hard to imagine
a situation where a “novalue” might be legitimate. Maybe some more
differentiation is required here?
Lastly, “Even though Wikidata focus on established knowledge
(community consensus), rather than conjectural or controversial
information […]” (p17, l25f): ultimately, Wikidata is a secondary
database, not with the goal to encode all the true facts in the world,
but rather to collect and reference the facts claimed elsewhere
[2]. This is, of course, not to say that uncertainty of claims need
not be represented in Wikidata (on the contrary), but it might provide
some limited insight into why Wikidata has comparatively few WLS
claims: I would imagine that finding a reference that some claim is,
e.g., disputed, is rather more difficult than just finding references
for plain facts.
All in all, I am quite happy with the improvements made to the paper
and am confident that the remaining issues can be successfully
addressed in a minor revision.
Minor comments:
- p3, l17: “state of the art (2)” ~> “state of the art (section 2)”
- p4, l3: “have been imported into”: more accurately, have been linked
to the RKD data set; the original description may have been imported
from elsewhere.
- p4, l20: “indicate type” ~> “indicate the type”
- p5, l51: footnote 7 is broken, the link should go to
http://www.wikidata.org/wiki/Help:Statements instead. Several
further footnotes are also affected, see below.
- p7, l20: remove “http://www.wikidata.org/entity/Property_talk:P2241”
- p7, l49: footnote 14 is broken (“/entity/” ~> “/wiki/”)
- p8, l3: around the reference to footnote 17, a closing parenthesis
is missing
- p8, l46: a better footnote 17 might be
https://www.wikidata.org/wiki/Q86719099, or either of the two
values for the “described at URL” property
- p10, l48: footnote 26 is broken (“/entity/” ~> “/wiki/”)
- p17, l48: footnote 51 is broken (“/entity/” ~> “/wiki/”)
- p18, l51: “and are not” ~> “not being”
- p19, l44: “assigned a accepted” ~> “assigned an accepted”
- p20, l10: “represent the unknown value” ~> “represent the
non-existing value”
[0] https://www.mediawiki.org/wiki/Wikibase/Indexing/RDF_Dump_Format#Novalue
[1] https://www.wikidata.org/wiki/Q11981626
[2] https://www.wikidata.org/wiki/Wikidata:Verifiability