Psychiq and Wwwyzzerdd: Wikidata completion using Wikipedia

Tracking #: 3341-4555

Authors: 
Daniel Erenrich

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Tool/System Report
Abstract: 
Despite its size, Wikidata remains incomplete and inaccurate in many areas. Hundreds of thousands of articles on English Wikipedia have zero or limited meaningful structure on Wikidata. Much work has been done in the literature to partially or fully automate the process of completing knowledge graphs, but little of it has been practically applied to Wikidata. This paper presents two interconnected practical approaches to speeding up the Wikidata completion task. The first is Wwwyzzerdd, a browser extension that allows users to quickly import statements from Wikipedia to Wikidata. Wwwyzzerdd has been used to make over 100 thousand edits to Wikidata. The second is Psychiq, a new model for predicting instance and subclass statements based on English Wikipedia articles. Psychiq’s performance and characteristics make it well suited to solving a variety of problems for the Wikidata community.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 25/Feb/2023
Suggestion:
Major Revision
Review Comment:

The paper tries to address the problem of incompleteness in Wikidata by providing a UI tool, powered by a machine learning approach, to identify instance of (P31) and subclass of (P279) relations in Wikipedia articles and add the found facts to Wikidata. The dataset files, as well as the source code, are available. Despite the importance of incompleteness in Wikidata and the appeal of approaches based on machine learning combined with human supervision, this report has shortcomings in my opinion.

The incompleteness of Wikidata has been raised in many quality studies. Researchers usually measure completeness based on specific types of schemata, or by comparing the instances (and relationships) with the current status of the dataset/KG. This paper's definition of incompleteness, on the other hand, is the lack of coverage of many Wikipedia articles in Wikidata.

It is not clear where this criterion came from. There is no reason (and the report does not state any) to assume that Wikidata should have full coverage of Wikipedia (especially English Wikipedia). Wikidata and Wikipedia are both projects of the Wikimedia Foundation and are based on community collaboration. One of the original goals of Wikidata was to produce structured content for Wikipedia and other Wikimedia projects. However, the report assumes that the role of Wikidata is to represent Wikipedia in a structured way. This is the main problem with the paper. Wikidata is different from DBpedia. It is one of Wikidata's policies that the facts in Wikidata must come from primary (first-hand) sources, and Wikipedia is not considered a primary source. It is a secondary source, and that is why Wikipedia's articles need references.

I really appreciate the machine learning approach (and I am a big fan of automatic tools for KG population), but any automated tool must be evaluated in terms of quality and improvements. It is claimed that the presented tool performs better than previous works in finding P31 and P279 facts; however, the article lacks an evaluation/comparison between the presented machine learning tool and the other automatic tools mentioned, such as the Wikidata browser extension or IntKB. There is also no trace of quality checks based on Linked Data quality criteria. Another limitation is the lack of comparison with other relation extraction methods such as NELL and DuttaMS. We also do not find out to what extent the tool helped complete Wikidata (and obviously reporting how many times the tool has been used since March 2022 is not enough).

There are also writing problems. I think the writing style falls short of the level expected of journal papers, e.g., using 'like this/that' expressions several times. There is a similar problem with the figure captions.

Overall, despite the importance of the problem, I think the key challenge is using Wikipedia as a source to complete Wikidata, and we cannot observe concrete evidence of the quality, importance, and impact of the tool. Specifically, I am eager to see how the tool performs better than other available bots and approaches in terms of quantity and quality. I would also like to see a similar machine-learning ontological extraction from primary sources. For example, how about using Wikipedia's article references for extracting facts about the corresponding item (maybe for the next step!)?

Review #2
By Fariz Darari submitted on 06/Mar/2023
Suggestion:
Minor Revision
Review Comment:

> Summary of paper

The paper describes two tools for completing data on Wikidata. The tools rely on Wikipedia. The first tool, Wwwyzzerdd, provides a UI add-on for adding Wikidata statements from the Wikipedia web interface. The second tool, Psychiq, is more of a prediction model for Wikidata type and subclass statements based on Wikipedia articles.

> Feedback of paper

>> Title and Abstract

- The name Wwwyzzerdd seems unusual. Just asking: Is there any specific reason behind the naming?

- "Hundreds of thousands of articles on English Wikipedia have zero or limited meaningful structure on Wikidata." -> How important are these articles/items? How often are these articles/items read by people, or used in apps?

- "Wwwyzzerdd has been used to make over 100 thousand edits to Wikidata." -> Any (more detailed) quantitative as well as quality analysis on the edits made?

- The abstract could be made clearer about the relationship between the two proposed tools and how they complement each other.

>> Introduction

- Typo: ".. widely used as a source of structured data[1]." -> ".. structured data [1]."

Also, please check other places where similar issues appear.

- In general, the author of the paper is knowledgeable on the background and issues in Wikidata (based on the introduction section).

- "526,297 (54%) edits in the 24-hour period were made using QuickStatements which is a tool for bulk editing." -> There should be a comparison of the proposed tools to QuickStatements.

- The problem presented is well motivated.

- "Psychiq is a machine learning model that is integrated into the Wwwyzzerdd UI that suggests, on the basis of Wikipedia’s content, new statements to be added to Wikidata." -> It's a bit misleading as this might suggest that Psychiq is already able to add arbitrary statements, while at the moment it focuses still on type and subclass information.

- The caption and in-text narration of Fig. 1 could be further improved. The current wording is misleading, as it suggests there are two lines/curves: one for the growth of the number of WD items and one for the number of active users.

>> Related Work

- At the end of related work, I would expect a summary of the remaining gaps and requirements in order to motivate the need for the proposed tools.

- Typo: ".. the Wikidata Distributed Game framework ^6" -> The footnote numbering should follow directly the 'framework' without a space.

>> Implementation

- "Wwwyzzerdd is a manifest V2 .." -> Could it be clarified a little bit on manifest V2?

- Punctuation: a comma connecting the clauses would improve understandability: "If the author is notable enough to have an English Wikipedia article their name in the book’s article will almost always be a link to the author’s article."

- There seems to be an issue with respect to the scalability of Wwwyzzerdd, in particular since the author has mentioned the limited number of edits made by humans. Any thoughts on this?

- Fig. 3 seems limited in showing how the tool really works. What about adding a short video of the tool in use?

- What (web) technologies are used to develop Wwwyzzerdd? This could be made more explicit in the paper.

- "If there is a property connecting the Wikidata item for the article to the item linked, the corresponding orb is green." -> This could be problematic if there are > 1 properties between A and B. It could create, say, a false positive (seemingly green, but actually for another property.

- More use cases (motivating scenarios/running examples) could be added to the paper to show the various ways the proposed tools can be used.

- A diagram of Psychiq's architecture could help improve readability.

- "In the future the first sentence of the article or category hierarchy information may also be added as a feature to the model." -> Why not the whole article text also?

- The ML and language model of Psychiq could be described in more detail. Basically, more introduction and motivation could be helpful.

- The training and testing sets are somewhat poorly described. How are they created? How are they labeled? Furthermore, where did the 5.6 million examples come from?

- Why only a single epoch of training?

- Actually, Fig. 5 shows an inherent problem of Wikidata: ambiguities in class naming. In the figure, there could be a number of valid classes to apply. Which one is the best (linguistically, practically, according to the community, or by any other criteria)?

- One more thing: Fig. 4 does not really explain how "Guessing" works. Moreover, could there be a slightly stronger baseline than guessing?

>> Discussion

- As per my previous comment, the 111,655 Wwwyzzerdd edits could be investigated further for any insights/issues.

- A discussion of future work would be nice.

> Overall

I believe the tools have contributed to completing Wikidata. I'd suggest, however, some revisions to the current version of the paper.

Review #3
By Dimitris Kontokostas submitted on 12/Mar/2023
Suggestion:
Minor Revision
Review Comment:

This tool & system report presents two tools: Psychic and Wwwyzzerdd for completing Wikidata using Wikipedia. It has been submitted under the “Special Issue on Wikidata: Construction, Evaluation and Applications”. According to the special issue topics, I see this report fitting under the following two:
* Data quality in Wikidata: Approaches for problem detection, evaluation and improvement
* Tools, bots, and datasets for improving or evaluating Wikidata

After the introduction, where the author motivates the creation of these two tools, follows a related work section. That section does a good job of listing various automated and human-assisted tools that are used in the same context as Psychiq and Wwwyzzerdd.
Section 3 describes the implementation of these two tools: how they operate in the background, the assumptions they are based on, and their limitations. Regarding Wwwyzzerdd, one thing that is not very clear is how the UI selects a property when one is not yet defined in Wikidata; e.g., Figure 3 suggests “Seanan McGuire” as an author of “InCryptid”, but is this relation somehow guessed/ranked by an algorithm, or is the “author” property simply alphanumerically first? Some clarification on this selection process would be helpful for the reviewers.
Section 4 discusses the use of these two tools. The adoption of Wwwyzzerdd appears to be very good; however, both tools are limited to English Wikipedia articles.

Overall, the tools are not complex or overly sophisticated, but they perform simple tasks well. The author has carefully implemented them in a way that allows usage to be measured, which strengthens this work further.
An additional remark/concern is something that the author also mentions in the paper. The Wikidata community does not promote (and arguably discourages) the import of statements directly from Wikipedia without a reference, which means that this tool does not follow the suggested community guidelines. However, this is not listed in the special issue review guidelines.
A way to extract a reference alongside the links that Wwwyzzerdd imports would automatically make this tool more in line with Wikidata's goals.