RQSS: Referencing Quality Scoring System for Wikidata

Tracking #: 3593-4807

Authors: 
Seyed Amir Hosseini Beghaeiraveri
Alasdair J G Gray
Fiona McNeill

Responsible editor: 
Maria Maleshkova

Submission type: 
Full Paper
Abstract: 
Wikidata is a collaborative multi-purpose Knowledge Graph (KG) with the unique feature of attaching provenance data to item statements as references. More than 73% of Wikidata statements have provenance metadata; however, few studies exist on referencing quality in this KG, and those that do focus only on the relevancy and trustworthiness of external sources. While existing frameworks assess the quality of Linked Data, and some of their metrics investigate provenance, none focuses on reference quality. We define a comprehensive referencing quality assessment framework based on Linked Data quality dimensions, such as completeness and understandability. We implement the objective metrics of the assessment framework as the Referencing Quality Scoring System (RQSS). The system provides quantified scores by which referencing quality can be analyzed and compared. RQSS scripts can also be reused to monitor referencing quality regularly. Due to the scale of Wikidata, we use well-defined subsets to evaluate the quality of references in Wikidata with RQSS. We evaluate RQSS over three topical subsets: Gene Wiki, Music, and Ships, corresponding to three Wikidata WikiProjects, along with four random subsets of various sizes. The evaluation shows that RQSS is practical and provides valuable information that Wikidata contributors and project holders can use to identify quality gaps. Based on RQSS, the average referencing quality in Wikidata subsets is 0.58 out of 1. Random subsets (representative of Wikidata) have overall scores 0.05 higher than topical subsets, with Gene Wiki scoring highest among the topical subsets. Regarding referencing quality dimensions, all subsets score high in accuracy, availability, security, and understandability, but weaker in completeness, verifiability, objectivity, and versatility. 
Although RQSS is developed based on the Wikidata RDF model, its referencing quality assessment framework can be applied to KGs in general.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 04/Mar/2024
Suggestion:
Accept
Review Comment:

1. Summary
Most of the comments were addressed.
2. Remaining remarks
It would be good if the authors could address the following remaining comments:
- Overflowing reference 25
- [Line 1, Page 36]: "due to the lack of root privileges" -> it is still not clear what is meant by "root privileges" in this case, and which privileges those are.
- [Line 42, Page 27]: make the italics consistent with the previous mention of "CAS Registry Number (P231)". Check whether this applies to other mentions as well.
- Consider checking this argument: "(as the pages of a publicly available dataset that contains insensitive information are rare to be attacked)". It sounds contradictory: sensitive data is usually more targeted by attacks and thus needs more security than normal data.
- The README in the GitHub repo does not contain any information about how to run the scripts (i.e., how to use the framework) or a description of the content/structure of the repo itself. Please extend it.

3. Decision
The decision is: Accept

Review #2
Anonymous submitted on 30/Mar/2024
Suggestion:
Minor Revision
Review Comment:

I thank the authors for thoroughly revising the paper to address each of my comments.

My main remaining suggestion is to include a discussion on the actual lessons learned, i.e., what have we learned from all these metrics and analysis (say, 3-4 high-level findings), and what are the exciting research directions that RQSS opens? The last paragraph of §6 is relevant to this suggestion, but I think that providing an extended discussion to complement the current §6 and §7 would be very valuable.
Note: Section 6 is called "Lessons Learned", but its content is mostly about limitations. I think "Limitations" would be a more adequate section title, as the discussion is rather about caveats to keep in mind when interpreting the results.

To improve readability, I suggest a few actions during the preparation of the final version:
- carefully proofreading and using a spellchecker - I noticed some typos such as “According Zaveri et al.” (p. 5), “, Licensing” on page 18, “Interlinking:” on page 19, “Amount-of-data” (should have no dashes; p.28), “Wikiabse” (p.38).
- flattening 4.1 into a paragraph label instead of a subsection, as it is strange to have a single subsection.
- placing all footnotes after punctuation signs.