An Assertion and Alignment Correction Framework for Large Scale Knowledge Bases

Tracking #: 2829-4043

Jiaoyan Chen
Ernesto Jimenez-Ruiz
Ian Horrocks
Xi Chen
Erik Bryhn Myklebust

Responsible editor: 
Guest Editors KG Validation and Quality

Submission type: 
Full Paper
Abstract: 
Various knowledge bases (KBs) have been constructed via information extraction from encyclopedias, text and tables, as well as by aligning multiple sources. Their usefulness and usability are often limited by quality issues, one common issue being the presence of erroneous assertions and alignments, often caused by lexical or semantic confusion. We study the problem of correcting such assertions and alignments, and present a general correction framework which combines lexical matching, context-aware sub-KB extraction, semantic embedding, soft constraint mining and semantic consistency checking. The framework is evaluated on one set of literal assertions from DBpedia, one set of entity assertions from an enterprise medical KB, and one set of mapping assertions from a music KB constructed by integrating Wikidata, Discogs and MusicBrainz. It achieves promising results, with correction rates (i.e., the ratio of target assertions/alignments that are corrected with the right substitutes) of 70.1%, 60.9% and 71.8%, respectively.

Solicited Reviews:
Review #1
By Petr Křemen submitted on 08/Jul/2021
Review Comment:

Thanks to the authors for their revision and answers. I have only one observation:
- I can still see "input KB" used on page 4.

Review #2
By Heiko Paulheim submitted on 13/Jul/2021
Minor Revision
Review Comment:

The authors have done a great job in addressing most of my comments; in particular, the evaluation metrics are now much clearer to me. To help other readers who might get confused as I was, it might be an option to depict a 3x2 confusion matrix ((GT exists, GT does not exist) x (correct replacement, wrong replacement, no replacement)) and illustrate the measures using that matrix.

There are still a few open (and a few new ;-)) questions.

Old questions:
* While I see that some of my questions for clarification are now addressed (particularly here: why neighborhoods were not used as candidates, and how salient semantic confusion is), it would be nice to provide some real examples here. The numbers arguing for the neighborhood sizes from the cover letter should be included in the paper, since they provide a hard justification for the approach chosen.
* I see my comment on the neighborhood was eaten by the journal website's HTML encoder ;-) Let me try again: Section 4.3.1: Algorithm 1 seems to extract neighborhoods containing only statements with the same predicate as the target assertion. For example, if my target assertion was "Germany capital Berlin", the neighborhood graph would not contain, e.g., "Germany seatOfGovernment Berlin" or "Germany hasPOI Berlin_Wall". Is that really intended?
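To make the concern concrete, the filtering behavior being questioned can be sketched as follows. This is a minimal illustration only, not the paper's actual Algorithm 1; the triples and the helper name are invented for this example:

```python
def predicate_filtered_neighborhood(triples, target):
    """Keep only triples sharing the target assertion's predicate,
    mimicking the behavior the review attributes to Algorithm 1."""
    _, target_pred, _ = target
    return [t for t in triples if t[1] == target_pred]

kb = [
    ("Germany", "capital", "Berlin"),
    ("Germany", "seatOfGovernment", "Berlin"),
    ("Germany", "hasPOI", "Berlin_Wall"),
    ("France", "capital", "Paris"),
]

target = ("Germany", "capital", "Berlin")
neighborhood = predicate_filtered_neighborhood(kb, target)

# Other statements linking Germany and Berlin are excluded...
assert ("Germany", "seatOfGovernment", "Berlin") not in neighborhood
assert ("Germany", "hasPOI", "Berlin_Wall") not in neighborhood
# ...while unrelated entities sharing the predicate are kept.
assert ("France", "capital", "Paris") in neighborhood
```

Under such a filter, the neighborhood of "Germany capital Berlin" would indeed omit the other Germany/Berlin statements in the reviewer's example.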

New questions:
* in Fig. 4: why does the correction rate decline with higher thresholds for DBpedia, but increase for the other two datasets? There seems to be some particularity that DBpedia has, but the other two do not. It would be interesting to dig a bit deeper here.
* Since there are quite efficient implementations of RDF2vec out there (such as jRDF2vec, which can be considered the fastest one, or even faster approximations like RDF2vec Light), which scale well to larger graphs, it would not have been too big a deal to train RDF2vec on the other two datasets as well. In the final version, I would find it neat to see results on those datasets too.

Overall, I am very happy with the revision. The remaining few questions could be addressed in a minor revision.

Review #3
By José María Álvarez Rodríguez submitted on 03/Aug/2021
Review Comment:

The revised version of the paper has properly addressed previous comments, making the content more understandable and justifying some conceptual and design decisions. Apart from the theoretical approach validated through the experiments, the improvement in the discussion section is especially relevant.

There are only a few minor things to address when the paper transitions to the publishing process:

- Check that the numbering of references is correct according to the journal/editorial rules. Currently, the ordering of the numbering (first, etc.) is not clear.
- When referring to SHACL and ShEx, please include a reference or footnote to the W3C recommendations.
- Although in this context it is not possible to compare results with previous approaches (a benchmark for this task is not available, and the metrics are not the same), it would be nice to see a quantitative comparison (e.g., correction rate) of previous approaches against the presented one.