SAP-KG: Analysis of Synonym Predicates using Wikidata

Tracking #: 3384-4598

This paper is currently under review
Emetis Niazmand
Maria-Esther Vidal

Responsible editor: 
Guest Editors Wikidata 2022

Submission type: 
Full Paper
Wikidata, as a community-maintained knowledge graph (KG), contains millions of facts; it may integrate different entities and relations with the same meaning. Contributors to community-maintained knowledge graphs can introduce new predicates that are similar in meaning to other predicates already in the KG (a.k.a. synonym predicates). Detecting these synonym predicates is crucial for interoperability and for the completeness of query answers over community-maintained knowledge graphs. We tackle the problem of uncovering synonym predicates and propose SAP-KG, a knowledge graph-agnostic approach, to uncover predicates that have similar meanings but relate complementary entities. SAP-KG comprises a set of metrics to describe and analyze synonym predicates; it resorts to Class-based Synonym Descriptions (CSDs) to capture the most important characteristics of the predicates of a knowledge graph. As a proof of concept, we evaluate SAP-KG over Wikidata and show the benefits of exploiting statements annotated with qualifiers, references, and ranks. Additionally, we present a query processing technique that puts into perspective the role of synonym predicates in query answer completeness. We have empirically studied the distribution and percentage of overlapping synonym predicates in six domains in Wikidata. The highest percentage of synonyms was detected in the Person domain, at 86.66%, while the Drug domain has the lowest, at 42.39%. These results provide evidence that community-maintained knowledge graphs contain predicates that define the same real-world relationships.
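To illustrate why synonym predicates matter for query answer completeness, the following minimal sketch queries a toy set of triples first with a single predicate and then with that predicate expanded to its synonyms. It is not the SAP-KG implementation: the triples, the `ex:` names, the `SYNONYMS` table, and the `objects_of` helper are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Toy triples (hypothetical names, not real Wikidata identifiers).
TRIPLES: List[Triple] = [
    ("ex:alice", "ex:occupation", "ex:physician"),
    ("ex:alice", "ex:profession", "ex:researcher"),
    ("ex:bob", "ex:occupation", "ex:engineer"),
    ("ex:bob", "ex:profession", "ex:engineer"),
]

# Assumed output of a synonym-detection step: predicates with the same
# meaning that may relate complementary entities.
SYNONYMS: Dict[str, Set[str]] = {
    "ex:occupation": {"ex:profession"},
    "ex:profession": {"ex:occupation"},
}


def objects_of(
    triples: List[Triple], subject: str, predicate: str, expand: bool = False
) -> Set[str]:
    """Return the objects of (subject, predicate, ?o); optionally include synonyms."""
    predicates = {predicate}
    if expand:
        predicates |= SYNONYMS.get(predicate, set())
    return {o for s, p, o in triples if s == subject and p in predicates}


# Querying one predicate misses the fact stated with its synonym.
plain = objects_of(TRIPLES, "ex:alice", "ex:occupation")
# Expanding the query to synonym predicates recovers the complementary answer.
expanded = objects_of(TRIPLES, "ex:alice", "ex:occupation", expand=True)

print(sorted(plain))
print(sorted(expanded))
```

In this toy data, the two predicates state complementary objects for `ex:alice`, so the expanded query returns an answer that the single-predicate query misses; for `ex:bob` they overlap, so expansion adds nothing.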