A Graph-based Approach for Inferring Semantic Descriptions of Wikipedia Tables

Tracking #: 3195-4409

This paper is currently under review
Authors:
Binh Vu
Craig A. Knoblock

Responsible editor: 
Agnieszka Lawrynowicz

Submission type: 
Full Paper
Abstract:
Millions of high-quality tables are available in Wikipedia. These tables cover many domains and contain useful information. To make use of these tables for data discovery or data integration, we need precise descriptions of the concepts and relationships in the data, known as semantic descriptions. However, creating semantic descriptions is a complex process that requires considerable manual effort and can be error-prone. This paper presents a novel probabilistic approach for automatically building semantic descriptions of Wikipedia tables. Our approach leverages hyperlinks in Wikipedia tables and existing knowledge in Wikidata to construct a graph of possible relationships in a table and its context, and then uses collective inference to distinguish genuine relationships from spurious ones to form the final semantic description. In contrast to existing methods, our solution can handle tables that require complex semantic descriptions involving n-ary relations (e.g., the population of a country in a particular year) or implicit contextual values to describe the data accurately. In our empirical evaluation, our approach outperforms state-of-the-art systems on a large set of Wikipedia tables by as much as 12.6% and 4.8% in average F1 score on the relationship and concept prediction tasks, respectively.