Effective and Efficient Semantic Table Interpretation using TableMiner+

Tracking #: 1339-2551

Authors: 
Ziqi Zhang

Responsible editor: 
Pascal Hitzler

Submission type: 
Full Paper
Abstract: 
This article introduces TableMiner+, a Semantic Table Interpretation method that annotates Web tables both effectively and efficiently. Built on our previous work TableMiner, the extended version advances the state of the art in several ways. First, it improves annotation accuracy by making innovative use of various types of contextual information both inside and outside tables as features for inference. Second, it reduces computational overheads by adopting an incremental, bootstrapping approach that starts by creating preliminary and partial annotations of a table using ‘sample’ data in the table, then uses the outcome as a ‘seed’ to guide interpretation of the remaining contents. This is followed by a message passing process that iteratively refines results on the entire table to create the final optimal annotations. Third, it is able to handle all annotation tasks of Semantic Table Interpretation (e.g., annotating a column, or entity cells) while state-of-the-art methods are limited in different ways. We also compile the largest dataset known to date and extensively evaluate TableMiner+ against four baselines and two re-implemented (near-identical, as adaptations are needed due to the use of different knowledge bases) state-of-the-art methods. TableMiner+ consistently outperforms all models under all experimental settings. On the two most diverse datasets covering multiple domains and various table schemata, it improves F1 by between 1 and 42 percentage points depending on the specific annotation task. It also significantly reduces computational overheads in terms of wall-clock time when compared against classic methods that ‘exhaustively’ process the entire table content to build features for inference.
As a concrete example, compared against a method based on joint inference implemented with parallel computation, the non-parallel implementation of TableMiner+ achieves significant improvement in learning accuracy and almost orders of magnitude of savings in wall-clock time.
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
By Michael Granitzer submitted on 25/Jul/2016
Suggestion:
Accept
Review Comment:

I see all my previous comments integrated. Congratulations to the authors.

Review #2
By Venkat Raghavan Ganesh submitted on 12/Aug/2016
Suggestion:
Accept
Review Comment:

The paper has addressed all the issues mentioned in the previous review. The abstract is now much clearer and mentions the baselines. One minor concern is the frequent reference to Venetis' approach outside the related work section. One example is on Page 10: "Although Venetis et al. [35] have introduced a supervised subject column classifier, in this work we propose a new unsupervised subject column detection algorithm that uses a different set of features listed in Table 2". I am not sure what role the "Although" clause plays, given that the authors are using a completely different ML approach.

However, this does not affect the overall content.