Effective and Efficient Semantic Table Interpretation using TableMiner+

Tracking #: 1111-2323

Authors: 
Ziqi Zhang

Responsible editor: 
Isabel Cruz

Submission type: 
Full Paper
Abstract: 
This article introduces TableMiner+, a Semantic Table Interpretation method that annotates Web tables in a way that is both effective and efficient. Built on our previous work TableMiner, the extended version advances the state of the art in several ways. First, it improves annotation accuracy by making innovative use of various types of contextual information, both inside and outside tables, as features for inference. Second, it reduces computational overhead by adopting an incremental, bootstrapping approach that starts by creating preliminary, partial annotations of a table using 'sample' data in the table, then uses the outcome as a 'seed' to guide interpretation of the remaining content. This is followed by a message-passing process that iteratively refines results on the entire table to create the final, optimal annotations. Third, it is able to handle all annotation tasks of Semantic Table Interpretation (e.g., annotating a column, or entity cells), whereas state-of-the-art methods are limited in different ways. We also compile the largest dataset known to date and extensively evaluate TableMiner+ against four baselines and two re-implemented state-of-the-art methods. TableMiner+ consistently outperforms all comparative models under all experimental settings. On the two most diverse datasets, covering multiple domains and various table schemata, it improves F1 by between 1 and 42 percentage points depending on the specific annotation task. It also significantly reduces computational overhead in terms of wall-clock time when compared against classic methods that 'exhaustively' process the entire table content to build features for inference. As a concrete example, compared against an existing method based on joint inference implemented with parallel computation, the non-parallel implementation of TableMiner+ achieves significantly higher learning accuracy and close to an order of magnitude of savings in wall-clock time.
Tags: 
Reviewed

Decision/Status: 
Minor Revision

Solicited Reviews:
Review #1
Anonymous submitted on 27/Jul/2015
Suggestion:
Minor Revision
Review Comment:

First, I want to thank the authors for taking up the reviewers' comments and improving the work significantly. I see that my main points have been corrected. However, I still have some minor points where I see further potential for improvement:

Introduction:
1. The sentence "Given a well-formed relational table (e.g., Figure 1)
and reference sets of concepts..." in the introduction is very hard to
read. Rephrasing this sentence would facilitate the understanding.

2. Page 2, right column, line 6: suffers => suffer

3. Page 2, right column, line 4 from below: 2*the

4. Sentence: "Therefore, we argue that a ‘complete’ Semantic Table
Interpretation method should handle columns of both data types." At
this stage of the work it is not clear to me how you intend to address
this. Maybe add an additional, explanatory sentence.

Terms and Concepts:

1. Sentence: "Complex structures constitute a small population and
have not been a focus of this work". This depends on the underlying
domain. It is probably true for Web tables, but for tables in
research papers the story looks different.

Overview:
Figure 2, which provides an overview of the approach, is hard to understand
due to its wealth of detail. I suggest creating a second
image that shows the approach more generally. This figure could
depict the main building blocks, such as NE-column interpretation and the
LEARNING and UPDATE phases.

Column Subject Detection:
In equation 7 you simply sum manually weighted feature values. Could
you further improve the results by applying machine learning
approaches such as Learning To Rank? Venetis et al. also apply
Support Vector Machines. At the least, an explanation would be helpful of
why the authors do not use any machine learning algorithms.
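For illustration, the kind of scoring the reviewer is questioning can be sketched as a fixed linear combination of column features. This is a minimal Python sketch, not the paper's actual Equation 7: the feature names and all weights here (including the 0.5) are hypothetical placeholders standing in for the manually chosen values the review refers to.

```python
# Hypothetical feature names and hand-set weights; the real features
# and weights are defined in the paper's Equation 7, not here.
WEIGHTS = {
    "uc": 0.5,   # e.g., fraction of cells with unique content (illustrative)
    "emc": 1.0,  # e.g., a second column-level feature (illustrative)
}

def subject_column_score(features, weights=WEIGHTS):
    """Manually weighted linear combination of column feature values.

    Unknown features default to weight 1.0. A learning-to-rank model
    would instead fit these weights from labeled subject columns.
    """
    return sum(weights.get(name, 1.0) * value
               for name, value in features.items())
```

The reviewer's suggestion amounts to replacing the fixed `WEIGHTS` dictionary with weights fitted from training data (e.g., by Learning To Rank or an SVM, as in Venetis et al.).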

Experiment settings:
You state that you have decided to use Freebase. But why? Is there a
main reason to ignore DBpedia?

10.3.1 Baselines
"and the the one receiving"

10.3.3
"The convergence threshold in the I-Inf algorithm is set to 0.01 in
both subject column detection and the preliminary column
classification phases".
Have you tried other threshold values? Did you determine this value
empirically?
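To make the role of the threshold concrete, here is a minimal sketch of an entropy-based incremental convergence loop in the spirit of I-Inf as described in the paper: process data items (e.g., rows) one at a time, and stop once the candidate score distribution stabilizes. The scoring callback and the use of Shannon entropy over normalized scores are illustrative assumptions, not the paper's implementation.

```python
import math

def entropy(scores):
    """Shannon entropy of a normalized candidate-score distribution."""
    total = sum(scores.values())
    probs = [s / total for s in scores.values() if s > 0]
    return -sum(p * math.log(p, 2) for p in probs)

def i_inf(rows, score_row, threshold=0.01):
    """Incrementally accumulate candidate scores row by row; stop when
    the entropy of the score distribution changes by less than
    `threshold` between iterations (0.01 in the paper's experiments).

    `score_row(row)` is a hypothetical callback returning a dict of
    candidate -> score for a single row.
    """
    scores = {}          # candidate -> accumulated score
    prev_entropy = None
    for i, row in enumerate(rows, start=1):
        for candidate, s in score_row(row).items():
            scores[candidate] = scores.get(candidate, 0.0) + s
        cur = entropy(scores)
        if prev_entropy is not None and abs(cur - prev_entropy) < threshold:
            return scores, i     # converged after i rows
        prev_entropy = cur
    return scores, len(rows)     # no convergence; whole input processed
```

A smaller threshold makes the loop process more rows before stopping (trading efficiency for stability), which is presumably why the choice merits an empirical justification.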

Conclusion:

In your work you strongly focus on the general domain. Are there any
plans to address more domain-specific (e.g., biomedical or computer
science) or complex tables? It would be very interesting to see how your
framework performs on other domains (this would of course be future work, but could be mentioned).

Review #2
By Venkat Raghavan Ganesh submitted on 26/Aug/2015
Suggestion:
Minor Revision
Review Comment:

The paper "Effective and Efficient Semantic Table Interpretation using TableMiner+" proposes a bootstrapping approach to semantically interpret tables through various annotation and annotation-filtering methods. The proposed system has been evaluated on a comprehensive list of publicly available datasets comprising well-structured tables. The confusing mathematical notations have been corrected, and the new glossary helps.

Evaluating results in this area is always difficult, mainly due to the lack of proper baselines. While the authors did their best to re-implement previous approaches, these are still limited to entity-based columns, and the implementations are not identical, based on their notes in Appendix D. This is risky and could easily mislead future readers, who may assume the re-implemented results are correct for comparison. I would suggest highlighting the lack of existing baselines and the 'near identical' re-implementation effort -- if not in the abstract, at least in the tables of results (e.g., Table 11). That said, I remain slightly concerned about the current abstract, which claims the system outperforms 'comparative' models that were actually created by the authors.

Also, I hope the authors will add proper pointers to their code/software if this gets accepted; otherwise I suggest removing mentions of it.

The revised version looks better. I would suggest it undergo a minor revision.


Comments

First of all, I take this opportunity to thank the reviewers for their extremely thoughtful and helpful comments. I have attempted to address all comments by all reviewers and I believe they have helped improve the article significantly. I include a summary below, followed by detailed responses to each reviewer.

Overview:
- The 'serious concern' raised by one reviewer has been addressed by (1) adding a substantive argument for addressing efficiency in Semantic Table Interpretation algorithms; and (2) performing empirical experiments to show that local caching and parallelization are not the ultimate solution
- Other reviewer concerns about the clarity of the description have been addressed. I have largely re-organised the methodology sections (5-9) and re-written some of them. I have also added a more detailed overview of the methodology and a detailed diagram showing the workflow, and used running examples throughout these sections. The article is now a few pages longer than the first submission due to the addition of such details (see detailed responses below)
- The editor's concerns have been addressed by running a spell checker, proofreading, and adding a lookup table for mathematical notations and functions (Appendix F)
- The article has been re-scoped to take into account that some work has already been published at ISWC2014

Specific responses:

[[Reviewer #1]]

>… the work's abstract seems to be a bit misleading since it's advertizing with introducing TableMiner. The claims in THIS paper should be scoped carefully in the abstract.

RESPONSE: We have renamed the system, rewritten the abstract, revised the introduction and related work, and added paragraphs in the methodology sections to properly scope the claims in this article and explicitly explain what is new, what has been changed, and what has been previously published. Specifically:
- The system in this work is renamed ‘TableMiner+’
- The abstract has been re-written and mentions that this work extends our previous work, TableMiner
- The introduction (page 3, 3rd paragraph on the left) briefly introduces what has been done in this work in addition to our previous paper at ISWC2014
- The related work section (page 7, 3rd paragraph on the right) compares this work against our previous paper
- Section 7 (page 12, 3rd paragraph on the right) explains that components from our previous work have been re-used
- Section 8 (page 16, 3rd paragraph on the left) explains what is new compared with our previous work
- Appendix A lists name changes in this work

>4. Overview and Notations. To be honest, i was a little bit confused about another chapter containing notations. Is it possible to merge both notation section… The current section 4 should be kept for acquainting the reader with the overall algorithm process. Basically, figure 2 shows the algorithm's overall procedure which is clear and intuitive. However, the reading of section 5 - 10 showed that in this section some more work in terms of explaining the algorithm's processes has to be done.

RESPONSE: We have restructured the article. Section 2 now defines terms and concepts without the notations, and Section 4 provides a more detailed overview of TableMiner+, together with a more detailed diagram showing its data flow and processes (Figure 2).

>6. Subject Column Detection… In equation 8, the feature weights are manually determined. Have you tried to learn the feature weights? For instance, the explanation of why you chose 0.5 for feature "uc" is poor. Basically, I understand why feature x is more important than feature y, but it is not clear how you determined the feature weights.
RESPONSE: the weights are defined manually based on subjective perception; we did not tune them. We have added a sentence to explain this.

>7 - 9. NE-Column interpretation - building blocks, NE-Column interpretation - the LEARNING phase, NE-Column interpretation - the update phase These sections are very difficult to understand.
RESPONSE: We have largely revised the methodology sections 5 to 9. Specifically:
- We strengthened the overview of the methodology (Section 4) with a more detailed description and a more detailed workflow diagram (Figure 2)
- We re-arranged Sections 5-9 based on the temporal order of the processes
- We used a running example in almost every section from 5 to 9
- We have re-written some sections for better clarity.

>… There is a huge number of introduced symbols. Even if all symbols are correctly introduced, it is extremely hard to follow. Several symbols are abbreviations of terms and thus the reader has to remember all these. The number and forms of these symbols can be simplified for sure.
RESPONSE: We have largely revised the notations by (1) dropping some unnecessary notations and simplifying some others; (2) adding a lookup table for notations and functions in Appendix F.

>… As mentioned in this review section 4, I was not sure whether I completely understood the whole algorithm process. These sections lack overview depicting the whole process in greater detail (Figure 2 is not enough).
RESPONSE: As described above, we have re-organised the methodology sections and re-written some subsections for clarity. We added a more detailed overview of the method, with a detailed workflow diagram, and used a running example that connects all sections from 5 to 9. At the beginning of each section, we also briefly describe how it is connected with the previous section.

>Additionally, in your related work section you explain that in your paper you significantly refine the column classification and named entity disambiguation methods by changing mathematical formulas etc., in contrast to the work "Towards effective and efficient table interpretation" proposed at the International Semantic Web Conference 2014. You did not describe the changes in the respective sections.
RESPONSE: We have re-scoped the abstract, revised the introduction and related work for these claims. Also in sections 7 and 8 at the beginning, we now have explicitly described what components are inherited and what are new. Please also see our response to the first comment above.

>Unfortunately, the results attained by TableMiner cannot be directly compared with the results of Venetis et al. and Mulwad et al. due to unpublished data set changes. This should be explicitly underlined in the respective sections again.
RESPONSE: We now re-implemented the state-of-the-art methods by Limaye et al. and Mulwad et al. and included comparison against the systems.

>In table 11 you note a 90.9% F1 measure at the named entity disambiguation task of the approach by Mulwad et al. That is not correct.
RESPONSE: corrected

[[Reviewer #2]]

> … it would be nice to clarify TableMiner's accuracy on a large number of heterogeneous tables. For example, tables that only have 15-20 rows but originate from various domains/sources. In these cases, computation would be almost negligible and accuracy in identifying relationship will be the goal.
RESPONSE: we kindly refer the reviewer to Figure 7, which shows that the average number of rows in the Limaye200 and LimayeAll datasets is about 20. The two datasets were created by randomly crawling the Web and Wikipedia, and hence cover a wide range of domains. We hope this already addresses the reviewer’s concern.

>… some equations need to be much clearer and use of notations has to be consistent and meaningful. For example, indicating "PG" to indicate a list of web pages decreases readability. On page 8 Section 5, notations need some clarity. The missing space between the mathematical relations (<=, >=) should be corrected.
RESPONSE: we have revised the mathematical notations; the problematic notations have been dropped.

[[Reviewer #3]]

> 1) The authors decide that runtime is a crucial performance criterion of table annotation systems. This is the fundamental motive for their algorithms that attempt to iteratively sample from target tables, scoring candidate extractions' semantic labels row-by-row. I think this is a serious mistake. Almost all work to date has focused on the markup quality, for good reason. The runtime overhead is entirely dependent on the features being used. But even in the Freebase-style case discussed here, there is likely no heavy computation going on; rather, it is likely that the observed Freebase lookup times are driven primarily by network latency. If the users had a local copy of the Knowledge Base data (which is possible for some KBs) then the overhead time here would likely be minimal. (Note that experiments evaluate runtime entirely with a single thread, maximizing the problems with latency.) It is a stretch for me to believe that this is a serious problem: it can be fixed by simply downloading a dataset and building a local index.
RESPONSE: We have addressed this point by doing the following work:
- We largely revised the introduction to avoid giving the impression that efficiency is the ONLY motivation. Instead, we advance the state of the art in three dimensions: effectiveness (accuracy), efficiency, and completeness.
- We added a paragraph in the introduction (page 2, 2nd paragraph on the right) arguing that (1) efficiency is an important factor in this task; and (2) parallelization and local caching are not the ultimate solution (see the empirical evidence below). The key is to develop more efficient algorithms that reduce the number of data items processed.
- Using a LOCAL COPY of the knowledge base, we compared TableMiner+ against two state-of-the-art methods and showed that TableMiner+ is significantly faster. In particular, TableMiner+ is implemented as a single thread, while one state-of-the-art method (based on joint inference) uses 50 parallel threads. Even in this setting, where TableMiner+ is significantly disadvantaged, it still delivers nearly an order of magnitude of savings in running time compared with the other method (see page 28, 1st paragraph on the right, starting with “Secondly…”)
We hope this is convincing evidence that (1) Semantic Table Interpretation can significantly benefit from efficient algorithms, and (2) parallelization and local caching are not the ultimate solution.

>The users don't actually compare against baselines from the literature. Rather, they compare against some simple --- and somewhat artificial --- baselines. As a result, it is hard to say whether their approximation results are actually as good as they should be.
RESPONSE: We have re-implemented two state-of-the-art methods (Section 10.3.2), because no existing software is available. The methods were adapted to Freebase, as they previously used other knowledge bases. We include details of the adaptation in Appendix D. Further, we also release our software so researchers can re-create the experiments.
We ran the two re-implemented state-of-the-art methods on the datasets and compared the results against TableMiner+ in Section 11.2.2. We found that (1) the two re-implemented methods even underperformed some of the baselines we developed, confirming that our baselines are not weak; and (2) TableMiner+ remains the best of all systems.

>There is a real lack of useful examples in the first half of the paper. The authors have a few one-line examples, but the paper could really benefit from a longer running example that helps to illustrate every extraction target
RESPONSE: We have now taken the advice of the reviewer and added a running example. From Sections 5 to 9, we use the example in every section to walk readers through the entire process of TableMiner+.

>The abstract could really use a stronger example
RESPONSE: the abstract has now been re-written.

>The use of "concept" to mean "type" should be better described in the intro. Also: use examples!
RESPONSE: corrected. See the 1st paragraph on page 1

>The link between your stated goals and the iterative method that is introduced at the end of page 2 is unclear. It eventually becomes clear why you're doing this, but iterative methods can serve multiple ends: minimizing human labels, for example. Your main goal is performance, but that doesn't come through right away.
RESPONSE: we have now re-written the introduction and revised our goals.

>The evaluation criteria in the middle paragraph of left-column page 3 come from nowhere. I thought the whole point was increased performance, not accuracy. You have not told us that these other goals are things you're also trying to address.
RESPONSE: we have now revised our goals and clearly set out that we also address effectiveness.

>There is a lot of related work from Kaushik Chakrabarti, on the InfoGather system. It should be cited and discussed.
RESPONSE: we added a whole section covering InfoGather and similar systems. See Section 3.3.

>Algorithm 1 I-Inf is unnecessarily high level. It communicates the "meta-algorithm" but it is so high level that I can't understand if it's doing something useful.
RESPONSE: we briefly revised the algorithm, adding a little more detail. We argue that it is necessary to keep it as high-level as possible, as it provides the reader with the principles needed for the following sections. Note that I-Inf is used in Section 6 (page 11, 1st paragraph on the right) and Sections 7.2.2 and 7.2.3. We believe it would be redundant to create individual, specific algorithms for these sections.

>Section 6 is unclear what is the authors' work, and what is due to [30].
RESPONSE: the subject column detection method is new and does not build on [30]. We revised the opening paragraph of this section to clarify this.

>The "List of peaks named Bear Mountain" example is too sketchily described. The authors should provide the concrete example in the paper and discuss it more carefully.
RESPONSE: We added details, and also included a screenshot of the table. See page 13, paragraphs around Figure 5.

>In the experiments, the authors permit a confusing statement about how the bootstrapping process and how it relates to interdependence among components. Paragraph begins, "In all baselines…" I am not sure whether they are trying to say something new here or not.
RESPONSE: descriptions of the baselines have now been revised. See section 10.3.1.