Editorial Board

Editors-in-Chief
Krzysztof Janowicz

Managing Editors
Cogan Shimizu
Eva Blomqvist

Editorial Board
Mehwish Alam
Claudia d’Amato
Stefano Borgo
Boyan Brodaric
Philipp Cimiano
Oscar Corcho
Bernardo Cuenca-Grau
Elena Demidova
Jerome Euzenat
Mark Gahegan
Aldo Gangemi
Anna Lisa Gentile
Rafael Goncalves
Dagmar Gromann
Armin Haller
Aidan Hogan
Katja Hose
Eero Hyvönen
Sabrina Kirrane
Agnieszka Lawrynowicz
Freddy Lecue
Maria Maleshkova
Raghava Mutharaju
Axel Polleres
Guilin Qi
Marta Sabou
Harald Sack
Christoph Schlieder
Stefan Schlobach
Oshani Seneviratne
Cogan Shimizu
Ruben Verborgh
GQ Zhang

Former Editors-in-Chief
Pascal Hitzler

Editorial Assistants
Sanaz Saki Norouzi


STEM: Stacked Threshold-based Entity Matching for Knowledge Base Generation

Submitted by Enrico Palumbo on 11/07/2017 - 06:00

Tracking #: 1762-2974

Authors:

Enrico Palumbo

Giuseppe Rizzo

Raphael Troncy

Responsible editor:

Guest Editors ML4KBG 2016

Submission type:

Full Paper

Abstract:

One of the major issues encountered in the generation of knowledge bases is the integration of data coming from a collection of heterogeneous data sources. A key task when integrating data instances is entity matching. Entity matching is based on the definition of a similarity measure between entities and on the classification of an entity pair as a match if the similarity exceeds a certain threshold. This parameter introduces a trade-off between the precision and the recall of the algorithm: higher threshold values lead to higher precision and lower recall, while lower values lead to higher recall and lower precision. In this paper, we propose a stacking approach for threshold-based classifiers. It runs several instances of classifiers corresponding to different thresholds and uses their predictions as a feature vector for a supervised learner. We show that this approach is able to break the trade-off between the precision and recall of the algorithm, increasing both at the same time and enhancing the overall performance of the algorithm. We also show that this hybrid approach performs better, and is less dependent on the amount of available training data, than a supervised learning approach that directly uses properties' similarity values. In order to test the generality of the claim, we have run experimental tests using two different threshold-based classifiers on two different data sets. Finally, we show a concrete use case describing the implementation of the proposed approach in the generation of the 3cixty Nice knowledge base.
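The stacking step described in the abstract can be illustrated with a small, self-contained Python sketch. It is a hypothetical toy, not the STEM implementation: the similarity scores, the `THRESHOLDS` list, and the helper names (`threshold_features`, `precision_recall`, `train_perceptron`, `predict`) are assumptions made for the example, and a plain perceptron stands in for the supervised learner, whose actual choice is not stated on this page. Level 0 runs the same threshold-based decision at several thresholds; level 1 trains the learner on the resulting vector of binary decisions.

```python
from typing import List, Sequence, Tuple

# Hypothetical thresholds and toy similarity scores (not taken from the paper).
THRESHOLDS: List[float] = [0.3, 0.5, 0.7, 0.9]
SCORES: List[float] = [0.95, 0.8, 0.75, 0.6, 0.45, 0.35, 0.2, 0.05]
LABELS: List[int] = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = match, 0 = non-match


def threshold_features(score: float, thresholds: Sequence[float]) -> List[int]:
    """Level 0: apply the same threshold-based decision at every threshold.

    Entry i is 1 if the pair would be declared a match at thresholds[i].
    """
    return [1 if score >= t else 0 for t in thresholds]


def precision_recall(preds: Sequence[int], labels: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall of binary match predictions."""
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def train_perceptron(X: List[List[int]], y: List[int], max_epochs: int = 100):
    """Level 1: stand-in supervised learner (a plain perceptron) on stacked features."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, yi in zip(X, y):
            pred = 1 if b + sum(wj * xj for wj, xj in zip(w, xi)) > 0 else 0
            if pred != yi:
                # Move the decision boundary toward the misclassified example.
                step = 1.0 if yi == 1 else -1.0
                w = [wj + step * xj for wj, xj in zip(w, xi)]
                b += step
                errors += 1
        if errors == 0:  # training pairs are separated; stop early
            break
    return w, b


def predict(w: List[float], b: float, x: Sequence[int]) -> int:
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0


if __name__ == "__main__":
    # Level 0: one decision per threshold for every candidate pair.
    X = [threshold_features(s, THRESHOLDS) for s in SCORES]
    for i, t in enumerate(THRESHOLDS):
        p, r = precision_recall([row[i] for row in X], LABELS)
        print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
    # Level 1: the vector of level-0 decisions is the learner's feature vector.
    w, b = train_perceptron(X, LABELS)
    stacked = [predict(w, b, x) for x in X]
    p, r = precision_recall(stacked, LABELS)
    print(f"stacked       precision={p:.2f} recall={r:.2f}")
```

Running it prints precision and recall for each individual threshold, which shows the trade-off on the toy data (threshold 0.3 gives full recall with lower precision, 0.9 gives full precision with low recall), followed by the stacked result. Two caveats: the metrics are computed on the training pairs themselves, whereas a real evaluation would use held-out data; and because a single scalar score is thresholded here, the stacked features are step functions of that score, so the toy only shows the data flow and says nothing about the gains reported in the paper.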

Full PDF Version:

swj1762.pdf

Previous Version:

STEM: Stacked Threshold-based Entity Matching for Knowledge Base Generation

Tags:

Reviewed

Decision/Status:

Solicited Reviews:


Review #1

By Ondřej Zamazal submitted on 20/Dec/2017

Suggestion:
Accept

Review Comment:

I would like to thank the authors for their work on improving the paper. They successfully addressed all my remarks. Additionally, I spotted two minor typos:
* regarding runtime performance in Section 6.3, I would say that it should be T_{STEM} instead of T_{total}.
* in Section 7, the authors added an explanation for "rigid search mechanism". Please check the typo related to "are" in "...resolved any conflict of representation by optimizing the selection criteria are:".

Review #2

Anonymous submitted on 06/Jan/2018

Suggestion:
Minor Revision

Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The paper has significantly improved since the initial submission. It has been reorganised and partly rewritten to take most of the reviewer suggestions into account.

My remaining concerns are:
- The related work section has been enriched. However, it lacks a deep comparison with existing entity matching works, apart from noting that the STEM approach can be used on top of any pairwise numerical threshold-based classifier. Some ensemble learning approaches should be mentioned even if they do not deal with entity matching (https://renespeck.de/data/2014/ISWCpaper.pdf).

- I really liked the problem formulation section, but it lacks a summary paragraph that formulates the problem as an ensemble learning problem over a set of entity matching decisions provided by different threshold-based systems.

- Section 4.2 is clearer now and supports the soundness of the proposed approach. Maybe the authors should give an idea of how \lambda in equation (22) is estimated (this is important to be convinced by equations (26) and (27)).

- Minor remarks:
  - paragraph before Definition 3: “… e1 and e2 is carried out on a set OF literal value …”
  - in Definition 5: add a line break.
  - Section 4.3: “However, as a rule of thumb, ….. that: O(N ∗ g^2) < T_train(N, g)(N, g) < O(N ∗ g^3)” ==> “However, as a rule of thumb, ….. that: O(N ∗ g^2) < T_train(N, g) < O(N ∗ g^3)”
  - Section 4.3: use the LaTeX symbol ‘\leq’ instead of ‘<=’

Review #3

By Mohamed Sherif submitted on 09/Jan/2018

Suggestion:
Accept

Review Comment:

This is the second version of the article “STEM: Stacked Threshold-based Entity Matching for Knowledge Base Generation”.
I thank the authors for addressing all the raised issues. The paper is definitely suitable for publication.

Here are some minor remarks for the authors to address in the camera-ready version of the paper:
• Definition 5: The confidence vector equation exceeds the column limit.
• Definition 6: “… matching systems in stating that …” -> “… matching to state that …”
• In my opinion, it would be better to distinguish the \hat{f} used in Definition 7 from the one used in Definition 3, maybe by adding a subscript or superscript.
