On the Efficient Execution of Bounded Jaro-Winkler Distances

Tracking #: 1128-2340

Kevin Dressler
Axel-Cyrille Ngonga Ngomo

Responsible editor: 
Guest Editors Ontology and Linked Data Matching

Submission type: 
Full Paper
In recent years, time-efficient approaches for discovering links between knowledge bases have come to be regarded as a key requirement for realizing the idea of a Data Web. Efficient and effective measures for comparing the labels of resources are thus central to discovering links between datasets on the Web of Data, as well as to their integration and fusion. We present a novel time-efficient implementation of filters that allow for the efficient execution of bounded Jaro-Winkler measures. We evaluate our approach on several datasets derived from DBpedia 3.9 and LinkedGeoData, containing up to $10^6$ strings, and show that it scales linearly with the size of the data for large thresholds. We also show that our approach can easily be implemented in parallel. Finally, we evaluate our approach against SILK and show that we outperform it even on small datasets.

Solicited Reviews:
Review #1
Anonymous submitted on 09/Aug/2015
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

The new version of the paper addresses most of the reviewer's comments and recommendations, except for the one concerning "a more qualitative evaluation".

Review #2
Anonymous submitted on 27/Aug/2015
Review Comment:

It appears that the authors have revised the paper. I did not see a document listing their response to each comment, so I tried to detect the revisions manually within the paper. It seems that all my comments and suggestions have been addressed (except the first one [1], though I may simply not have been able to locate it). In addition, the revision has improved the paper. I thus propose to accept this revised version.

[1] The authors state that this submission is an extension of [11], which was presented at an ISWC workshop. For completeness, please explain, even briefly, the additions included in this submission in comparison to the workshop paper.