An Unsupervised Data-driven Method to Discover Equivalent Relations in Large Linked Datasets

Tracking #: 1052-2263

Ziqi Zhang
Anna Lisa Gentile
Isabelle Augenstein
Eva Blomqvist
Fabio Ciravegna

Responsible editor: 
Guest Editors Ontology and Linked Data Matching

Submission type: 
Full Paper
The Web of Data is currently undergoing an unprecedented level of growth thanks to the Linked Open Data effort. One escalated issue is the increasing level of heterogeneity in the published resources. This seriously hampers interoperability of Semantic Web applications. A decade of effort in the research of Ontology Alignment has contributed to a rich literature to solve such problems. However, existing methods can be still limited as 1) they primarily address concepts and entities while relations are less well-studied; 2) many build on the assumption of the ‘well-formedness’ of ontologies which is unnecessarily true in the domain of Linked Open Data; 3) few looked at schema heterogeneity from a single source, which is also a common issue particularly in very large Linked Dataset created automatically from heterogeneous resources, or integrated from multiple datasets. This article aims to address these issues with a domain- and language-independent and completely unsupervised method to align equivalent relations across schemata based on their shared instances. We propose a novel similarity measure able to cope with unbalanced population of schema elements, an unsupervised technique to automatically decide similarity threshold to assert equivalence for a pair of relations, and an unsupervised clustering process to discover groups of equivalent relations across different schemata. Although the method is designed for aligning relations within a single dataset, it can also be adapted for cross-dataset alignment where sameAs links between datasets have been established. Using three gold standards created based on DBpedia, we obtain encouraging results from a thorough evaluation involving four baseline similarity measures and over 15 comparative models based on variants of the proposed method. 
The proposed method makes significant improvement over baseline models in terms of F1 measure (mostly between 7% and 40%), and it always scores the highest precision and is also among the top performers in terms of recall. We also make public the datasets used in this work, which we believe make the largest collection of gold standards for evaluating relation alignment in the LOD context.
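To illustrate the general idea of instance-based relation alignment that the abstract describes (not the paper's own similarity measure), the sketch below computes a baseline Jaccard similarity over the sets of (subject, object) instance pairs of two relations. The relation names and instance pairs are hypothetical examples; the paper's contribution is a different measure that additionally handles unbalanced populations.

```python
def jaccard(a, b):
    """Baseline Jaccard similarity between two sets of (subject, object) pairs.

    Two relations that share many instance pairs are candidates for equivalence.
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical instance pairs for two relations from different schemata.
birth_place = {("Alan_Turing", "London"),
               ("Ada_Lovelace", "London"),
               ("Grace_Hopper", "New_York")}
born_in = {("Alan_Turing", "London"),
           ("Ada_Lovelace", "London")}

print(jaccard(birth_place, born_in))  # 2 shared pairs of 3 distinct -> 0.666...
```

A full pipeline would then compare this score against an automatically chosen threshold and cluster relations whose pairwise scores exceed it; note that plain Jaccard penalises a sparsely populated relation compared against a densely populated one, which is exactly the imbalance the proposed measure is designed to cope with.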
Solicited Reviews:
Review #1
By Jérôme Euzenat, submitted on 19/Apr/2015
Review Comment:

The revision has addressed most of the comments made (see previous review), so I suggest accepting it. The new version clarifies many obscure issues of the previous one.

We hope that the authors find that this improves the paper.

As one regret, nothing is said in the paper about the availability of the developed code in any form.

Some details are reported below:
- The abstract is a bit long: going directly to the contribution would help making it stronger.
- Very often the "that" is omitted. This is acceptable in spoken language, but in my opinion, far less in written language (e.g., p5 "we notice only", p6 "we believe the solution")
- p5 make use the -> use the ("make use" is acceptable in "to make use of")
- metrics (or similar)? Not sure what is meant. In mathematics, a metric is synonymous with a distance, which is precisely defined by three properties. Jaccard is a similarity, i.e., the dual of a distance. Dice is not even called this, but the Dice coefficient. The term "measure" could be a more neutral replacement.
- p6: "and attempt... by Shi and al." is an ugly sentence
- similarity thresholdS (or A similarity threshold)
- there are sometimes extra spaces (e.g., p6 "populations ," p22 "relations) .")
- p11: "So far" -> "So far,"
- p22: "from higher degree" -> "from a higher degree"
- bibliography:
[4] Tadeusz Caliński & Jerzy Harabasz -> 1 minute search
[33] James B. MacQueen -> 5 minutes search
Having all records complete should not be a problem in these web times.
[40] It seems that Shvaiko and Pavel have been inverted.