# EQUATER - An Unsupervised Data-driven Method to Discover Equivalent Relations in Large Linked Datasets

### Tracking #: 872-2082

Authors:
Ziqi Zhang
Anna Lisa Gentile
Eva Blomqvist
Isabelle Augenstein
Fabio Ciravegna

Responsible editor:
Guest Editors Ontology and Linked Data Matching

Submission type:
Full Paper
Abstract:
The Web of Data is currently undergoing an unprecedented level of growth thanks to the Linked Open Data effort. One escalated issue is the increasing level of heterogeneity in the published resources. This seriously hampers interoperability of Semantic Web applications. A decade of effort in the research of Ontology Alignment has contributed to a rich literature dedicated to such problems. However, existing methods can be still limited when applied to the domain of Linked Open Data, where the widely adopted assumption of 'well-formed' ontologies breaks due to the larger degree of incompleteness, noise and inconsistency both found in the schemata and in the data described by them. Such problems become even more noticeable in the problem of aligning relations, which is very important but insufficiently addressed. This article makes contribution to this particular problem by introducing EQUATER, a domain- and language-independent and completely unsupervised method to align equivalent relations across schemata based on their shared instances. Included by EQUATER are a novel similarity measure able to cope with unbalanced population of schema elements, an unsupervised technique to automatically decide similarity cutoff thresholds to assert equivalence, and an unsupervised clustering process to discover groups of equivalent relations across different schemata. The current version of EQUATER is particularly suited for a more specific yet realistic case: addressing alignment within a single large Linked Dataset, the problem that is becoming increasingly prominent as collaborative authoring is adopted by many large-scale knowledge bases. Using three datasets created based on DBpedia (the largest of which is based on a real problem currently concerning the DBpedia community), we show encouraging results from a thorough evaluation involving four baseline similarity measures and over 15 comparative models by replacing EQUATER components with their alternatives: the proposed EQUATER makes significant improvement over baseline models in terms of F1 measure (mostly between 7% and 40%). It always scores the highest precision and is also among the top performers in terms of recall. Together with the released dataset to encourage comparative studies, this work contributes valuable resources to the related area of research.
Full PDF Version:
Tags:
Reviewed

Decision/Status:
Minor revision

Solicited Reviews:
Click to Expand/Collapse
Review #1
By Andreas Thalhammer submitted on 26/Nov/2014
 Suggestion: Major Revision Review Comment: The paper addresses the challenge of aligning ontology/vocabulary predicates in Large Linked Datasets. The main idea of the approach is to leverage the knowledge that comes from shared instances (extensional matching). In order to achieve this goal, the authors introduce three measures of instance-based similarity between properties: triple agreement, subject agreement, knowledge confidence modifier. The thresholds for deciding about similarity are determined through Jenks natural breaks. The similarity measures are used for clustering the predicates (that are referred to as relations in the paper). For this, the authors use group-average agglomerative clustering with the Calinski and Harabasz stopping rule on a previously computed distance matrix. The evaluation separates between pairwise equivalence and clustering while focusing on the three aspects: similarity measure, threshold detection, and knowledge confidence models. The paper is very well written and it is easy for the reader to follow the reasoning of the authors. For a journal paper, the approach and the experiments are explained at an appropriate level of detail. The related work is well covered. I see two main points for improving this paper: 1)) In order to judge the actual contribution of this work, it would be necessary to see EQUATER compared to another extensional matcher. The used baselines are lexical matchers which target ontologies without large instance data. 2)) The choice for group-average agglomerative clustering with the Calinski and Harabasz stopping rule seems a bit arbitrary. In contrast to the other components, no further clustering techniques are tested. It is arguable that this would introduce a new parameter and would overload the setting further. Nonetheless, as this comprises an important part of the approach at least a single alternative should be tested. Two less significant points of criticism are: 3) It is understandable that parts of this work have been published before and Section 2.6 is absolutely necessary. As an improvement, it would be good to have some additional pointers at the beginning of the particular sections in which previously published work is recapitulated. Also, it would be good to have additional explanations about changes that were made (e.g. I realized that the formula notation/style changed when I compared this work to "Statistical Knowledge Patterns: Identifying Synonymous Relations in Large Linked Datasets"). However, I would argue that this work contains enough new content to be published as a new contribution. 4) A more general discussion about schema heterogeneity in Large Linked Datasets (that is the focus of the paper) vs. schema heterogeneity in Linked Data in general would be useful. This distinction is not very clear in the introduction, e.g. "This work focuses specifically on linking heterogeneous relations in the LOD domain". The discussion could go to the direction "why DBpedia is not equivalent with LD" and point out the differences that are relevant for the presented work. Minor: x The abbreviation kc doesn't seem to be introduced before page 19 (although being used frequently at earlier points). x "...current DBpedia ontology version 3.9" - it's not current anymore. x Although the related work is covered well, the publication "Instance-Based Ontological Knowledge Acquisition" by Lihua Zhao and Ryutaro Ichise (ESWC 2013) is not discussed.
Review #2
By Andriy Nikolov submitted on 15/Dec/2014
 Suggestion: Minor Revision Review Comment: The paper describes an algorithm for aligning relations between ontologies used by datasets in the linked data cloud. The algorithm combines several techniques: in particular, extensional similarity measure between relations based on the overlapping sets of arguments, clustering mechanism for selecting equivalent relation pairs among the candidate ones, a technique for inferring equivalence sets exploiting the transitivity of alignments. The authors report experimental testing for determining the optimal parameters of the algorithm. In my view, this paper makes two main contributions: - Since the majority of the extensional techniques for ontology alignment in the linked data cloud primarily focused on the alignments between concepts, the algorithm provides a valuable contribution by focusing on the less studied task of relations matching. - The use of unsupervised techniques at all stages is very important in the context of linked data cloud where one cannot expect sufficient amounts of training data. However, there are some aspects, which, in my view, could be clarified better. The paper focuses on equivalence relations between properties. However, it is not clear to which extent the semantic correspondences between relations in real-world schemata conform to the definition of equivalence. E.g., the ontological concepts are often used differently in different datasets and their instance sets only partially overlap despite being semantically very close [39]. To which extent the same factors apply to relations? At least judging from the common errors produced by EQUATER (section 5.2), semantic equivalence is often subjective and can often be confused with high-degree semantic similarity (Table 8). Another aspect involves the authors’ decision to use dbpedia as an evaluation dataset: a set of schemata sharing the same set of instances. This choice is justified: taking datasets originating from different sources (even those already having owl:sameAs links between instances) would usually provide only a few examples of overlapping concepts and even fewer pairs of matching relations. However, the choice to use a single dataset only could bias the results in comparison with the “canonical” ontology matching use case involving datasets originating from different sources. In that case the noise caused by imprecise instance and concept matching as well as different interpretations of relations in different datasets would become additional factors possibly influencing the performance of different techniques. I would not ask the authors to perform another set of experiments with datasets from heterogeneous sources (that would probably be a subject of another paper), but a bit more discussion on these aspects and their likely influence would be valuable.
Review #3
By Matteo Palmonari submitted on 15/Dec/2014
 Suggestion: Minor Revision Review Comment: The paper describes a method to find equivalent properties in large linked datasets, which is evaluated using three datasets extracted from DBpedia. Key aspects of the approach are: an extensional similarity measure that is aware of the quality of the input; an unsupervised method to define the cutoff threshold to select the aligned properties; a method to cluster properties so as to improve the quality of the final alignment. The paper provides several important contributions to the state of the art, and is, overall, in a good shape in its current status. The scope of the paper, determined by title and abstract, is precisely stated. The problem addressed in the paper has not been investigated as extensively as other ontology matching problems. The paper is well organized, well written and easy to read (despite few remarks highlighted below). The formalization is sound and concise and the proposed method is described at an appropriate level of detail. The method has the advantage of being fully unsupervised. The evaluation is presented with abundance of details, with the performance of each main component of the proposed approach being adequately discussed. The experimental section is very long, which makes it a bit hard to get through at times. However, I found it interesting beyond the scope of the approach, and in particular to shed light on property usage in Linked Open Data. Overall, despite the approach cannot be used as-is to align properties of distinct datasets, I think that some of the results discussed in the paper will be relevant for future work addressing this issue. For example, the unsupervised method adopted for selecting the cut-off threshold is particularly interesting. The section with the qualitative analysis of false positives and negatives, and the appendix describing methods that were experiments are also useful. A weakness can be found in using only datasets extracted from DBpedia. However, considering the heterogeneity and size of this source, and the interesting discussions of the results, I think that the paper can be accepted for publishing after minor revision. The minor revision should address one main issue and few minor issues listed below (grouped by page ) Although the efficiency of the approach is not a main question for this research (the approach could be run offline), the features of the equipment adopted for the experiments should be described. In addition, at least the order of magnitude of the execution times should be reported. It is important to know if these methods require 1 sec vs 1 min vs 1 hour vs 1 day to align a dataset like DBpedia. Page 2 “This work explores this issue and particularly studies linking rela- tions across different schemata in the LOD domain, a problem that is currently under-represented in the literature.” I think it should be mentioned from here that the method focuses on different schemata used in one large dataset. Page 3 “A corre- spondence asserts certain” A correspondence asserts that certain Page 4 “However, equivalence between relations are not addressed.” However, equivalence between relations is not addressed. From “Parundekar et al. [40] … “ on: it is not completely clear if these approaches do consider property matching or not. State it more sharply Page 5 “Same relation names are frequently found to bear different meanings in different contexts [18], e.g., in the DBpedia dataset ‘before’ is used to describe rela- tionship between consecutive space missions, or con- secutive Academy Award winners [14].” This is quite unclear to me. A relation can have an intensional meaning that goes beyond its extension. Why should these two meanings be considered different? They both seem to represent temporal relations between events. This is an important issue to discuss more clearly. Page 6 The comparison with work in databases is still quite shallow. Why should these methods be affected by sparsity? Is any of these method proposing instance-based matching? A better explanation is needed here. Cutoff thresholds in matchers. Another approach worth mentioning here is the active learning method of Shi et al. 2009. The method is supervised but it is used also to learn a cutoff threshold. Feng Shi, Juanzi Li, Jie Tang, Guo Tong Xie, Hanyu Li: Actively Learning Ontology Matching via User Interaction. International Semantic Web Conference 2009: 585-600 “but largely extends it in several dimensions: (1) the matcher“ Item (1) is a sentence, while other items (2,3,4,5) are descriptions of contributions. Please, align the sentence structures. Page 7 Remove spaces in Table 1 Page 8 Equation 2: A bracket is missing on the right Page 9 Although ta and sa computes Although ta and sa compute Fig. 4: Would it be possible that different relations have different learning curves? When “n” is introduced, it might be helpful to discuss make a reference to how this parameter will be set. Page 10 “This is also known as the mapping selection problem.” Not the entire problem, though: the cardinality of the alignment is also considered in the mapping selection problem. However, I agree that cutoff threshold selection is one of the main components of the mapping selection process. Page 11 “The second goal of EQUATER is to address” It sounds a bit strange to read a reference to the second goal od EQUATER here. The objectives of EQUATER are well understood already. Equation 12: I think that it would be better to use 1 instead of maxE. In fact, most of similarity measures (and your measure) use values in the interval [0,1]. Page 12 “Using the annotations, we can evaluate accuracy at each rank and at certain rank the accuracy should be maximized.” This paragraph is not very clear Page 14 Better divide the descriptions of the three datasets (e.g., use subsubsections or bold emphasis). Page 15 The dataset is then randomly split into a develop- ment set (dev) containing 10 concepts for developing our measure and a test set (test) containing 30 concepts for evaluation. It is not clear why you split the gold standard if your approach is unsupervised. I guess because you need it to test s. However, this should be remarked because it is confusing to read about the spit here. Page 16 Table 5. I think it would be better to list the datasets in the same order used in which they have been introduced. Page 18 “However, as discussed before” This paragraph is quite uninteresting. I think that the main advantage of EQUATER is that it is unsupervised. Page 19 “favour recall but precision.” favour recall over precision. ? “Figure 13a compares variations of EQUATER by alternating kc functions under each thd method” —> Remind the acronyms because they were explained way above in the paper . “Firstly, under the same chosen threshold detection method, settings with kc out- perform the best baseline in most cases.” I am lost here. I followed the discussion until here but at this point it becomes difficult to recall every detail. Therefore it is difficult to follow the conclusions. Try to rephrase it. Page 20 “By analyzing the precision and recall trade-off for different kc functions, it shows that without kc, the similarity measure of EQUATER tends to favour high- recall but perhaps lose too much precision.” This is the most interesting result in this section, but it comes at the end of the discussion.

## Comments

### Additional review

The paper presents EQUATER a system which, so far, is able to match relations across different ontologies used for "annotating" the same data resources. After a review of the state of the art in ontology matching, the paper presents the measures used by EQUATER for assessing the similarity of relations, assessing confidence in the judgement and automatically determining a threshold for confidence. It also introduces briefly the use of "clustering" for improving alignments. It then goes on evaluating EQUATER by comparing the proposed solutions to various more simple versions of itself.

My opinion is that this paper describes valuable work that should be published. The three methods presented are valuable contributions in their own right and the limited evaluation show that they improve on simple methods.

However, the form of the paper is unsatisfying. Besides some elements of form which are listed below, the main problem is that the paper is presented like if its contribution was a general-purpose matching system (to be compared with other such systems), while what I see as its contribution is more restricted.

This problem can also be seen in the way "limitations of this work" are described in the paper: "Therefore, we consider the second major limitation of EQUATER as it being a partial ontology alignment method addressing a specific but practical issue - aligning relations only." This is strange because the abstract describes EQUATER as a system "to align equivalent relations across schemata based on their shared instances". So how can it be a limitation of a relation matcher to align only relations? Of course, these are limitations if the claim is that EQUATER is a general purpose extensional ontology matcher. But how could it be? What is presented in the paper is a method for matching relations within the same linked data set, a far much narrow goal, but a worthy one.

It seems that the authors indeed aimed at creating a matcher (EQUATER) but decided to publish parts of it before finishing it. This is fine as long as there is a real contribution, and there is. But this paper is still written as if it were describing a general purpose matcher. For instance, if the authors want to further extend EQUATER as they indicate in the text, then I advise not to put the name of the system in the title of the paper. Otherwise, later, if the system is successful, people will be very confused when talking about this EQUATER system. Moreover, there is no link from the paper where to find (for free or for a fee) this system, so what is the point?

This makes the paper irritating to read (see detailed comments). The first part of related work is dedicated to general purpose ontology matchers and OAEI. But if this is really related to this work, why the evaluation does not compare with such systems?

The paper could rather adopt the following line of argument:
- Ontology matching is a very important topic --and in particular in LOD--
- Most matchers are not extensional --focus on those which are, and which use LOD--
- Most matchers, as testified by OAEI, are focussing on matching concepts and neglect relations --focus on those which pay attention to relations--
- In this paper we present techniques for matching relations extensionally based on relations used in LOD.
- More precisely, we develop techniques that are used to match relations across multiple ontologies annotating the same datasets. Such techniques may be used, either for helping matching ontologies that are jointly used for annotating the same data set (and can later be used for other datasets), or for contributing matching ontologies when sameAs links between different data sets have been established.

Concretely, this would involve reducing the parts dedicated to reviewing and criticizing general purpose matchers and focussing on the techniques contributed by this work.

Describing the presented techniques as original contributions to a specific problem is perfectly acceptable and corresponds more to what is in the paper... and it would be even more valuable if techniques were available as a library that other matchers could use for improving their performances.

Details:
- p2. "who often fail to conform to a universal schema": reference needed.
- p3. [13, 41] may not be the best references on OAEI. There has been a paper in Journal on data semantics. Moreover, taking OAEI as a reference on what exists introduces a bias since only systems suited to the tests participate. For instance, EQUATER has never participated, but this does not mean that it does not exist. There has been extensional systems using LOD such as Blooms (that is mentioned after).
- p4. "It is well-known that": reference establishing this?
- p4. Not sure that the example given of foaf:Person illustrate any pathology.
- p4. "Similarity computation is typically quadratic; the more matchers...": it remains quadratic if one adds matchers. And it is not necessarily quadratic because some systems prune the search space.
- p4. PARIS does not necessarily converge in fact (no proof has been published and the authors know counter examples). Actually Paris is certainly a system that does use very similar measure as EQUATOR, even if it does only generate links between entities, for doing this he does identify relations that have to correspond, as do most key-based linked data matchers as well (see [1,2] below).
- p6. "the first that studies the problem of automatically deciding threshold". I am not a specialist, but there is a database system called eTuner and nowadays most ontology matchers determine their thresholds from the data, especially because OAEI requires that the same configuration is used in all tests.
- p6. "largely" used twice, i.e., too many times for such a vague word.
- p7. I think that the "formalization of the problem" could have come before, even before the related work.
- p7. hypothesize _that_ there exists
- p8. I would have found the description of ta as arg_{\cap}/min(arg(r1),arg(r2)) an easier to explain formulation.
- p8. Note that sa is the denominator of \beta on the divisor of \alpha: the multiplication eliminates the sub_{\cup} from the equation. The presentation with alpha beta is OK because it provides the intent, but the simplification of the formula should be given. In addition, the sentence "As a result, relations that have high sa will share many subject..." is likely incorrect: the shared objects (sub_{\cup}) have disappeared. This effect is in fact classical, when trying to mitigate coverage and functionality.
- p9. 3.2.3: the "cognitive point of view" makes a sudden apparition (single occurence of the adjective cognitive in the whole paper).
- p9. I am surprised by the use of the adjective "exponential" since the researched growth is not an exponential growth, but an assymptotic one if I understand. It is unclear to me that this adjective still can be used for the given function.
- p10. I assume that, in the definition of ta^{kr}, the second line refers to kr(|arg(r_2)|) instead of kr(|arg(r_1)|) otherwise, this would break the symmetry which was in the initial ta function and this would have to be explained.
- p10. I am surprised to see mentioned "spelling errors": the probability that spelling mistakes make relations similar should be very low unless using very few data.
- pp10-11. The techniques used there seems interesting but are not sufficiently detailed in my opinion. I have no idea what is in the references [9, 26, 35] although they are supposed to be "well-known".
- p11. In particular it is not clear what is exactly achieved with clustering. It is unclear what is done when "links may appear too weak". This seems to be an interesting technique but insufficiently described.
- p9, 12. a-priori does not need a dash.
- pp13-14. 4.5 starts again with OAEI and spends a whole column justifying why DBPedia is used. The text "Although DBPedia... of the problem" should rather be in introduction of the paper than here. It is time to present an experiment.
- pp13-14. The use of "the problem" is unclear. It is not clear if this denotes the same problem each time and it would be better to qualify the problem ("the problem of ...") to be sure of that.
- p14. It is strange to separate a dev set and a test set when the algorithm does not do learning. But OK, the intent is clear. The whole 4.5-4.6 is difficult to read, but I have no improvement to offer.
- p14. "cP concatenating" unsure what it means. Since the elements of P are not concatenated pairs but simple pairs, it is dubious that "$cP\subset P$".
- pp15-16. 5.1. It is difficult to understand the reason for this section. In a result section, the space should not be devoted to criticize the gold standard, but to discuss results. The discussion is interesting but should have occurred before, either to show that the task is difficult or to discard this gold standard, it is not time anymore to do this. The analysis of these mistakes is interesting but should better be part of data preparation instead of results (unless, I misunderstood something, up to you to avoid such misunderstanding).
- p15. "The data set is overwhelmed by negative examples": if these are mistakes, it would be better calling them this way (negative example could well be counter-examples and/or true negatives which are not mistakes).
- p16. Table 5. Reporting F-measure and recall in the same table is misleading. Reporting recall when it is available, would help comparing. There is no point at comparing F-measure and recall.
- p16. "confirm the hypothetical analogy": how can an analogy be confirmed?
- p21. "aligning heterogeneous resources": please be precise. "Currently, EQUATER fits best with aligning relations from different schemata used in a single Linked Dataset": actually, if this paper is about EQUATER, then this is exactly what it does. If EQUATER is a moving target, then better not write papers about it.
- pp22-24: some references have firstname, some other don't
- p22. [13]: This is "matchinG" and there is a new edition in 2013
- p22. [9] has no publisher, difficult to find out.
[20] likely need {F} around the F.

Refs.

- Danai Symeonidou, Vincent Armant, Nathalie Pernelle, Fatiha SaÃ¯s: SAKey: Scalable Almost Key Discovery in RDF Data. Semantic Web Conference (1) 2014: 33-49
- Manuel Atencia, JÃ©rÃ´me David, JÃ©rÃ´me Euzenat: Data interlinking through robust linkkey extraction. ECAI 2014: 15-20