A meaning-based algorithm for ontology matching

Tracking #: 931-2142

Authors: 
Isaac Lera
Carlos Guerrero
Carlos Juiz

Responsible editor: 
Guest Editors Ontology and Linked Data Matching

Submission type: 
Full Paper
Abstract: 
Ontology matching algorithms establish a relationship, called an alignment, between two elements of different ontologies. The efficacy of the process has implications for other semantic processes such as information retrieval, data storage, inference mechanisms, ontology versioning, and so on. Ontologies are domain representations in a context. We claim that this context should be taken into account to provide accurate alignments. We present an ontology matching algorithm that integrates a Word Sense Disambiguation process for OWL classes and provides semantic alignments using an extension of OWL constructors. The disambiguation process builds a network of words and links, using external thesauri such as WordNet and Roget. We use the network as a representation of the context for selecting the meaning. We use multiple techniques based on the application of specific rules and a combination of weighted terms, frequencies and types of links in the network. In the evaluation phase, we evaluate our model on two benchmarks: one concerning the disambiguation of ontology classes and the other using Ontology Alignment Evaluation Initiative datasets. The quality and number of alignments show a slight improvement.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 09/Jan/2015
Suggestion:
Reject
Review Comment:

The paper proposes a two-step ontology matching approach: first, words are disambiguated using WordNet, and then concepts are matched across ontologies. The paper is relevant to the special issue. The novelty of the proposed technique is unclear, as the authors do not properly highlight what is new compared to the previous works they cite. The evaluation results are obtained with standard evaluation collections, when possible. However, the results are modest and the proposed approach does not improve over state-of-the-art approaches for ontology matching. The proposed matching approach is based on a list of heuristics. The experimental setup uses standard collections, but it is not very clear how the parameters have been set. There is no experimental evaluation or discussion of the efficiency of the proposed approach. The main problem of the paper is its writing style: the paper is very hard to read and understand. In my opinion, the writing should be extensively improved for the paper to be acceptable.

In summary, this paper is hard to read and seems to be incomplete work.

Detailed comments:
- Abstract, "a slight improvement" -> over what?
- Intro, "the result is a range with decimals" -> unclear
- Intro, "is different from another set" -> unclear
- Intro, “lets” -> let’s
- The first equation in Related Work should be “\in [0..1]” and not “= [0..1]”.
- Related work, more detailed comparisons of your work with previous works using WordNet [42,5,61] seem necessary to highlight the novelty of your approach. In the current paper it seems that you only have a couple more rules as compared to previous work.
- Related Work. Tables 3 and 4 are interesting but are missing a discussion: You should summarise the main facts in the tables, for example, that many approaches use WordNet, many approaches are evaluated on OAEI, etc.
- Related Work. “are the based”-> “are the base”
- At the end of Section 3.1 it would be useful to add an example of a network.
- Section 3.1.1, “The bigger number of patterns are established the bigger number of senses are selected without error.” -> why?
- Section 3.1.1, it is unclear to me how simple words are split.
- Section 3.1.2 “they have been fixed in practical experiments” -> this is not sufficient. You need to explain which train and test data you used to set these weights.
- Section 3.2 It is unclear how network overlap is computed.
- Section 5, why did you select the Conference dataset? Please motivate your choices. An experimental comparison on more than one dataset would make the results stronger.
- Table 2, “2011.5” -> what does it mean?
- Section 7, “weighted TF-IDF formula” what does this refer to?

Review #2
Anonymous submitted on 23/Feb/2015
Suggestion:
Major Revision
Review Comment:

This article describes an Ontology Matching algorithm that uses WordNet and Roget thesaurus as background knowledge. The preliminary step is a word sense disambiguation process that associates WordNet-based senses to the ontology entities. Next, a network of words and links is created that is used as context representation for meaning selection in the matching step. An extended OWL representation is used for representing the alignments, based on OWL-M.

The approach is interesting, although the net contribution is minor, in my view. The structure and organisation of the paper are correct. However, the English is poor, with frequent typos and grammar mistakes. I strongly recommend its review by a native speaker. The paper contains some interesting ideas, such as the use of OWL-M for representing the annotations and mappings, and its ability to discover relations other than equivalences.
The review of the state of the art is quite extensive. However it does not come with a good analysis to help to motivate the proposed approach. The evaluation section has some flaws (see below). Here are some additional comments, per section:

Section 1.
* "are related by a common similarity" -> "are similar".
* "The characteristics of the languages..." natural languages? or knowledge representation languages?
* "We consider that a greater set of measures..." greater than what?
* "...since it decreases the disambiguation of the alignments..." -> do they refer to the "ambiguity" of the alignments?
* The definition of "semantic measure" as "the meaning of an ontology element under a context" is not very precise, and does not match the notion of "semantic measure" in the literature.
* What is Q in Jaro-Winkler?
* "the most indicators, the best results." -> "the more indicators, the better the results."
* "alignment between initial words" -> "alignment between the initial words"
* The introductory section fails to answer the following questions: what are the main contributions of this work? why is it novel?

Section 2.
* The proposed definition of ontology matching is poor; I recommend reusing one from the literature.
* "two levels of classification about matching processes" -> "two levels of classification of matching processes"
* "Giunchiglia et al. [23] is one of the first efforts in..." -> "The approach of Giunchiglia et al. [23] is one of the first efforts in.."
* "These works are the based for our definition of..." The English is not correct; this has to be rewritten.
* I miss an analysis of the SoA at the end of the section, discussing the relation of the studied approaches to the proposed one, as well as their limitations.

Section 3. In this section the core process is explained, starting with the WSD algorithm and followed by the matching step.
* "find out the best one sense of a class" -> "find out the best sense of a class"
* The definition of "concept" as an ontology class + its label is confusing. Later it is said that the concept is a "set of words" (the sets of words of the label? or something else?).
* What do they mean by "semantic word"?
* They state that "The bigger number of patterns are established the bigger number of senses are selected without error" but this has to be justified (wrong patterns could be identified).
* "If two classes are equivalents" -> "If two classes are equivalent"
* "When humans tries to describe" -> "When humans try to describe"
* In general I miss an overview paragraph at the beginning of the section.

Section 4. The representation of the alignment is discussed in this section, based on owl-m constructors.
* A pointer to the online version of owl-m is missing.

Section 5. Two experiments are described in this section, one to test the WSD part and the second one for the matching algorithm.
* For the first experiment, a dataset was created with manual WordNet-based annotations of several ontologies. However the authors have not provided any pointer to the dataset, so it cannot be further inspected and the experiment cannot be reproduced.
* Some important information is also missing, such as the number and profile of the annotators and the inter-annotator agreement level.
* The definition of "precision" is wrong: "number of correct alignments" -> "number of correct alignments divided by the total number of obtained alignments".
* Citation or footnote to OAEI Seals is missing.
* The description of the matching experiment in Section 5.1 is confusing. Are the numbers of the "conference" track (e.g., 51.1 classes) averages?
* Some examples of "misspelling labels" and "new discovered classes" could be added for the sake of clarity.
* The success ratio of senses is 0.5048%. What does it mean? Why so many decimals? Why is it so low (less than one percent)?
* Some values in Table 2 do not match the values in the main text (e.g. f1 value and recall).
* Last paragraph starting "We get..." is difficult to understand. The potential increase in precision and recall is not well explained.
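
For reference, the standard measures that the Section 5 comments appeal to can be sketched as follows. This is a minimal illustration only: the function name `alignment_scores` and the sample correspondences are assumptions made for the example and are not taken from the paper or from OAEI.

```python
def alignment_scores(found, reference):
    """Return (precision, recall, f1) for two collections of alignments.

    Each alignment is any hashable item, here a (class_a, class_b) pair.
    """
    found, reference = set(found), set(reference)
    correct = len(found & reference)  # alignments that are also in the reference

    # Precision: correct alignments divided by all alignments obtained.
    precision = correct / len(found) if found else 0.0
    # Recall: correct alignments divided by all reference alignments.
    recall = correct / len(reference) if reference else 0.0
    # F1: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical correspondences, invented for illustration.
found = {("Paper", "Article"), ("Author", "Writer"), ("Chair", "Seat")}
reference = {
    ("Paper", "Article"),
    ("Author", "Writer"),
    ("Review", "Evaluation"),
    ("Topic", "Subject"),
}
precision, recall, f1 = alignment_scores(found, reference)
```

All three values are ratios in [0, 1]; a ratio such as 0.5048 corresponds to 50.48%, not 0.5048%, which may explain the confusion about the reported success ratio.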

Review #3
By Catia Pesquita submitted on 22/May/2015
Suggestion:
Major Revision
Review Comment:

The paper describes an approach for ontology matching that is based on a number of pattern-based rules that explore thesaurus relations to infer the meanings of the words composing ontology classes labels. I find the idea interesting, but the paper does not provide a proper motivation, it fails to describe the methodology with sufficient accuracy, and the evaluation does not really showcase the usefulness of the proposed method. Moreover, the writing is at times very confusing, with many typos and grammar mistakes that hinder readability.
The paper needs to be thoroughly reviewed and several sections rewritten for clarity.

Major issues:

1. The paper needs a stronger motivation. In page 2: "We consider that a greater set of measures does not guarantee a successful comparison of equivalence between two elements. But the results are more accurate if we use measures derived from all the conceptual rules since it decreases the disambiguation of the alignments."
This notion should be more deeply explored, perhaps with concrete examples.

2. There is a difficulty in handling the concepts of "class", "concept" and "label". In page 4: "We suppose that a class has only one meaning under a domain. Thus, the problem is to find out the best one sense of a class". This statement is inaccurate, since in a well-designed ontology the meaning of a class should be inferable by looking at the class's properties and relations. An ontology class has only one sense; a "label" may have multiple. Furthermore, referring to a class label as a concept is erroneous. Concepts can be described by labels, but they are not the labels themselves.

3. There are multiple examples throughout the paper that seem unrelated. It would be best to have one or two coherent examples that you refer to when needed.

4. The explanations of the algorithm are very confusing. In Section 3.1.2 it is particularly hard to discern how the selection of the appropriate sense is arrived at. In Section 3.2 it is very hard to understand how exactly mappings are computed and scored. These need to be thoroughly rewritten, possibly with a concrete example throughout the explanation.

5. The evaluation does not adequately support the proposed approach. The authors themselves state that "we could argue that OAEI catalogues are rarely ambiguous." Then why use the conference set for the only evaluation presented? It would be best to complement this benchmark evaluation with some specific examples of mappings captured by your WSD-based approach that cannot be captured with SD.

6. The role of WSD in the mapping strategy is not adequately addressed and discussed.
How is the "success ratio" for automated meaning discovery calculated? If the percentage means the number of classes for which the correct sense was identified, aren't 0.5% and 0.4% really low values? And if this is a typo, and you meant 40% and 50%, they are still not very high. The impact of these success rates on your matching approach should be discussed.

Minor Issues:

1. The authors claim to have reviewed the latest approaches in OM since 2003. This is not accurate, since the most recent systems mentioned are from 2011, four years ago.

3. This example " House_Mansion ≡ House1 : {S1 = ”the house...”, S2 = ”a place...”}." is perhaps not the best, since "a place" would actually be a hypernym of house? Looking up "house" in WordNet does not return "a place" as a possible synset.

4. In page 4, the "8x8" combinations are unclear. The explanation given later should be moved to page 4.

Some typos:

"The network of terms regarding with a class" -> The network of terms regarding a class
"In any way, It seems that both words make reference to barriers" -> It seems...
"Moreover, the algorithm can work with two external resources: WordNet and Roget’s thesaurus. And it works using a set of specific rules." -> Moreover, the algorithm can work with two external resources, WordNet and Roget’s thesaurus, and it works using a set of specific rules.