Measuring similarity in ontologies: a new family of measures

Tracking #: 720-1930

Authors: 
Tahani Alsubait
Bijan Parsia
Uli Sattler

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Conference Style
Abstract: 
Without a doubt, similarity measurement is important for numerous applications (e.g., information retrieval, clustering, ontology matching). Several attempts have been already made to develop similarity measures for ontologies. We noticed that some existing similarity measures are ad-hoc and unprincipled. In addition, there is still a need for similarity measures which are applicable to expressive Description Logics (DLs) (i.e., beyond EL) and which are terminological (i.e., do not require an ABox). To address these requirements, we have developed a new family of similarity measures. To date, there has been no thorough empirical investigation of similarity measures. This has motivated us to carry out two separate empirical studies. First, we compare the new measures along with some existing measures against a gold-standard. Second, we examine the practicality of using the new measures over an independently motivated corpus of ontologies (BioPortal library). In addition, we examine whether cheap measures can be an approximation of some more computationally expensive measures.
Tags: 
Reviewed

Decision/Status: 
[EKAW] conference only accept

Solicited Reviews:
Review #1
Anonymous submitted on 22/Aug/2014
Suggestion:
[EKAW] combined track accept
Review Comment:

Overall evaluation: 2
Reviewer's confidence: 4
Interest to the Knowledge Engineering and Knowledge Management Community: 5
Novelty: 4
Technical quality: 4
Evaluation: 4
Clarity and presentation: 4

Review

The authors describe their work on the specification and application of a set of semantic similarity measures. The work is well placed in the context of previous work and adequately described.

There are a few minor issues that I would like to see addressed.

Only 19 of the 30 concept pairs are used, but the authors do not state which ones, nor which concept pairs (or concepts) were excluded.

The definition at the bottom of page 5 shows that the measures depend on the language of the signature. This seems odd, as the signature only contains concept names, not their definitions. Either I do not understand, in which case some more explanation would be helpful, or this should be corrected.

Explain the names SubSim and GrSim.
Sub apparently refers to a subset of interest; Gr seems to refer to grammar (in experiment 2 it is referred to as grammar-based pairwise similarity; this should be mentioned when GrSim is introduced).

"intensional vs. extensional based measures"
I suggest either removing "based", or changing to intension vs. extension based measures

Page 6:
The sentence starting with "we cannot conclude" should be a follow-up of the following sentence, starting with a comma.

Page 7:
typo in "subsoncept"

"SNOMED CT" should be written without hyphen

I wonder about the stability of the similarity measures. The authors chose to construct a random subset of concepts; I can imagine other subsets giving different results. This should be addressed, ideally by generating multiple random sets and running the experiments over all of them, or alternatively by discussing any impact the random selection may have.

Remove the mention of the average time, which is a useless measure for such skewed data, as shown by the difference between the average and the median and by an SD that is four times larger than the average.
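The skew this comment describes can be seen with a few hypothetical timing figures (illustrative numbers only, not the paper's data): a single slow outlier drags the mean far above the median and inflates the standard deviation.

```python
import statistics

# Hypothetical per-pair timing data (seconds): most runs are fast,
# one is extremely slow -- the kind of skew the review describes.
timings = [0.2, 0.3, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 2.0, 95.0]

mean = statistics.mean(timings)      # dragged up by the single outlier
median = statistics.median(timings)  # robust to the outlier
sd = statistics.stdev(timings)

# The SD exceeds the mean, so the mean is uninformative here.
print(f"mean={mean:.2f}s  median={median:.2f}s  sd={sd:.2f}s")
```

With such data the median (and quartiles) is the honest summary; the mean alone says almost nothing about a typical run.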

rephrase "we want to find out how frequently can a cheap measure be ..." (.e.g., ... how frequently a cheap measure can be ...)

Review #2
Anonymous submitted on 28/Aug/2014
Suggestion:
[EKAW] reject
Review Comment:

Overall evaluation: -1 (weak reject)
Reviewer's confidence: 5 (expert)
Interest to the Knowledge Engineering and Knowledge Management Community: 5 (excellent)
Novelty: 3 (fair)
Technical quality: 4 (good)
Evaluation: 3 (fair)
Clarity and presentation: 4 (good)

Review

Summary

The paper presents three similarity measures for ontologies utilizing TBox information only. After introducing the basics and discussing related work, baseline measures are introduced. Then the new similarity measures are presented and evaluated in two empirical studies. The paper closes by discussing limitations and giving an outlook.

Overall, the paper is well written and easy to read. The contribution is clearly stated and appears to be technically sound. However, some of the claims are too strong and unfocused and need to be corrected. The proposed measures are well integrated into a larger framework, and the discussion of their relationship is well done. Another big plus is the evaluation/discussion of the relationship between the measures in experiment 2. Weak points are the related work, which addresses only a very small part of the existing work and excludes very important work from the evaluation, and the fact that the evaluation of the method uses only 19 concept pairs, which is not much.

While I think the work is quite nice and addresses a very interesting and important topic, I cannot accept it due to the weak points mentioned above.

more comments:

The first thing I would like to stress is the way claims are made
in the introduction. Let me explain this by quoting the following
sentence from the first page: "To date, there has been no thorough
empirical investigation of similarity measures." Such a sentence
implies that all other researchers working on similarity measures
have never carried out a serious and complete empirical
investigation of similarity measures. Given your work, I would say
that your paper has also not reached this goal. In addition, there
is existing work that compares a range of similarity measures.
Work by colleagues such as Budanitsky, A. & Hirst, G. (2006),
'Evaluating WordNet-based Measures of Lexical Semantic
Relatedness', Computational Linguistics 32(1), 13-47, goes a step
further and includes human grounding. This work is completely
ignored, even though it provides a complete comparison of
similarity measures for WordNet relying on similar information as
your work (I agree that you go beyond it, but not by much). For
the future, I suggest making a weaker claim that better reflects
the presented and existing work and is more serious. Please revise
the first two paragraphs of the introduction (there are more very
general and unfortunate phrases like the one mentioned) to address
this issue.

Another body of work that is not even touched on in your paper is
the work using the WS353 dataset:

http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/

There is a great deal of work on similarity measures, and most of
it may not be relevant as it utilizes the ABox. But as your work
claims to be so "thorough", I expect all major research areas for
similarity measures to be mentioned.

Fig. 1: all the proposed measures follow a pattern similar to the
Wu and Palmer measure. Any idea why? How about leaving out some of
the concepts to test the influence of the structure on the
measure? This was one criticism of ABox-based measures, and I
think it holds here as well; such a test would answer the
questions: what happens if the ontology is not complete, and how
sensitive is your measure to the current ontology? Maybe it would
give an insight into a common pattern of all the measures.
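For readers unfamiliar with it, the Wu and Palmer score is 2·depth(lcs)/(depth(c1)+depth(c2)). The toy sketch below (my own illustration with a made-up hierarchy, not the paper's code) also demonstrates the structure sensitivity this comment asks about: deleting an intermediate concept changes the score.

```python
# Illustrative Wu-Palmer similarity over a toy hierarchy:
# sim = 2 * depth(lcs) / (depth(c1) + depth(c2)).
parent = {"lion": "carnivore", "bear": "omnivore",
          "carnivore": "animal", "omnivore": "animal", "animal": None}

def path_to_root(c):
    out = []
    while c is not None:
        out.append(c)
        c = parent[c]
    return out  # e.g. ["lion", "carnivore", "animal"]

def depth(c):
    return len(path_to_root(c))  # the root has depth 1

def lcs(c1, c2):
    # Least common subsumer: first ancestor of c2 shared with c1.
    anc1 = set(path_to_root(c1))
    return next(a for a in path_to_root(c2) if a in anc1)

def wu_palmer(c1, c2):
    return 2 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))

print(wu_palmer("lion", "bear"))  # lcs is "animal": 2*1/(3+3) = 1/3

# Simulate an incomplete ontology: drop the intermediate "carnivore".
parent["lion"] = "animal"
print(wu_palmer("lion", "bear"))  # score changes from 1/3 to 0.4
```

The second print shows the point: removing one intermediate concept shifts the score, so any structure-based measure inherits the (in)completeness of the modelled hierarchy.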

Further, experiment 1 is the only real evaluation, as experiment 2
includes no baselines, nor does it compare against human
judgments. This brings me to my most critical question: who
decides which concepts are similar, and to what extent? My guess
would be that a human makes this decision, but then 19 concept
pairs are not very representative. You do mention the limits of
the results at the end of your work, but how does this limit
relate to the very complete empirical investigation promised in
the introduction? And experiment 2 is only an internal comparison
of the new measures without any human grounding, so this part is
more a discussion than an evaluation.

Let us go back to Example 1. It is stated that "Sim(Carnivore,
Omnivore) > Sim(Carnivore, Herbivore)" for technical reasons. Is
this also true for humans? In every case? I suggest extending the
example and discussing the issue of similarity and the goals you
would like to reach with your work with respect to what similarity
means.

At the end of Section 5, two IC measures are introduced that are
later not compared against. While Rada's measure may sometimes
have outperformed these measures, it would be nice, even for a
journal publication, to include at least one measure of this
family. In addition, the measure of Jiang & Conrath in combination
with WordNet turns out to be the best similarity measure in the
above-mentioned study of Budanitsky and Hirst; therefore, I expect
results for this measure as well.
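For reference, the Jiang & Conrath measure mentioned here is an information-content distance, dist(c1, c2) = IC(c1) + IC(c2) - 2·IC(lcs(c1, c2)), with IC(c) = -log p(c). A minimal sketch on a toy taxonomy with made-up frequencies (illustrative only, not the paper's data or the original implementation):

```python
import math

# Toy taxonomy with hypothetical corpus frequencies (illustrative only).
parent = {"cat": "carnivore", "dog": "carnivore",
          "carnivore": "animal", "herbivore": "animal", "animal": None}
freq = {"cat": 10, "dog": 12, "carnivore": 25, "herbivore": 20, "animal": 50}
total = freq["animal"]  # the root subsumes everything

def ancestors(c):
    out = []
    while c is not None:
        out.append(c)
        c = parent[c]
    return out

def ic(c):
    # Information content: -log p(c), with p estimated from frequencies.
    return -math.log(freq[c] / total)

def lcs(c1, c2):
    # Least common subsumer: first shared ancestor walking upwards.
    anc1 = set(ancestors(c1))
    return next(a for a in ancestors(c2) if a in anc1)

def jiang_conrath_distance(c1, c2):
    return ic(c1) + ic(c2) - 2 * ic(lcs(c1, c2))

print(jiang_conrath_distance("cat", "dog"))
```

The distance is 0 for identical concepts and grows as the least common subsumer becomes less informative, which is why the review expects it as a strong baseline.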

Parameters are always tricky: why is a delta of 0.1 in Section 8 a
good choice?

Review #3
Anonymous submitted on 03/Sep/2014
Suggestion:
[EKAW] conference only accept
Review Comment:

Overall evaluation: 1 (weak accept)
Reviewer's confidence: 4 (high)
Interest to the Knowledge Engineering and Knowledge Management Community: 4 (good)
Novelty: 4 (good)
Technical quality: 4 (good)
Evaluation: 4 (good)
Clarity and presentation: 3 (fair)

Review:

The paper proposes an interesting approach to measuring similarity in ontologies. The main idea is to count the common and the distinguishing subsumers of two concepts in an ontology. The proposed approach has been evaluated in several experiments.
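To make the idea concrete, here is a minimal sketch of a subsumer-counting measure (my reading of the general idea, not the authors' exact definitions): similarity as the ratio of common subsumers to all subsumers of either concept, i.e. a Jaccard index over subsumer sets.

```python
# Hand-listed subsumer sets for a toy ontology (hypothetical example);
# in practice these would be computed by a DL reasoner.
subsumers = {
    "Cat": {"Cat", "Carnivore", "Animal", "Thing"},
    "Dog": {"Dog", "Carnivore", "Animal", "Thing"},
    "Cow": {"Cow", "Herbivore", "Animal", "Thing"},
}

def subsumer_sim(c1, c2):
    # Common subsumers divided by all subsumers of either concept.
    s1, s2 = subsumers[c1], subsumers[c2]
    return len(s1 & s2) / len(s1 | s2)

print(subsumer_sim("Cat", "Dog"))  # 3 common / 5 total = 0.6
print(subsumer_sim("Cat", "Cow"))  # 2 common / 6 total = 1/3
```

Concepts sharing many subsumers (Cat/Dog) score higher than those sharing only generic ones (Cat/Cow), which is the intuition the paper builds on.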

The presentation of the paper needs to be improved. The paper suggests using Sub(O), the set of concept expressions in the ontology O, to measure similarity. However, the set of concept expressions is usually infinite. Although the paper states that only a finite set of concept expressions should be selected, it is still unclear to me how this finite set is obtained from an infinite one.

The proposed similarity measure seems to be language-dependent. Just consider an ontology language which expresses the concept hierarchy only(i.e., without the universal quantifier, the existant quantifier, and others). Two sibling concepts at the same concept hierarchy have the same subsumers. It seeems to be counter-intuitive that all sibling concepts have the same similarity degree.