Tversky’s feature-based similarity and beyond

Tracking #: 1740-2952

Authors: 
Silvia Likavec
Ilaria Lombardi
Federica Cena

Responsible editor: 
Lora Aroyo

Submission type: 
Full Paper
Abstract: 
Similarity is one of the most straightforward ways to relate two objects and guide the human perception of the world. It has an important role in many areas, such as Information Retrieval, Natural Language Processing (NLP), Semantic Web and Recommender Systems. To help applications in these areas achieve satisfying results in finding similar concepts, it is important to simulate human perception of similarity and assess which similarity measure is the most adequate. In this work we wanted to gain some insights into Tversky's, and more specifically Jaccard's, feature-based semantic similarity measure on instances in a specific ontology. We experimented with various variations of this measure, trying to improve its performance. We propose Sigmoid similarity as an improvement of Jaccard's similarity measure. We also explored the performance of some hierarchy-based approaches and showed that feature-based approaches outperform them on the two specific ontologies we tested. We also tried to incorporate hierarchy-based information into our measures and, even though it does bring some slight improvement, it seems that it is not worth complicating the measures with this information, since the measures based only on features show very comparable performance. We performed two separate evaluations with real evaluators. The first evaluation includes 137 subjects and 25 pairs of concepts in the recipes domain, and the second one includes 147 subjects and 30 pairs of concepts in the drinks domain. To our knowledge, these are some of the most extensive evaluations performed in the field.
Tags: 
Reviewed

Decision/Status: 
Reject (Two Strikes)

Solicited Reviews:
Review #1
By Jérôme Euzenat submitted on 07/Nov/2017
Suggestion:
Major Revision
Review Comment:

I am still unconvinced by the paper as revised, still for reasons of form, though it is also unclear how the feature part works. The point is that the authors applied minimal adaptations to the paper instead of thinking about the reasons behind these comments. Hence, the comments made about Tversky now apply to Jaccard.

I am aware that this is a negative statement.
I am fine with the editor overruling in favour of publishing the paper.
I hope that the remarks here can help the authors improve their paper.

* Tversky

There is a kind of style that was present in the first version and still remains.
This style consists of discussing something under an "umbrella name" that is not relevant.
In the first version of the paper, this was the case with the reference to Tversky.
The reference has been toned down, but it is still there, right in the abstract.
It has also partly been replaced by Jaccard.
Everywhere in the paper it is stated that the paper is about "an improvement of Jaccard".
In fact, Sim_{JS} may perhaps be called so, because it uses Jaccard --not really an improvement, simply a reuse--, but Sim_{S} has nothing to do with Jaccard.
The Jaccard similarity is characterised by a simple structure: it divides the cardinality of the intersection of two sets by the cardinality of their union.
That is simple and elegant.
An extension to bags, or to weighted or fuzzy sets, may still be called an extension of the Jaccard similarity.
There are many such extensions around.
The Tversky similarity was never presented, by Tversky himself, as an extension of the Jaccard similarity.
Actually, returning to Tversky's paper, he seems entirely unaware of proper set-based similarities, although this is what he introduces; he was only considering affine space distances.
The reason is that the innovation introduced by Tversky, inspired by Rosch, is a weighting that breaks the symmetry, and Tversky's argument is that this is how a similarity should be: not like Jaccard!
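The contrast at issue can be sketched in a few lines (illustrative Python; the feature sets and the weights are hypothetical examples, not taken from the paper under review):

```python
def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B| -- symmetric by construction."""
    return len(a & b) / len(a | b)

def tversky(a, b, alpha=0.8, beta=0.2):
    """Tversky's contrast model: choosing alpha != beta breaks the symmetry."""
    common = len(a & b)
    return common / (common + alpha * len(a - b) + beta * len(b - a))

# Hypothetical feature sets for two objects:
pen = {"writes", "portable", "ink"}
pencil = {"writes", "portable", "graphite", "eraser"}

jaccard(pen, pencil)   # == jaccard(pencil, pen): order never matters
tversky(pen, pencil)   # != tversky(pencil, pen) whenever alpha != beta
```

With equal weights (alpha = beta) Tversky's formula collapses back to a symmetric measure, which is precisely why the asymmetric weighting, not set-intersection itself, is Tversky's contribution.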

So, for instance, on page 11, the statement "we confirmed our hypothesis H2 that it is possible to improve the original Jaccard's formulation of similarity measure" sounds really strange, since the improved measure is simply another measure.
The hypothesis should rather be "Jaccard is (not) the best measure".

In the end, the new first paragraphs of the conclusion are all affected by this problem.

* OWL and features.

There is also a discrepancy between the point made in Section 2.3 that the paper is restricted to object properties and the link with Tversky's measures, which were based on boolean features. It seems that the whole paper is concerned with such features and has nothing to do with OWL, though the experiments may have.
In particular, Section 3 comes back to properties and to turning them into features:
"We will include in common features the cases when (sic) the two objects have the same value for the property p. We will include in distinctive features of each object the cases when the two objects have different values for the given property p."
The problem is that in OWL and more generally in description logics or in RDF, a property p may have several values.
Hence, objects o and o' may respectively have v and v' and v and v'' as values for p.
We end up with the possibility that a property p is both a common feature and a distinctive feature according to this definition.
The whole subsection is not clear: CF_p and DF^1_p are described by "how they contribute to common features, distinctive features", but it is not really explained how: what is the range of these functions? {0, 1}?
They are also ill-defined, because DF^1_p depends on O_2, but this is not made explicit.

One natural way to reconcile this would have been to consider the pair property-value as a feature.
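A minimal sketch of this reconciliation (hypothetical objects and values, Python used purely for illustration):

```python
# Two hypothetical objects; a multi-valued property p maps to a SET of values.
# They share value "v" for p, but also differ ("v1" vs "v2").
o1 = {"p": {"v", "v1"}}
o2 = {"p": {"v", "v2"}}

# Treating the property as one feature is ambiguous here: p would count
# as "common" (shared value v) and as "distinctive" (v1 vs v2) at once.

# Treating each (property, value) pair as a feature resolves the ambiguity:
def features(obj):
    return {(p, v) for p, values in obj.items() for v in values}

common = features(o1) & features(o2)      # the shared pair ("p", "v")
distinct_1 = features(o1) - features(o2)  # pairs only o1 has
distinct_2 = features(o2) - features(o1)  # pairs only o2 has
```

Each pair then lands in exactly one of the three sets, so the common/distinctive partition is well-defined even for multi-valued properties.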

The justification of why datatype properties are not considered is also quite unclear.
It is clearer in the conclusion, but the statement is still strange: actually, datatypes come with a built-in notion of equality; this is one feature that makes them different from objects. Hence no further research should be needed for that.

* Contribution

The "main contributions" have been reduced to two. As already mentioned in my first review, it is unclear whether it is wise to introduce both of them together.

The Pearson correlation in the results is also very irregular: from data set to data set, the order between measures according to this correlation differs. This order is the basis for drawing conclusions, so these do not seem very solid.

Finally, I insisted on having the measures compared with classical results on WordNet. So I was unhappy to see the N/A entries in Table 11: I was expecting results. It is only by reading the answer to reviewers that I understood that the authors are right that WordNet is not suitable for the feature-based measures. Please make sure that this is clearly understandable by everybody (I would put the explanation in the caption of the table). Readers will not have the answer to reviewers (I know that this explanation is also written in the text, but what happened to me will happen to other readers).

* Other

The introduction has a convincing justification: it is necessary to go "beyond wordnet".
But I miss the step that takes us from there to the need for similarity extensions.

It is surprising that the set of selected classes contains RollDish, CrepeDish, PancakeDish and WaffleDish, which could be considered very close to one another with respect to the others.

It is written in several places that what is important is to simulate "human perception of similarity", but it is not explained why.

Section 3.4: it would be nice to introduce some criticisms of feature-based similarities here as well.

As mentioned in my previous review, Sim_S and Sim_{JS} do not span [0, 1]. In spite of the answer to reviewers, this is now mentioned in the text (4.1.1). Comparing similarities that do not have the same range may actually be a problem -- and scaling them to the full range is easy.
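Such scaling is indeed a one-liner once the measure's theoretical bounds are known (a sketch with hypothetical bounds, not the actual range of Sim_S or Sim_{JS}):

```python
def rescale(sim_value, lo, hi):
    """Affine rescaling of a similarity from its theoretical range [lo, hi]
    to the full [0, 1] interval. The bounds here are hypothetical examples."""
    return (sim_value - lo) / (hi - lo)

# e.g. a measure whose values can only fall in [0.5, 1.0]:
rescale(0.5, 0.5, 1.0)   # bottom of the range maps to 0.0
rescale(1.0, 0.5, 1.0)   # top of the range maps to 1.0
```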

* Details:
- p1: "the word with the similar meaning": synsets are sets of synonyms. This is clear and simple. Writing about "the meaning" implies that there is only one, and the meanings are not supposed to be similar, they are supposed to be the same.
- p2: "various variations": different / several / many / just variations?
- p2: a conceptual hierarchy "a tree or a lattice". Why not simply a directed acyclic graph?
- p9: "slight modifications": it is unclear that adding a class hierarchy can be qualified as slight.
- p13: "cannot confirm": "cannot reject the null hypothesis" or "cannot accept" would be more appropriate.

Review #2
By Pasquale De Meo submitted on 16/Nov/2017
Suggestion:
Accept
Review Comment:

The paper has been significantly revised and appears in good shape for publication. Some sentences are, however, too long, which makes some parts a bit hard to follow. I recommend simplifying them (by breaking long sentences into smaller ones).

Review #3
Anonymous submitted on 25/Nov/2017
Suggestion:
Major Revision
Review Comment:

In general, I was both confused by some of the comments describing the goals and approach, and intrigued by the subject of finding an effective measure.
The confusion has to do with the objective of the work: it feels like the paper sometimes discusses general solutions, and sometimes discusses situations that are clearly specific and therefore call for specific solutions.
I was also confused by the focus on Jaccard's measure, since it radiates a story like "Jaccard works, let us make it better", without convincingly establishing that Jaccard is in fact the optimal solution to the problem: maybe completely different solutions work even better.
I was intrigued by the ambition, and I like the way things were explained in the evaluation.

Therefore, I like the idea but recommend that some serious changes be applied to the text. By toning down the general ambition and staying closer to the problem that is actually discussed in the evaluation, the ambitions and results will be more accurate, even if smaller in significance.

Detailed comments:
Do the authors want to add a question mark at the end of the title?
One of the first things that strikes me in the opening is that similarity is framed as something absolute. The authors do speak about perception, and I would expect similarity to be related to perception and to lie in the eyes of the beholder.
I am confused by the motivation to look at Tversky's and Jaccard's measures: if they are not first established as relevant points of departure for considering similarity, it feels odd to start from them in order to consider similarity, which was first said to be the object of interest.
What I miss from the beginning is a clear definition of what, for the stated goal of having a good notion of similarity, "good" actually means.
I got confused by some of the comments in the introduction as to why ontologies are necessarily in play, or how measures should be general. The authors mention that it is not very clear which measure to use in a given situation, so I got confused as to how to properly measure the measures. I got the impression that the idea for this work relies on the dependence on conceptual hierarchies, but I was not convinced whether this is part of the problem to solve or part of the solution.
Section 2 does give some background, but it feels that either this could have been done where each concept is first used, or there should have been a longer and more complete coverage of relevant concepts from the literature.
I like the detailed introduction of the measures in the first three parts of Section 3. The discussion that follows in 3.4 feels a bit arbitrary, since in my understanding the pros and cons depend on what one wants to do with the measure definitions. The discussion goes more into their implementation, without regard to the context and purpose of usage.
The opening sentence of 4.2 reveals what is apparently the basic question. It confirms that the actual interest is mostly in the influence of the hierarchical information on the measures: of course, these hierarchies exhibit their own specific properties.
The opening statements in Section 5 fuel my interpretation that the authors are interested not in the traditional notion of similarity, as in the experiments from the 60's they mention, but in a different notion that directly relates to the ontologies and their properties.
I like the detailed account of the evaluation. Given what the authors intended to do, it appears to cover most of what should be there.