Quality-Based Model For Effective and Robust Multi-User Pay-As-You-Go Ontology Matching

Tracking #: 893-2104

Authors: 
Isabel Cruz
Francesco Loprete
Matteo Palmonari
Cosmin Stroe
Aynaz Taheri

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Full Paper
Abstract: 
Using a pay-as-you-go strategy, we allow a community of users to validate mappings obtained by an automatic ontology matching system, using consensus for each mapping. The ultimate objectives are effectiveness—improving the quality of the obtained alignment (set of mappings), measured in terms of F-measure as a function of the number of user interactions—and robustness—making the system as impervious as possible to user validation errors. Our strategy consists of two major steps: candidate mapping selection, which ranks mappings based on their perceived quality so as to present first to the users those mappings with lowest quality, and feedback propagation, which seeks to validate or invalidate those mappings that are perceived to be “similar” to the mappings already presented to the users. The purpose of these two steps is twofold: to achieve greater improvements earlier and to minimize overall user interaction. There are three important features of our approach. The first is that we use a dynamic ranking mechanism to adapt to the new conditions after each user interaction, the second is that we may need to present each mapping for validation more than once—revalidation—because of possible user errors, and the third is that we propagate a user’s input on a mapping immediately, without first achieving consensus for that mapping. We study extensively the effectiveness and robustness of our approach as several of these parameters change, namely the error and revalidation rates, as a function of the number of iterations, to provide conclusive guidelines for the design and implementation of multi-user feedback ontology matching systems.
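
As a reading aid, the workflow described in the abstract can be sketched as the loop below. This is a minimal illustration in Python, not the authors' implementation; the function and parameter names (perceived_quality, propagate, ask_user, revalidation_rate) are hypothetical, and candidate selection, propagation, and consensus are all simplified.

```python
import random
from collections import defaultdict

def validation_loop(mappings, perceived_quality, propagate, ask_user,
                    iterations, revalidation_rate=0.2):
    """Hypothetical sketch of a pay-as-you-go validation loop.

    mappings          : list of candidate mapping identifiers
    perceived_quality : callable mapping -> score (lowest quality shown first)
    propagate         : callable (mapping, label) applying feedback to similar mappings
    ask_user          : callable simulating one (possibly erroneous) user validation
    """
    votes = defaultdict(list)  # mapping -> list of True/False validations
    for _ in range(iterations):
        # Dynamic ranking: recompute perceived quality after every interaction.
        ranked = sorted(mappings, key=perceived_quality)
        validated = [m for m in ranked if votes[m]]
        # Revalidation: occasionally re-present an already validated mapping.
        if validated and random.random() < revalidation_rate:
            target = validated[0]
        else:
            target = ranked[0]
        label = ask_user(target)      # a single user's (possibly noisy) answer
        votes[target].append(label)
        propagate(target, label)      # propagate immediately, before consensus
    # Consensus for each mapping by simple majority over collected validations.
    return {m: sum(v) > len(v) / 2 for m, v in votes.items() if v}
```
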
Tags: 
Reviewed

Decision/Status: 
Accept

Solicited Reviews:
Review #1
Anonymous submitted on 07/Apr/2015
Suggestion:
Minor Revision
Review Comment:

I reviewed an earlier version of this paper for the joint submission process of EKAW 2014 and the SWJ. Most of my comments on that version are still valid; I have therefore included that review below.

I have to note that my comment on section 2 has not been taken into account: that part is still rather lengthy, and the space would be better used for a more in-depth comparison with related work.

One thing that clearly needs to be clarified by the authors is the decision to use a different quality measure for the evaluation, which completely changes all the graphs compared to the original EKAW submission. Further, this version of the paper exchanges one of the quality criteria: in particular, it replaces 'Propagation Impact' with 'Feedback Stability'. While changing the measures used is of course fine, I miss a comparison with the previous approach and a proper argumentation for why the new aspect was included - and, even more importantly, why Propagation Impact has been removed!

If the authors can clarify these issues I think the paper can be accepted.

--
Previous review:

The paper presents new heuristics for guiding interactive ontology matching. The idea of the approach is derived from the concept of active learning, where specific requests are generated for a human expert to help the learning process. In a similar way, the authors propose several measures that determine the assumed quality of a mapping hypothesis and select mappings for human inspection based on these measures.

The measures are based on disagreement and other statistical properties of the mapping hypotheses generated by different matchers, and an approach for feeding the results of human feedback back into the mappings and the resulting measures is provided. The measures make a lot of sense; for a journal paper, however, I would have expected a deeper analysis of the contribution of the individual measures. Adapting the measures based on the user's feedback is also a good idea. What I miss is the integration of definite feedback - in particular, as the 1:1 assumption is made, human feedback completely excludes some mappings from the search space (see also the work of Meilicke and others on interactive mapping revision*).
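
As a purely illustrative sketch of the kind of disagreement-based measure the review refers to (the function name and formula below are not taken from the paper), a disagreement score over the per-matcher similarity values of a candidate mapping could be computed as the mean absolute deviation from their average:

```python
from statistics import mean

def disagreement(signature):
    """Illustrative disagreement score for one candidate mapping.

    signature: list of similarity values, one per matcher, each in [0, 1].
    A higher mean absolute deviation means the matchers disagree more,
    making the mapping a better candidate for user validation.
    """
    mu = mean(signature)
    return mean(abs(s - mu) for s in signature)

# Mappings on which the matchers disagree most are ranked first for validation.
candidates = {"m1": [0.9, 0.85, 0.92], "m2": [0.2, 0.9, 0.5]}
ranked = sorted(candidates, key=lambda m: disagreement(candidates[m]), reverse=True)
# -> ['m2', 'm1']: m2 shows high disagreement and would be shown to users first.
```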

The evaluation is quite thorough and covers not only a comparison of different parameter settings, but also a comparison with competing systems, in particular with Shi et al. (reference [4]). As this system seems to define the state of the art in this field, I would have expected a more detailed comparison with this approach and with active learning in general.

Overall, the paper presents some good work. It starts rather weakly, with too much general talk: I consider section 2 to be completely redundant. That space could be used in a better way by providing more technical discussion and a deeper comparison with existing approaches. Overall, a good paper.

* Christian Meilicke, Heiner Stuckenschmidt, and Andrei Tamilin. Supporting Manual Mapping Revision using Logical Reasoning. In: Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence and the Twentieth Innovative Applications of Artificial Intelligence Conference, 13-17 July 2008, Chicago, Illinois, USA, pages 1213-1218. AAAI Press, Menlo Park, 2008.

Review #2
By Stefan Schlobach submitted on 02/Jul/2015
Suggestion:
Minor Revision
Review Comment:

The paper introduces an approach for semi-automatic ontology matching or, to be more precise, the integration of user feedback into an interactive process to improve automatic mappings.

This is an interesting and challenging problem at the core of both knowledge management and the Semantic Web. The overall quality of the paper is high: an interesting theory is worked out in sufficient detail, well described, and evaluated. I would therefore recommend accepting the paper with some minor revisions.

I think that there is a good contribution beyond the state of the art, and the document is well written and structured.

Those revisions are primarily some minor restructuring, as well as some additional explanation of issues that I think are insufficiently worked out. I list those issues in no particular order.

In the abstract, the authors mention that the mappings with lowest quality are presented first to the users (something that comes back in step 3). I have two problems with this. First, I think that in active learning the criterion for inclusion is not necessarily quality, but rather whatever helps improve the mapping most, which I believe is not the same. In section 3.2, this is worked out in far more detail, of course, and the "quality measure" used is now disagreement etc. I suggest that the abstract and step 3 be reformulated to avoid the impression that the "quality" of the mappings (whatever that may be) is what is used for ranking the suggestions.

A second core ingredient that I find confusing is the notion of similarity: you state that the first step is to calculate a similarity matrix. But what do you mean by similarities? I would expect a similarity relation to be reflexive and transitive, which is surely not the case for your matrix in Table 1. I could understand the use of the term similarity if the matrix were somehow symmetric and reflexive, but this is not even remotely the case.
This raises two questions: 1) whether it is a good idea to use the term similarity here, and 2) whether it is good to use those measures in the first place. The point is, I would guess, that real matchers produce values that are "not very good" (whatever that means). So it might be a valuable goal to produce a good similarity matrix from matrices such as the one in Table 1. But then you have to be clearer about what those values are really supposed to mean. In the worst case, the values returned by automatic tools are confidence values rather than similarities. But I guess you are careful here (although this is not mentioned). More problematic, I find, is the fact that you use a single tool for creating your evaluation set: here a single notion of similarity is used, so the results might be completely different if different mapping tools provided "similarity values" with a different interpretation of the term similarity.
I do not want to suggest that the arguments and the subsequent model are incorrect; I just think that the rather critical notion of similarity has to be discussed with more care.
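
To make the point about the term "similarity" concrete, a minimal check (purely illustrative, not taken from the paper) of whether a square matrix of scores behaves like a similarity relation in the usual sense, i.e. reflexive and symmetric, might look like this:

```python
def is_similarity_like(matrix, tol=1e-9):
    """Return (reflexive, symmetric) for a square matrix of scores.

    Reflexive: ones on the diagonal. Symmetric: matrix[i][j] == matrix[j][i].
    Purely illustrative; the example matrix below is invented.
    """
    n = len(matrix)
    reflexive = all(abs(matrix[i][i] - 1.0) <= tol for i in range(n))
    symmetric = all(abs(matrix[i][j] - matrix[j][i]) <= tol
                    for i in range(n) for j in range(n))
    return reflexive, symmetric

# A matrix like the one the review refers to would typically fail both checks.
example = [[1.0, 0.7, 0.1],
           [0.3, 1.0, 0.6],
           [0.1, 0.2, 0.9]]
print(is_similarity_like(example))  # (False, False)
```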

A different point concerns the term "pay-as-you-go", which has a prominent role in the title and abstract of the paper. Unfortunately, the term is not discussed with respect to the method and the model. Although I understand the general idea of why this term is used, I think this apparently essential contribution of the paper needs to be given a more prominent place, e.g. in section 3, but also in the evaluation. In section 3, it would be good to know how the design choices of the model are appropriate for an anytime approach. And though I understand that there is an evaluation of robustness and quality over time, I think it could be clearer what the results tell me with respect to the anytime property. As an example, I think that one desirable property of an anytime approach is that it improves a lot within a few iterations, and then perhaps takes time to converge toward perfection. This kind of discussion would have deserved a more prominent place.

On a different note, I would appreciate more explanation of some of the design choices: to me, a very interesting aspect of the approach is the potential to study various reconciliation methods, but then the only method used is simple majority vote. This seems like a missed opportunity.
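
Since the review singles out simple majority vote as the only reconciliation method studied, the following is a small illustrative contrast (hypothetical names and weighting scheme, not from the paper) between plain majority vote and one possible reliability-weighted alternative:

```python
def majority_vote(validations):
    """Simple majority vote: validations is a list of (user_id, True/False)."""
    yes = sum(1 for _, v in validations if v)
    return yes > len(validations) / 2

def weighted_vote(validations, reliability):
    """One possible alternative reconciliation: weight each user's vote by an
    estimated reliability in [0, 1]. Names and scheme are illustrative only.
    """
    yes = sum(reliability.get(u, 0.5) for u, v in validations if v)
    no = sum(reliability.get(u, 0.5) for u, v in validations if not v)
    return yes > no

votes = [("u1", True), ("u2", False), ("u3", False)]
print(majority_vote(votes))                                       # False
print(weighted_vote(votes, {"u1": 0.95, "u2": 0.4, "u3": 0.4}))   # True
```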

Another missed opportunity, it seems to me, is that the approach is not tested at all with real users, for example those from the geospatial domain. I know that user testing is painful and expensive, and I think that the paper is publishable even without this kind of test. But it would have been very useful to validate the assumptions underlying the experiments (such as the ones you point out on page 9). In an ongoing research project, it does not seem impossibly hard to get concrete qualitative (even if small) data for such a test. It would be good at least to comment in the paper on the potential of such a test.

More structurally, I think that section 2 should be improved: it starts with a description of what you do not do, before making clear what you actually do, or want to do. You start with very specific problems (p3, first column), instead of introducing the problem you want to solve and giving a global overview of how to solve it. The latter is there (in the 7 steps), but the first column of the section is rather confusing.

Finally, as a minor comment, I think that there is an overuse of capitalization, and no consistent way of doing it: you have CON and similarity score defined without capitals (p5), but Disagreement and Indefiniteness Average with capitals. Please try to be consistent here.

Small things I stumbled across:
- p9, "Our focus is on the evaluation of the methods minimize" -> non-grammatical sentence
- A Constant

So, to summarise: a nice paper, which solves a challenging problem in an interesting way and provides some useful insights empirically, should in my view be accepted for publication, once the authors have discussed some critical notions, such as quality of mapping, similarity, etc., a bit more systematically.