Quality-Based Model For Effective and Robust Multi-User Pay-As-You-Go Ontology Matching

Tracking #: 761-1971

Authors: 
Isabel Cruz
Francesco Loprete
Matteo Palmonari
Cosmin Stroe
Aynaz Taheri

Responsible editor: 
Guest Editors EKAW 2014 Schlobach Janowicz

Submission type: 
Conference Style
Abstract: 
Using our multi-user model, a community of users provides feedback in a pay-as-you-go fashion to the ontology matching process by validating the mappings found by automatic methods, with the following advantages over having a single user: the effort required from each user is reduced, user errors are corrected, and consensus is reached. We propose strategies that dynamically determine the order in which the candidate mappings are presented to the users for validation. These strategies are based on mapping quality measures that we define. Further, we use a propagation method to leverage the validation of one mapping to other mappings. We use an extension of the AgreementMaker ontology matching system and the Ontology Alignment Evaluation Initiative (OAEI) Benchmarks track to evaluate our approach. Our results show how F-measure and robustness vary as a function of the number of user validations. We consider different user error and revalidation rates (the latter measures the number of times that the same mapping is validated). Our results highlight complex trade-offs and point to the benefits of dynamically adjusting the revalidation rate.
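To make the workflow sketched in the abstract concrete, here is a minimal illustrative sketch of a multi-user, pay-as-you-go validation loop: candidate mappings are ranked by a quality measure, validated by community members who may err, and decided by majority vote once enough revalidations accrue. All names (SimulatedUser, quality, revalidation_rate) are assumptions for illustration, not the authors' implementation or the AgreementMaker API.

```python
# Hypothetical sketch of a multi-user pay-as-you-go validation loop; not the paper's code.
import random
from collections import defaultdict

class SimulatedUser:
    """A user who flips the true label with probability error_rate."""
    def __init__(self, error_rate, truth):
        self.error_rate = error_rate
        self.truth = truth                        # mapping -> correct True/False label
    def validate(self, mapping):
        correct = self.truth[mapping]
        return (not correct) if random.random() < self.error_rate else correct

def validation_loop(candidates, quality, users, budget, revalidation_rate=3):
    """Present candidate mappings in quality-driven order; aggregate votes by majority."""
    votes = defaultdict(list)
    consensus = {}
    for _ in range(budget):                       # each step consumes one user validation
        pending = [m for m in candidates if m not in consensus]
        if not pending:
            break
        m = min(pending, key=quality)             # lowest-quality mapping is validated first
        votes[m].append(random.choice(users).validate(m))
        if len(votes[m]) >= revalidation_rate:    # enough (re)validations: take the majority
            consensus[m] = sum(votes[m]) > len(votes[m]) / 2
    return consensus
```

The loop can be stopped at any budget, which is what makes the process pay-as-you-go.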
Tags: 
Reviewed

Decision/Status: 
[EKAW] combined track accept

Solicited Reviews:
Review #1
Anonymous submitted on 21/Aug/2014
Suggestion:
[EKAW] conference only accept
Review Comment:

Overall evaluation: 1 (weak accept)
Reviewer's confidence: 3 (medium)
Interest to the Knowledge Engineering and Knowledge Management Community: 3 (fair)
Novelty: 2 (poor)
Technical quality: 3 (fair)
Evaluation: 3 (fair)
Clarity and presentation: 3 (fair)

Review

The paper deals with the design of metrics and simulations to study user feedback models in ontology matching. The approach is built around several components: a meta-strategy to select mappings to be validated by the user, a strategy to aggregate feedback from different users, and a third one for feeding user input back into similarity computations. The approach is driven by a series of metrics that aim to capture aspects of active-learning matching that are expected to influence the three components mentioned earlier. The simulations consider different error rates (which are constant across users) and revalidation rates.

I think studying the role of user contributions in ontology matching, or in any other data integration scenario, is useful and badly needed. Human intelligence is an integral part of the process, and we need to understand better how it can be combined with algorithmic power. I also appreciate that simulations are useful as a first step towards understanding the relevant trade-offs. However, I miss a more critical discussion of the limitations of the methodology and its implications in the real world, where assumptions such as constant user behavior (even for the same user, let alone multiple users) will definitely not hold. It would also be useful to discuss how such error rates could be reliably estimated.

The authors mention that user engagement is outside the scope of their work; however, their analysis reveals that in the presence of errors one needs a high revalidation rate. The way I understood this measure, this means more frequent alternations between human and machine computation rounds, which, I would argue, are associated with delays, context switches, and several other user-related factors that will hamper the user experience and motivation. With tasks as repetitive as validating matching results, engagement might prove tricky to sustain. Furthermore, the best strategy seems to be to switch from lower revalidation rates to higher ones, yet over time the motivation of the users is more likely to decrease (due to the less exciting nature of the tasks).

Last, but not least, some would argue that a majority-voting approach to something as subjective as finding correspondences between entities might be insufficient. Metrics to capture the diversity of views could be a useful addition to the system, besides more realistic error-rate models.
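The interaction between a constant per-user error rate and the revalidation rate that the review discusses can be illustrated with a toy Monte Carlo: the probability that majority voting settles on a wrong label falls as the number of votes per mapping grows, but every extra vote costs another round of user effort. This is purely an illustration of the trade-off, not a reproduction of the paper's experiments.

```python
# Toy Monte Carlo: wrong-consensus probability under majority voting
# for a fixed per-user error rate, as the revalidation rate grows.
import random

def wrong_consensus_probability(error_rate, revalidations, trials=100_000):
    wrong = 0
    for _ in range(trials):
        errors = sum(random.random() < error_rate for _ in range(revalidations))
        if errors > revalidations / 2:        # majority of the votes are erroneous
            wrong += 1
    return wrong / trials

for r in (1, 3, 5, 7):
    print(r, round(wrong_consensus_probability(0.2, r), 3))
# With a 20% error rate this drops roughly from 0.20 to 0.10, 0.06, 0.03,
# at the price of 1, 3, 5, 7 validations per mapping.
```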
There are a number of aspects that I could not quite understand from the paper; they all relate to the operation of the overall system. How many users are expected to interact with the system at a given time? What kind of delays are expected to occur in the process before a metric like CON can be calculated, and how will this affect the choice of a strategy for candidate selection (DIA vs. REV)? In fact, it was not clear to me how the two meta-strategies impact the experiments; maybe this could be clarified in the paper. In the introduction the authors mention requests by users; however, the rest of the paper seems to suggest that users interact with the system by validating mappings. What is the relationship between requests and validations?
References to the geospatial domain are somewhat confusing, as the rest of the paper is kept very generic.

Review #2
Anonymous submitted on 25/Aug/2014
Suggestion:
[EKAW] combined track accept
Review Comment:

Overall evaluation: 2 (accept)
Reviewer's confidence: 5 (expert)
Interest to the Knowledge Engineering and Knowledge Management Community: 5 (excellent)
Novelty: 3 (fair)
Technical quality: 4 (good)
Evaluation: 4 (good)
Clarity and presentation: 4 (good)

Review

The paper presents new heuristics for guiding interactive ontology matching. The idea of the approach is derived from the concept of active learning, where specific requests are generated for a human expert to help the learning process. In a similar way, the authors propose several measures that determine the assumed quality of mapping hypotheses and select mappings for human inspection based on these measures.
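The selection step described here can be sketched as follows: rank mapping hypotheses by how much the individual matchers disagree about them and ask the user about the most contentious one first. The disagreement measure used below (variance of matcher scores) is an assumption for illustration and not necessarily one of the paper's measures.

```python
# Illustrative active-learning-style candidate selection by matcher disagreement.
from statistics import pvariance

def most_contentious(signature_vectors, already_validated):
    """signature_vectors: {mapping: [score from matcher 1, matcher 2, ...]}"""
    pending = {m: scores for m, scores in signature_vectors.items()
               if m not in already_validated}
    # higher variance across matchers = more disagreement = more informative to validate
    return max(pending, key=lambda m: pvariance(pending[m]))

sigs = {("Paper", "Article"): [0.9, 0.2, 0.7],   # matchers disagree strongly
        ("Author", "Writer"): [0.8, 0.8, 0.7]}   # matchers largely agree
print(most_contentious(sigs, already_validated=set()))  # -> ("Paper", "Article")
```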

The measures are based on the disagreement among, and other statistical properties of, the mapping hypotheses generated by different matchers; the authors also provide an approach for feeding the results of human feedback back into the mappings and the resulting measures. The measures make a lot of sense; for a journal paper, however, I would have expected a deeper analysis of the contribution of the individual measures. Adapting the measures based on the users' feedback is also a good idea. What I miss is the integration of definite feedback: in particular, since the 1:1 assumption is made, human feedback completely excludes some mappings from the search space (see also the work of Meilicke and others on interactive mapping revision*).
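The kind of definite-feedback pruning the reviewer asks about can be made concrete with a small sketch: under a 1:1 alignment assumption, confirming one mapping rules out every other candidate that shares its source or target entity. The helper below is hypothetical, not part of the paper's or AgreementMaker's code.

```python
# Hypothetical pruning of candidate mappings after a definite "correct" validation
# under a 1:1 alignment assumption.
def prune_after_confirmation(candidates, confirmed):
    """candidates: set of (source_entity, target_entity) pairs;
    confirmed: a pair the user validated as correct."""
    src, tgt = confirmed
    return {(s, t) for (s, t) in candidates
            if (s, t) == confirmed or (s != src and t != tgt)}

cands = {("Paper", "Article"), ("Paper", "Manuscript"), ("Author", "Article")}
print(prune_after_confirmation(cands, ("Paper", "Article")))
# -> {("Paper", "Article")}: the other two conflict with the confirmed 1:1 mapping.
```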

The evaluation is quite thorough and covers not only a comparison of different parameter settings, but also a comparison with competing systems, in particular with Shi et al. (reference [4]). As this system seems to define the state of the art in this field, I would have expected a more detailed comparison with this approach and with active learning in general.

Overall, the paper presents some good work, although it starts a bit slowly with too much general discussion; I consider Section 2 to be completely redundant. This space could be used in a better way by providing more technical discussion and a deeper comparison with existing approaches. Overall, a good paper.

* Christian Meilicke, Heiner Stuckenschmidt and Andrei Tamilin. Supporting Manual Mapping Revision using Logical Reasoning. In: Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence and the Twentieth Innovative Applications of Artificial Intelligence Conference, 13-17 July 2008, Chicago, Illinois, USA, pp. 1213-1218. AAAI Press, Menlo Park, 2008.

Review #3
Anonymous submitted on 28/Aug/2014
Suggestion:
[EKAW] combined track accept
Review Comment:

Overall evaluation: 1 (weak accept)
Reviewer's confidence: 4 (high)
Interest to the Knowledge Engineering and Knowledge Management Community: 5 (excellent)
Novelty: 4 (good)
Technical quality: 4 (good)
Evaluation: 3 (fair)
Clarity and presentation: 3 (fair)

Review

The paper proposes a novel feedback model for ontology matching where a community of users provides feedback on the mappings in an incremental fashion, i.e. the users are not requested to validate all the mappings generated by an alignment system, but only some of them, and the process can be stopped at any point in time.

The paper illustrates the method for multi-user validation that extends the AgreementMaker alignment system and presents a comprehensive evaluation on the OAEI benchmark datasets that shows the behaviour of the system's F-measure and error tolerance when different iterations of the algorithm are considered.

The approach presented in the paper is sound and well motivated. The scores concerning the clarity of presentation and the evaluation are motivated by some issues that could easily be addressed in an extended journal submission.
The paper is clearly written, and yet, given the very nature of the proposed method, its presentation could do more to help the reader remember the different formulae (AMA, CSQ, SSE, CON, PI, DIA, REV) and their characteristics. Some formulations are also somewhat counterintuitive: for example, why is CSQ defined as 1 minus (the sum of all similarity scores \sigma_{i,j} in the same row and column, divided by the maximum sum of scores per dimension in the matrix) if CSQ^-, i.e. 1 - CSQ, is then the quantity actually used?
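Written out, the definition as the reviewer reads it would be roughly the following; the notation is reconstructed from the review's own wording and may differ from the paper's exact formula.

```latex
% Reviewer's reading of CSQ, reconstructed; the paper's exact symbols may differ.
\[
  \mathrm{CSQ}(i,j) \;=\; 1 \;-\;
  \frac{\sum_{k \neq j}\sigma_{i,k} \;+\; \sum_{l \neq i}\sigma_{l,j}}
       {\max\bigl(\text{row-wise sum of scores},\ \text{column-wise sum of scores}\bigr)},
  \qquad
  \mathrm{CSQ}^{-} \;=\; 1 - \mathrm{CSQ}(i,j).
\]
```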

The notion of propagation is also somewhat unintuitive. The similarity of validated mappings (i.e. 0 or 1) is propagated to all the mappings that are most similar to the mapping just validated, where the similarity is represented by the signature vector of the mapping. Does this mean that the validated value is propagated to those mappings for which the different matching algorithms provide similar scores? Or is this just capturing the fact that there are mappings with similar levels of confidence assigned by the matchers?
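A rough sketch of the propagation being questioned here: the validated label (0 or 1) is pushed toward the candidate mappings whose signature vectors (one score per matcher) lie closest to the validated mapping's vector. The distance threshold and blending factor below are illustrative assumptions, not the paper's propagation gain g.

```python
# Illustrative propagation of a validated label via signature-vector similarity.
import math

def propagate(validated_mapping, label, signature_vectors, similarities,
              radius=0.1, gain=0.5):
    v = signature_vectors[validated_mapping]
    for m, sig in signature_vectors.items():
        if m == validated_mapping:
            similarities[m] = float(label)       # the validated value is definite
            continue
        dist = math.dist(v, sig)                 # Euclidean distance between signature vectors
        if dist <= radius:                       # only the "most similar" mappings are affected
            # move the mapping's similarity part of the way toward the validated label
            similarities[m] += gain * (label - similarities[m])
    return similarities
```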

The g used to calculate the propagation gain should be defined more precisely.

The experiments evaluate the approach by highlighting its behaviour in terms of F-measure and robustness (error tolerance) as the number of iterations grows. The evaluation randomly simulates the labels assigned by the users. It is not clear whether the experiments are repeated or run only once; repetition would eliminate any bias intrinsic in the randomisation of the feedback simulation. It is also not clear whether the experiments take into account the distribution of the labels (a majority of 1s vs. 0s and vice versa), and whether this is a factor that could affect performance.
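The repetition the reviewer asks for amounts to rerunning the randomised feedback simulation under several seeds and reporting the mean and spread of the F-measure, so that a single lucky or unlucky randomisation does not bias the conclusions. In the sketch below, run_simulation is a placeholder for the authors' experiment, not an existing function.

```python
# Illustrative repeated-runs harness around a randomised feedback simulation.
import random
from statistics import mean, stdev

def repeated_runs(run_simulation, error_rate, revalidation_rate, repetitions=10):
    scores = []
    for seed in range(repetitions):
        random.seed(seed)                              # different randomisation per run
        scores.append(run_simulation(error_rate, revalidation_rate))
    return mean(scores), stdev(scores)                 # average F-measure and its variability
```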

The parameters of the 12 chosen configurations should also be motivated in more detail.

Regarding conclusion 2, it seems to presuppose that one can assess the error rate at every iteration in order to decide whether revalidation is needed; is this actually practical?

A final issue is whether the method assumes that the user is always willing to provide feedback, or whether there could be a point when the user loses concentration and does not provide any more feedback. Is this kind of trend possible for the user community, and would this lack of concentration affect the system?