Review Comment:
The paper introduces an approach for semi-automatic ontology matching or, to be more precise, the integration of user feedback into an interactive process to improve automatic mappings.
This is an interesting and challenging problem at the core of both knowledge management and the Semantic Web. The overall quality of the paper is high: an interesting theory is worked out in sufficient detail, well described and evaluated. I would therefore recommend accepting the paper with some minor revisions.
I think that there is a good contribution beyond the state of the art, and that the document is well written and well structured.
The revisions are primarily some minor restructuring as well as some additional explanation of issues that I think are insufficiently worked out. I name those issues in no particular order.
In the abstract, the authors mention that the mappings with lowest quality are presented first to the users (something that comes back in step 3). I have two problems here. First, I think that in Active Learning the criterion for inclusion is not necessarily quality, but rather how much a candidate helps to improve the mapping, which I believe is not the same. In section 3.2 this is worked out in far more detail, of course, and the "quality measure" used there is disagreement etc. I suggest that the abstract and step 3 be reformulated to avoid the impression that the "quality" of the mappings (whatever that may be) is what is used for ranking the suggestions.
A second core ingredient that I find confusing is the notion of similarity: you state that the first step is to calculate a similarity matrix. But what do you mean by similarities? I would expect a similarity relation to be reflexive and transitive, which is surely not the case for your matrix in Table 1. I could understand the use of the term similarity if the matrix were somehow symmetric and reflexive, but this is not even remotely the case.
This raises two questions: 1) whether it is a good idea to use the term similarity here, and 2) whether it is good to use those measures in the first place. The point is, I would guess, that real mappers produce values that are "not very good" (whatever that means). So it might be a valuable goal to produce a good similarity matrix from matrices such as the one in Table 1. But then you have to be clearer about what those values are really supposed to mean. In the worst case the values returned by automatic tools are confidence values rather than similarities. But I guess you are careful here (although this is not mentioned). More problematic, I find, is the fact that you use a single tool for creating your evaluation set: a single notion of similarity is used, so results might be completely different if different mapping tools provide "similarity values" with a different interpretation of the term similarity.
I do not want to suggest that the arguments and the subsequent model are incorrect; I just think that the rather critical notion of similarity has to be discussed with more care.
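To make the point about the properties concrete, here is a minimal sketch of the kind of check I have in mind. The function names and the matrix values are my own hypothetical illustrations and are not taken from Table 1 or from the paper; I only assume that a matcher returns a plain matrix of scores.

```python
from typing import List

Matrix = List[List[float]]


def is_square(m: Matrix) -> bool:
    """True if every row has as many entries as there are rows."""
    return all(len(row) == len(m) for row in m)


def is_reflexive(m: Matrix, tol: float = 1e-9) -> bool:
    """True if the matrix is square and every diagonal entry equals 1."""
    return is_square(m) and all(abs(m[i][i] - 1.0) <= tol for i in range(len(m)))


def is_symmetric(m: Matrix, tol: float = 1e-9) -> bool:
    """True if the matrix is square and m[i][j] equals m[j][i] for all i, j."""
    if not is_square(m):
        return False
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol
               for i in range(n) for j in range(i + 1, n))


# Hypothetical matcher output (NOT the values of Table 1 in the paper).
matcher_output: Matrix = [
    [0.45, 0.70, 0.10],
    [0.20, 0.30, 0.90],
    [0.60, 0.05, 0.50],
]

print("reflexive:", is_reflexive(matcher_output))
print("symmetric:", is_symmetric(matcher_output))
```

If rows and columns index the concepts of two different ontologies, the matrix need not even be square, in which case neither property is well defined. That is one more reason to state explicitly what the values are meant to express.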
A different point concerns the term "pay-as-you-go", which has a prominent role in the title and abstract of the paper. Unfortunately, the term is not discussed w.r.t. the method and the model. Although I understand the general idea of why this term is used, I think this apparently essential contribution of the paper needs to be given a more prominent place, e.g. in section 3, but also in the evaluation. In section 3, it would be good to know how the design choices of the model are appropriate for an anytime approach. And though I understand that there is an evaluation of the robustness and quality over time, I think it could be clearer what the results tell me w.r.t. the anytime behaviour. As an example, I think that one desirable property of an anytime approach is that it improves a lot within a few iterations, and then maybe takes time to converge toward perfection. This kind of discussion deserves a more prominent place.
On a different note, I would appreciate more explanation of some of the design choices: to me a very interesting aspect of the approach is the potential to study various reconciliation methods, but then the only method is simple majority vote. This seems like a missed opportunity.
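To illustrate what I mean, here is a minimal sketch of two interchangeable reconciliation rules. The function names, the boolean encoding of user validations, and the reliability weights are my own assumptions for illustration; they are not the authors' implementation, and the weighted variant is just one hypothetical alternative among many.

```python
from typing import Sequence


def majority_vote(votes: Sequence[bool]) -> bool:
    """Accept a mapping if strictly more than half of the users validated it."""
    return 2 * sum(votes) > len(votes)


def weighted_vote(votes: Sequence[bool], weights: Sequence[float]) -> bool:
    """Accept a mapping if the total weight of accepting users exceeds
    the total weight of rejecting users (weights are hypothetical
    per-user reliability scores)."""
    accept = sum(w for v, w in zip(votes, weights) if v)
    reject = sum(w for v, w in zip(votes, weights) if not v)
    return accept > reject


# Three users judge one candidate mapping; only the first accepts it.
votes = [True, False, False]
reliability = [0.9, 0.3, 0.3]  # assumed values, for illustration only

print("majority:", majority_vote(votes))
print("weighted:", weighted_vote(votes, reliability))
```

Even this small variation can change the outcome for the same set of validations, which is why a comparison of reconciliation methods would have been informative.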
Another missed opportunity, it seems to me, is that the approach is not tested at all with real users, for example those from the geospatial domain. I know that user testing is painful and expensive, and I think that the paper is publishable even without this kind of test. But I think it would have been very useful to validate the assumptions underlying the experiments (such as the ones you point out on page 9). In an ongoing research project, it does not seem impossibly hard to get concrete qualitative data (even if only a small amount) for such a test. It would be good at least to comment in the paper on the potential of such a test.
More structurally, I think that section 2 should be improved: it starts with a description of what you do not do, before you make clear what you actually do, or want to do. You start with very specific problems (p3, first column) instead of introducing the problem you want to solve and giving a global overview of how to solve it. The latter is there (in the 7 steps), but the first column of the section is rather confusing.
Finally, as a minor comment, I think that there is an overuse of capitalization, and no consistent way of doing this: you have CON and similarity score defined without capitals (p5), but Disagreement and Indefiniteness Average with capitals. Please try to be consistent here.
Small things I stumbled across:
- p9, "Our focus is on the evaluation of the methods minimize" -> ungrammatical sentence
So, to summarise: a nice paper, which solves a challenging problem in an interesting way and provides some useful empirical insights. In my view it should be accepted for publication once the authors have discussed some critical notions, such as quality of mapping, similarity, etc., a bit more systematically.