A Session-based Ontology Alignment Approach enabling User Involvement

Tracking #: 1082-2294

Patrick Lambrix
Rajaram Kaliyaperumal

Responsible editor: 
Guest Editors Ontology and Linked Data Matching

Submission type: 
Full Paper
One of the current challenges in ontology alignment is user involvement in the alignment process. To obtain high-quality alignments, user involvement is needed for validation of matching results as well as in the mapping generation process. Further, there is a need for supporting the user in tasks such as matcher selection, combination and tuning. In this paper we introduce a conceptual ontology alignment framework that enables user involvement in a natural way. This is achieved by introducing different kinds of interruptible sessions. The framework allows partial computations for generating mapping suggestions, partial validations of mapping suggestions, recommendations for alignment strategies as well as the use of validation decisions in the (re-)computation of mapping suggestions and the recommendations. Further, we show the feasibility of the approach by implementing a session-based version of an existing system. We also show through experiments the advantages of our approach for ontology alignment as well as for evaluation of ontology alignment strategies.
Minor Revision

Solicited Reviews:
Review #1
By Daniel Faria submitted on 30/Jun/2015
Review Comment:

The revision made by the authors addresses essentially all points with which I was concerned in the previous version of the manuscript. The revised paper is easier to read, and the main points come across more clearly - moving most of the experimental results to an appendix was a good idea!

I believe the paper is ready for publication, although I have a few final minor comments which I hope the authors take into consideration (but which in my view do not constitute grounds for even a minor revision):

- Although I approve of moving most of the experimental results to the appendix, I would have preferred a table with the key results in the main manuscript. Still, I realize this is a matter of opinion, and section 5.5 already does a good job of summarizing the main conclusions drawn from the experiments.

- The Anatomy track ontologies haven't changed since before 2011 until the present day (at least OAEI 2014). While providing a date is important for future reference, in this case 2011 may give the impression that the dataset is outdated, when that is not the case. Thus I would suggest that the authors simply refer to the dataset with the date of 2014 instead of 2011, as this is the most recent OAEI edition as of the submission of the paper.

- The definition of 'segments' in section 2.2.2 would be clearer if the authors used "subset" instead of "set".

Review #2
By Ernesto Jimenez-Ruiz submitted on 15/Jul/2015
Major Revision
Review Comment:

This paper represents a revised version of a previous submission for which a "major revision" was required. I appreciate the response provided by the authors.

The paper has also been re-structured as suggested by the reviewers, and most of the evaluation has been moved to the appendix (perhaps some of the evaluation should have been kept in its corresponding section). Some of my comments have been addressed or the authors have provided a satisfactory response. Some of them, however, still apply:

- Regarding the workflow of the framework, it would be easier to understand if a (simple) running example were provided.

- The authors have changed the focus of the paper. However, it would still be interesting to evaluate the proposed framework with the OAEI LargeBio ontologies (at least some of them). The alignments for them are available online. See:

- It would also be very interesting to test whether a recommendation based on one matching task (e.g. the OAEI Anatomy track) could be applied to another matching task (e.g. the OAEI LargeBio track, for which reference alignments are available).

Some additional comments:
- Section 1. PA is mentioned without being introduced.
- Section 2.2.1: footnote 5. The definition should be generalised from the very beginning using "M", and then the use of PA or mapping suggestions as instances of M should be made explicit.
- Section 2.2.1: How easy/efficient is it to get consistent groups when having thousands of equivalence mappings?
- Section 3: an oracle is mentioned but no further details are given.
- Section 4.2.3. Are the best matchers not automatically selected/suggested if previous sessions are available?
- Section 6. LogMap has a web user interface (http://csu6325.cs.ox.ac.uk, also accessible from https://code.google.com/p/logmap-matcher/). It was created right after publishing the ECAI paper, but it is mentioned in the OAEI papers 2013 (http://disi.unitn.it/~p2p/OM-2013/oaei13_paper5.pdf) and 2014 (http://disi.unitn.it/~p2p/OM-2014/oaei14_paper4.pdf).
- I hope to see SAMBO extension with sessions to be evaluated in the OAEI 2015 Interactive track.

I believe this paper has potential since the topic it deals with is very important and challenging. My decision still remains "major revision", but it is getting closer towards acceptance.

Review #3
By Michelle Cheatham submitted on 18/Sep/2015
Minor Revision
Review Comment:

The paper describes a "session-based" ontology alignment system that allows a user to provide feedback on a suggested partial mapping. This feedback is used immediately to improve the configuration of the matching algorithm (weights, thresholds, etc.).

The approach is orthogonal to the particular similarity metrics used in the matcher (and can be used with any such matcher), and is therefore of interest to many researchers in the field. The approach to recommending appropriate parameter values (e.g. weights and thresholds) is an under-researched area that is also of considerable interest.

The paper makes a point of stressing that caching similarity computations improves computational performance, but I think this is actually fairly well-known and somewhat obvious. The code for several alignment systems that I've seen does this. I suspect that it is not mentioned more often because it might be seen as "cheating" in the OAEI, since runtime is reported there.

The experiments are well-conceived and explore the significant aspects of the approach. However, experiments are conducted with only a single ontology pair, which somewhat limits the conclusions that can be drawn.

Some minor suggestions:

It would be useful to explain how the current paper expands upon [23].

The abbreviation PA is used on page 2 before it is defined on page 3.

On page 8 when the NaiveBayes classifier is discussed, it is not clear how the corpus of documents for each ontology is created. I realize that this is ancillary to this paper, but a sentence of explanation would be helpful.

The description of Sim2 on page 10 is not entirely clear.

On page 20, it is not clear what is meant by "due to the consistent group in the double threshold filtering."