A Session-based Ontology Alignment Approach for Aligning Large Ontologies

Tracking #: 825-2035

Authors: 
Patrick Lambrix
Rajaram Kaliyaperumal

Responsible editor: 
Guest Editors Ontology and Linked Data Matching

Submission type: 
Full Paper
Abstract: 
There are a number of challenges that need to be addressed when aligning large ontologies. Previous work has pointed out scalability and efficiency of matching techniques, matching with background knowledge, support for matcher selection, combination and tuning, and user involvement as major requirements. In this paper we address these challenges. Our first contribution is an ontology alignment framework that enables solutions to each of the challenges. This is achieved by introducing different kinds of interruptable sessions. The framework allows partial computations for generating mapping suggestions, partial validations of mapping suggestions and use of validation decisions in the (re-)computation of mapping suggestions and the recommendation of alignment strategies to use. Further, we describe an implemented system providing solutions to each of the challenges and show through experiments the advantages of our approach.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 12/Nov/2014
Suggestion:
Major Revision
Review Comment:

This manuscript was submitted as 'full paper' and should be reviewed along the usual dimensions for research contributions which include (1) originality, (2) significance of the results, and (3) quality of writing.

This paper is aimed at dealing with the problem of aligning large ontologies. This is definitely an interesting and challenging issue because of the high heterogeneity of the available ontologies and their growing size. The paper aims to address the following challenges: (i) scalability, (ii) matcher selection, (iii) combination and tuning, (iv) background knowledge and (v) user involvement. The authors claim that these aims are achieved by means of their framework, using different kinds of interruptible sessions. The proposed approach is semi-automatic and requires, in many places, a certain level of expertise in the ontology alignment field.
The paper is easy to follow.

Let us examine the contributions with respect to the main challenges cited above.
Scalability. The only technique that has been implemented is partitioning. This is not a novel idea, since this technique has been used in many ontology matching systems. Furthermore, the authors seem to argue that interrupting sessions contributes to scalability. Perhaps; however, this certainly does not improve efficiency in terms of time performance.

Matcher Selection. The selection is done manually by the user via check boxes.

Combination. The authors have used weighted-sum and maximum-based approaches. Here again, there is no novelty. Moreover, the user is asked to choose one of them; it is therefore assumed that the user is an expert in ontology matching. Furthermore, the authors do not say how, or by whom, the appropriate weights are chosen. This is not a trivial issue.
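For context, the two combination strategies the review refers to can be sketched as follows. This is a minimal illustration of the general techniques, not the authors' implementation; the matcher names and weights are hypothetical:

```python
def weighted_sum(scores, weights):
    """Combine per-matcher similarity scores by a normalized weighted sum."""
    total = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total

def maximum_based(scores):
    """Combine per-matcher similarity scores by taking the maximum."""
    return max(scores.values())

# Hypothetical similarity values produced by two matchers for one concept pair.
scores = {"string": 0.9, "wordnet": 0.6}
# Hypothetical weights that an expert user would have to choose.
weights = {"string": 2.0, "wordnet": 1.0}

combined = weighted_sum(scores, weights)  # (0.9*2.0 + 0.6*1.0) / 3.0 = 0.8
```

The sketch makes the review's point visible: the weighted-sum result depends entirely on weights that someone must supply, whereas the maximum-based variant needs no tuning but ignores agreement between matchers.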

Background knowledge. The authors do not provide enough details about the use of background knowledge; only the use of WordNet-based and UMLS-based algorithms is mentioned. We conclude that there is no real originality in this part.

User involvement. This part is quite original, since it allows the user to work in interruptible sessions: he/she starts a computation session and saves the results for later reuse. The approach also offers the possibility of reusing precomputed similarity values and mappings.

Finally, the architecture of their framework is not particularly original: many ontology matching tools reuse mappings and background knowledge. The reader is not convinced that the proposed approach is really efficient at aligning large ontologies, because interrupting sessions does not contribute to reducing the search space. Moreover, the authors fail to demonstrate how their techniques can achieve that.
Moreover, the level of description is not appropriate: the authors do not present their technical contribution in depth. For example, Fig. 1 is not very meaningful and Fig. 7 is unreadable.

The evaluation part is hard to follow: there are too many experiments. Yet the authors do not provide any comparison with related work, even though several tools dealing with large ontology matching participate every year in the large-scale tracks of the OAEI campaign.
Moreover, the precision measure alone is not sufficient to evaluate matching quality. What about the post-match effort (i.e., the time spent by the expert to validate/invalidate the discovered matches)? This time seems very significant in this approach, since the user is asked to make decisions in many places.
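To make the review's point concrete: precision alone ignores missed mappings, while the number of suggestions the expert must inspect is a rough proxy for post-match effort. A minimal sketch of the standard measures, using hypothetical mapping sets (not data from the paper):

```python
def evaluate(suggested, reference):
    """Precision, recall, and F-measure of suggested mappings vs. a reference alignment."""
    tp = len(suggested & reference)  # correct suggestions (true positives)
    precision = tp / len(suggested) if suggested else 0.0
    recall = tp / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# Hypothetical mapping suggestions and reference alignment (pairs of concept ids).
suggested = {("a", "x"), ("b", "y"), ("c", "z")}
reference = {("a", "x"), ("b", "y"), ("d", "w")}

p, r, f = evaluate(suggested, reference)  # p = 2/3, r = 2/3
effort = len(suggested)  # every suggestion must be validated by the expert
```

A system can score high precision while the expert still faces a long validation queue; reporting both quality measures and the validation workload would address the concern.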

Conclusion.
Based on the comments above, I recommend a "major revision". There is no real and clear technical contribution except the idea of interruptible sessions. There are several issues with the paper, which should be addressed.
- Primarily, this concerns the presentation. The authors fail to formally define some of the notions used in the paper.
- The related work section is very weak, and some of the related work is not up to date; for example, instead of referencing COMA++, they should reference GOMMA. The authors should consult the recent results of the OAEI campaign.
- They should clearly state the added value of this paper with respect to other work dealing with large ontology alignment, evaluate their approach, and compare their results to those of the closest related work (e.g., see the OAEI 2013 results).
- They should present the technical contribution of their work in more depth. Finally, I would suggest changing the title of the paper, since its main contributions relate to user involvement rather than to large ontology alignment.

Review #2
By Daniel Faria submitted on 14/Nov/2014
Suggestion:
Major Revision
Review Comment:

The paper describes a session-based approach to ontology matching, its implementation in the SAMBO system, and its evaluation on the Anatomy track from the OAEI.

Regarding originality, using interruptible sessions as a main paradigm in ontology matching is a novel concept, of particular interest for tasks involving user interaction and large ontologies. Also novel and promising is the use of validated mappings in computing new mappings and recommending alignment strategies. That said, much work on user interaction has been done on the AgreementMaker system [1], which is not referenced in the manuscript. AgreementMaker can even support sessions to some extent, through saving, loading, and combining alignments from different matching algorithms before or after user feedback. Please note that while related, AgreementMaker and AgreementMakerLight (which the authors do cite) are independent systems, with the former having the stronger focus on user feedback.

The evaluation of the proposed framework was extensive with regard to the settings tested, but alas a bit exhaustive. It took me several reads to make sense of all the results and tables, having to go back and forth between tables to compare the various settings. Moreover, the results did not fully convince me of the take-home message of section 5.5.1 regarding the recommendation of alignment strategies. Do the "actual Fc" values in the "recommended alignment strategy" tables account for the validated mappings or not? In other words, are they relative to the whole alignment or only to the non-validated mappings? If it is the former, then there are several cases where the alignment quality decreases with sessions, which raises questions about the recommendation strategy. If it is the latter, then is the quality of the whole alignment (including the validated mappings) always increasing with the sessions? If so, this should be clarified in the paper - ideally the performance of the strategy should be compared with that of the optimal strategy also excluding the validated mappings. Overall, I feel that the evaluation section could be abbreviated a bit, and the main findings should be summarized quantitatively so as to better support the conclusions of the paper.

My other concern about the evaluation is that it is based only on the Anatomy track dataset, which is not exactly a large matching problem. Given the title of the paper and the focus of the introduction on the subject of large ontologies, I would expect the proposed approach to be at least tested on truly large matching problems (such as those in the Large Biomedical Ontologies track of the OAEI) in order to assess its scalability. Otherwise, the reference to "large ontologies" in the title should be removed, and at least a theoretical assessment of the scalability of the approach should be made in the manuscript.

The manuscript is clearly written overall, but a final revision is needed to catch a few spelling/phrasing issues. A recurring one is that you misspell "interruptible" as "interruptable".

Another minor issue is the use of the word "term" in the context of ontologies. While popular with the Open Biomedical Ontologies community, "term" is not used in other ontology domains and is imprecise in meaning. It should be replaced with "class" / "concept" or, if you want to encompass properties and instances, "entity".

[1] I. F. Cruz, C. Stroe, and M. Palmonari. Interactive user feedback in ontology matching using signature vectors. 2014 IEEE 30th International Conference on Data Engineering.

Review #3
By Ernesto Jimenez-Ruiz submitted on 07/Jan/2015
Suggestion:
Major Revision
Review Comment:

This paper represents an extension of a paper published at the Extended Semantic Web Conference (ESWC) 2013. I also performed a review of a (preliminary) version of the framework in 2012.

The main technical contributions and the bases of the framework seem to have been already published in previous papers from the authors. The paper contains, however, an extended evaluation with respect to the ESWC 2013 paper.

The topic of the proposed paper is very interesting as there is an increasing demand for tool support to involve the domain expert within the matching process as well as for a more iterative matching process. However, I have the following concerns about the current status of the paper (I believe some of the review comments I wrote back in 2012 are still valid):

- The paper should include an extended preliminaries or background section introducing the techniques and definitions used throughout the paper (e.g. segments of an ontology).

- The workflow of the framework is a bit confusing (Figure 2), and in my opinion it could be presented better in the paper. Perhaps providing concrete scenarios as examples would help, e.g. scenarios where the three types of sessions complement each other or where only two of them take place.

- The framework, as the paper title states, is intended to deal with large ontologies. However, the experiments only involve the medium-sized ontologies from the OAEI's Anatomy track. Larger ontologies like SNOMED or FMA from the OAEI's LargeBio track may have an important impact on the computation phase (e.g. billions of candidates/suggestions) and on the validation phase (a very large number of questions for the user to validate). A session-based approach will definitely help in matching large ontologies, but the framework should also aim to ask the user as few questions as possible.

- The proposed approach tries to reduce the search space by partitioning the ontology. However, I have the feeling that the search space may be reduced too much with the currently used method, especially when the ontologies are structurally poor or have disparate classifications.

- In Section 4.3 the authors state: "After validation a reasoner is used to detect conflicts in the decisions". What kinds of conflicts are detected? Is a complete reasoner used? Complete reasoning may be time-consuming or infeasible for rich and large ontologies. Fortunately, there are (approximate) mapping repair techniques that can do the work (e.g. Alcomo [1], AML [2] or LogMap [3]).

- The evaluation is very comprehensive with respect to the recommendation and computation sessions, perhaps too comprehensive: it is not easy to follow and interpret all the results provided in the tables. I suggest focusing on (and commenting on) the most important results and adding an Appendix with the remaining tables/results.

- On the other hand, the paper lacks an evaluation of how much feedback the user is required to provide and of its impact (how many mappings are affected/validated). The paper also lacks information about how questions are presented to the user. Are questions ordered according to a given heuristic or simply by availability? I think the use of sessions is very interesting for involving the user; however, the paper does not state how important the potential user involvement actually is.

- It would also be very interesting to test if a recommendation based on a matching task (e.g. the OAEI anatomy track) could be applied to another matching task (e.g. the OAEI's LargeBio track).

- Framework vs. system. The presented approach seems to fall in between a general framework and a fully-fledged system. On the one hand, if the authors aim at providing a general framework, it would be very interesting if state-of-the-art matching algorithms/systems (e.g. OAEI ones) could be plugged in. On the other hand, this session-based SAMBO could also be seen as a ready-to-use system that could participate in the OAEI campaign, especially in the Interactive track, where there is a validation routine or Oracle.

- I could not find a working link to the implemented system; the only link I found is unavailable: http://www.ida.liu.se/~iislab/projects/SAMBO/online.html

[1] http://web.informatik.uni-mannheim.de/alcomo/
[2] http://somer.fc.ul.pt/aml.php
[3] https://code.google.com/p/logmap-matcher/