Using Background Knowledge to Enhance Ontology Matching: a Survey

Tracking #: 1662-2874

Authors: 
Amina ANNANE
Zohra Bellahsene
Faiçal AZOUAOU
Clement Jonquet

Responsible editor: 
Jérôme Euzenat

Submission type: 
Survey Article
Abstract: 
The ontology matching research community has been very active for a decade. Recently proposed state-of-the-art methods promote the use of a set of external knowledge resources as Background Knowledge (BK) for enhancing the ontology matching quality. Several important questions related to the use of background knowledge resources arise (i) In which case is the use of BK resources justified and necessary? and (ii) What is the tradeoff between the complexity of the alignment methods and the use of BK resources in terms of the quality of matching and time execution? Another interesting issue is the selection of the useful BK resources for a given ontology matching task. In this paper, we try to answer the questions by reviewing the different methods dealing with the two main steps of BK-based ontology matching that are BK resources selection and BK resources use. In addition, we provide a synthetic classification of BK resources use methods. Finally, we present a comparative evaluation of BK-based ontology matching systems by analyzing their performance results obtained during Ontology Alignment Evaluation Initiative (OAEI) 2012-2016 campaigns. We thus evaluate the benefit of using BK resources and the improvement achieved by this approach with regard to the systems that do not use BK resources.
Tags: 
Reviewed

Decision/Status: 
Reject (Two Strikes)

Solicited Reviews:
Review #1
By Angela Locoro submitted on 30/Jun/2017
Suggestion:
Minor Revision
Review Comment:

In this revision, the paper has changed a lot, in order to implement all the reviewers' comments. I still think that this survey is relevant in its topic, scope and purpose for the Semantic Web Community, and the research still touches all the necessary points that a survey on the matter should cover.

However, with the introduction of formal definitions in the introductory part, it is now more evident that the rest of the paper could have followed the same fate and undergone, in many other points, the same (uni)formal maquillage, especially where it takes many lines to explain an approach descriptively. Formalising, abbreviating and making uniform the terms and kinds of operations on ontology concepts would probably make the paper clearer, greatly improve the flow of reading, and reduce the English typos. For example, the experimental part is much more systematized (also with the help of many charts and tables) than the middle part of the paper, i.e., Sections 4 and 5. An important part of this systematization would be, for example, to avoid the italic style for long and over-repeated terms like "BK resources domain", "useful BK resources" and so on, which should instead be abbreviated at the beginning as key terms (introduced well in advance) and explained in a once-and-for-all legend. Please make the usage of terms uniform throughout the paper (for example, sometimes "useful BK" is not in italics...) or clarify the differences.

As there are still many English typos and narrative imprecisions in the description of some approaches and tools (probably due also to the necessity of improving the English of this paper), it is advisable to fix them. In case this paper becomes a reference point on the matter, it is crucial to clarify and make its content as accessible as possible, in order to make BK-based ontology matching understandable to the research community. Another hopefully useful suggestion is to answer the reviewers' comments in the reply letter with more details about the kind of content that changed in response to the reviewers, and to minimise succinct answers like "done", especially where a piece of content has been revised deeply. The ideal would be to report the paper content verbatim inside the reply, to prove that the paper has been changed in reply to a specific reviewer request. To be more specific, I would suggest the following changes to the current version:

In general: please check and delete the white spaces between a word and the subsequent punctuation (semicolon, dot, and so on), and delete extra white spaces wherever there is more than one between a punctuation mark and the subsequent word. Avoid putting parentheses inside other parentheses in the text and in the figure/table captions. Please correct this.

Abstract:
I would replace the expressions "BK resources use" and "BK resources use methods" with the plainer "use of BK resources" and "methods using BK resources". Please also consider replacing the term "use" with the term "exploitation", as in this case it seems more appropriate (being more specific and connoted toward "using something to obtain some benefits").

Keywords: please make all the keyword terms uniform by putting the first letter of each term in uppercase, or by capitalizing only the first letter of the first term. There is no apparent reason to adopt different styles, like "Background knowledge resources Selection" and "BK-based ontology matching" (why is ontology matching lower-case and Selection upper-case?)

Introduction:
Semantic Web is a proper term and its first letters go uppercase
Correct the sentence: "research including in application domains such biomedicine" ---> research including application domains such as biomedicine
Do not abuse the "-ing" verbal form, but wherever possible, use the present form instead, for example: "data residing in several sources" --> data that reside in several sources
Correct the sentence: "to a point to which specific search engines are required" ---> to the point that specific search engines are required to retrieve them (or to make them available)

Page 2:
Correct the sentence: The diversity of their heterogeneity ---> heterogeneity already means diversity! I would also compact this sentence by talking about "syntactic, terminological (or lexical) and structural heterogeneity" (terms in this order). Please briefly explain and introduce in the paper suitable definitions of what you intend by syntactic, lexical and structural heterogeneity.
What is the difference between "appropriate BK resources" and "useful BK resources"? Please clarify
Correct the sentences:
- An evaluation and comparison of the BK-based ontology matching systems results ---> An evaluation and comparison of the results of BK-based ontology matching systems
- Section 5 presents a review of BK resources use ontology matching methods ---> Section 5 presents a review of the exploitation methods of BK resources for ontology matching
- we conclude with some elements of response of the questions ---> we conclude with some considerations in response to

Correct the following sentences:
Section 2:
- related notions ---> the related notions

Section 2.2:
- dictionnary ---> dictionary.
- We define it as any set of one or multiple external knowledge sources ---> We define it as any non-empty set of external knowledge sources.
- we refer to background knowledge in ontology matching context with BK resources ---> we refer... context with the term BK resources.

Section 2.3:
- Let A an alignment ---> Let A be an alignment
- are computed as following ---> are computed as follows

Section 3:
Please introduce and define, first of all, the concepts of BK resources domain, BK resources selection and BK resources use (or state explicitly that definitions will be given in subsequent sections)

Correct the sentence:
- To the best of our knowledge the question ---> To the best of our knowledge, the question

There is an imprecision related to the work of reference [5]. The workflow represented in their figure is more abstract than the one presented in this paper. But it is not true that in [5] "how to do it?" (e.g., anchoring, deriving mappings - by the way, what do you mean by "it"? To do what?) is not explicitly stated, and it is not precise to say that the selection has not been considered as an independent task (in [5] the selection step has also been parameterized and tested).

Section 3.1:
Please expand the definition of BK resources domain at the beginning by using the formal notation (e.g., put definitions in a separate paragraph as you did for the definition of Useful BK resources); please avoid putting pieces of formulas in parentheses as if they were part of the stream of the discourse, and instead use the formal formatting for them (e.g., make all the passages explicit)

Correct the sentences:
- The selection of Bk resources to use for a given ontology matching ---> the selection of the BK resources to be used (or that will be used)
- the aim of BK resources selection step ---> the aim of the BK resources selection step

Please put references at the end of the sentence, and correct them: "Indeed, related work demonstrate(s)" and "Furthermore, it may happen that using several ontologies ... may provide the same result as one resource" ---> as using one resource

Please make the verb tenses uniform: for example, in the sentence "we selected only the mappings that are relevant..." ---> were relevant

Correct the sentences:
- Another example can be found in [10], the author" ---> where the author

Section 3.3.1:
- Anchoring as identified by [55, 38] ---> I would also put [5], as anchoring is also described there with the same meaning
- Let M a matcher ---> Let M be a matcher
- In principle, any matcher may... In practise, the matcher used..." ---> please provide references at the end of this sentence

- Indeed, ontology entities that are not anchored on the useful BK resources ---> to the useful
- are not explicitly connected ---> delete the comma between the subject and its predicate.
- Consequently the useful BK resources will not be effective ---> Consequently, the useful...
- The two first mappings ---> The first two mappings
- (es, et, subClass) represents ---> the expression (es, et, subClass) represents

Please avoid beginning sentences with a subordinate clause (attach it to the previous one). For example, I found these in the paper: "While, the selection is the strategy that allows", "In particular, by using...", "For instance, selecting the useful fragments" and "For instance, the use of datasets or ontologies..."

Section 4.1.
Please provide a reference for, and describe, the "strict string matching" metric (is it "equivalent string matching"?).

Correct the sentences:
- ... appropriate web page using... ---> appropriate web page by using

Section 4.2
- allow inferring more relationships ---> allow to infer more relationships
- thanks of a given ---> thanks to a given
- allows identifying candidates ---> allows to identify
- less than a defined threshold ---> below a defined threshold

Please clarify the following passage as, phrased as it is now, the consequence does not follow from the premise:
"While all previous described selection methods select a limited number of ontologies, the use of mapping as a unit of selection allows to have concepts of different ontologies in the repository (more than 500 ontologies) without limitation in the number of selected ontologies. Consequently, the resulted useful BK resource (a graph) has a small size (number of concepts) compared to the size of the initial whole ontologies."
Section 5:
Please be aware that at pages 9 and 13 the first column of the text has double line spacing, whereas the rest of the paper has single line spacing.

Correct the sentences:
Page 10:
- from the language of souce ontology to the language of target ---> from the language of the source ontology to the language of the target
- tagged by isClose relationship ---> tagged by the isClose relationship

Section 5.1.2
- are tested whether ---> are tested to verify whether
- Bk resouces used which represents ---> BK resources used, which represents

Page 11:
- appears in the type of produced mappings ---> of mappings produced.

Page 12:
- in sake of fair comparison ---> for the sake of

Page 15:
Please write a couple of introductory lines before the sub-paragraphs starting with years.

Correct the sentences:
Page 17:
- LogMapBio spent more much time ---> much more time
- to complete the task comparing to ---> when compared to

I would title Section 7 "Discussion" rather than "Agenda for Open Research", as setting an agenda would be a little more visionary than simply discussing results and putting, for each consideration, a final lessons-learnt sentence.

Correct the sentences:
Page 20:
- statements answering these questions according to our study ---> statements that answer these questions, according to our study

Page 21:
- We tried to study if the use ---> We tried to study whether the use
- advanced similarity measures by the use of the useful BK ---> with the use of the useful BK

Please cite reference [5] correctly in the References Section: the name of the journal is "Journal on Data Semantics" and not "Data Semantics"

Review #2
By Marta Sabou submitted on 20/Jul/2017
Suggestion:
Accept
Review Comment:

I thank the authors for the thorough response to my comments and for taking most of them into consideration. I especially liked the improved formalization of the matching task as well as Section 7 on future research challenges.

At the same time, I still think that Aggregation and Selection methods are not sufficiently well discussed and ask authors to expand on this topic for the final version of the paper.

Small typo:
“Manuel” - in Table 1, for resources [54].

Review #3
By Daniel Faria submitted on 26/Jul/2017
Suggestion:
Major Revision
Review Comment:

The survey provides an overview of the use of background knowledge in ontology matching.
While some aspects of the manuscript improved with the authors' revision, alas many of the additions made by the authors have introduced new issues, and I feel that as a whole, the manuscript hasn't really improved.

With respect to the relevance of the topic, I reiterate what I said in my previous review: I believe a survey of the topic is merited and would be useful to the Ontology Matching community specifically, and to a lesser degree, to the Semantic Web community.

The manuscript is fairly comprehensive, but regarding balance, I fear that the authors have focused too much on the OAEI, to the detriment of real ontology matching scenarios.

There are also issues in terms of suitability of the manuscript as an introductory text on the topic: there are still substantial problems with its definitions; and in the later sections of the document, the arguments made by the authors are seldom well substantiated or explained, and in some cases are of dubious soundness.

Readability and clarity are the aspects where the manuscript fares worst. Lack of clarity is an issue in the later sections of the manuscript, as mentioned above, whereas readability is hindered throughout the manuscript by the profusion of grammar and word-usage errors. In fact, the grammar of the paper has gotten worse since the previous submission. I appreciate the magnitude of the changes made to the document, but this doesn't justify the poorer quality of writing of many of the additions. I will provide a list of recurring errors made by the authors, but this should be no substitute for a careful revision of the manuscript by a fluent (preferably native) English speaker, which is paramount.

Content issues:
- BK Resource(s) Domain:
Grammar and syntax aside, this label for the concept described in section 3.1 is problematic in that the usage of the word "domain" differs from the established usage in ontologies, and thus could be confusing to readers. The set of BK resources described by the authors in section 3.1 does not necessarily comprise only resources of the same domain as the ontologies being matched, nor does it necessarily cover all resources in that domain. I would suggest that the authors adopt a less ambiguous and confusing label for this concept - perhaps "BK Resource Pool".

- Useful BK Resources & BK Resource Selection
While the authors followed up on some of my previous critiques regarding their definitions, namely by incorporating the multiple BK resource scenario, alas they ignored my criticism of their defining usefulness as a function of F-measure. I'll reiterate that we cannot know F(A') a priori and seldom know it even a posteriori in most practical cases (i.e., outside of the OAEI), so defining BK resource selection as a function of it is inaccurate, unrealistic, and potentially misleading to inexperienced readers. This is evidenced by the fact that the usage of "useful BK resource" made subsequently throughout the manuscript is not consistent with definition 1, as indeed it could not be.
Even if we could know the effect of a BK resource on F-measure a priori, in a real-world matching problem where the alignment is to be manually validated, researchers may be more interested in recall than in precision. Likewise, there could also be cases where the opposite is true (e.g., because logical coherence is critical). The authors should keep in mind that there is ontology matching outside of the OAEI, and that the definitions they provide should be broad, encompassing, and realistic.
The only realistic and useful definition of "useful BK resource" that I can think of is: a BK resource that contains knowledge beyond that contained in the ontologies to match but which is relevant to match them. BK resource selection is the process of finding one or more such useful BK resources from among a BK resource pool. Ideally, in a multiple BK resource scenario we should ensure that each BK resource is useful not only with respect to the input ontologies, but also with respect to other BK resources. However, that is not a necessary criterion.
Do note that the reason why we perform BK resource selection and use BK is indeed to attempt to improve the alignment between the input ontologies, usually (but not always) with respect to completeness, and ideally without sacrificing correctness. This does translate to increasing F-measure, but the reason why we carry out a process is not a description of the process itself.
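
For reference, the standard measures that Definition 1 presumably relies on are, with R the reference alignment and A the alignment produced by the matcher:

    P(A) = \frac{|A \cap R|}{|A|}, \qquad
    Rec(A) = \frac{|A \cap R|}{|R|}, \qquad
    F(A) = \frac{2 \cdot P(A) \cdot Rec(A)}{P(A) + Rec(A)}

All three depend on R, so a usefulness criterion of the form F(A') - F(A) > 0 is computable only when R is known - which is precisely the problem.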

- Mapping relations:
The mapping relations as listed in the legend of figure 4 and in two other places in the text correspond to "subsumes" (rather than "more general"), "subsumed by" (rather than "less general") and "equivalence" (as listed). Please correct accordingly!
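
For concreteness, in the three-tuple notation used elsewhere in the manuscript (e.g., (es, et, subClass)), the three intended readings would presumably be:

    \langle e_s, e_t, \sqsupseteq \rangle : e_s \text{ subsumes } e_t, \qquad
    \langle e_s, e_t, \sqsubseteq \rangle : e_s \text{ is subsumed by } e_t, \qquad
    \langle e_s, e_t, \equiv \rangle : e_s \text{ is equivalent to } e_t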

- Evolution figures:
I feel that Figures 6 and 9 should show the best result obtained thus far across all previous OAEI editions rather than the results of the best system in each category in each year. If the goal is to show the evolution of the state of the art, then the former is much more interesting, and the latter is already listed in the bar charts anyway. For assessing the evolution of the state of the art, it is irrelevant that YAM++ no longer participated in the OAEI after 2013 - if no subsequent participant matched its performance in some tasks, then it should be considered the state of the art.

- Large biomedical ontologies:
For the purpose of the LargeBio evolution, you should exclude results from XMAP-BK as the system violates OAEI guidelines in adopting UMLS as a BK resource. They show up in the OAEI tables either in a different color or with an asterisk precisely to indicate that they are not comparable with other matching systems.
In their comments regarding LogMap, the authors say that "This may be explained by the fact that LogMap used also UMLS Lexicon". Keep in mind that the UMLS Specialist Lexicon (which LogMap did use) is very different from the UMLS Metathesaurus (which XMAP uses) and that it is only the latter that is unfair to use due to the LargeBio reference alignments being derived from it.
Keep in mind that the LargeBio evaluation and reference alignments changed between 2013 and 2014 (but have remained constant between 2014 and 2016). For this reason, 2012 and 2013 results are not directly comparable with subsequent results, and you cannot draw any inference with respect to the evolution between these two "eras". Thus, for your evolution figures, you should either consider only the results from 2014 onwards, or re-compute the evaluation for the best performing system in 2012-2013 using the new reference alignment (which I'm sure Ernesto Jimenez-Ruiz would be willing to do).
Finally, when the authors discuss the evolution of the results over the years, they are confusing the performance of BK-utilizing tools with the performance of the BK matching strategies themselves. Speaking concretely for AML, which has been the top BK-utilizing tool in LargeBio's whole ontologies tasks for a number of years, I can inform the authors that its BK matching strategy has not changed in the slightest since 2014. Its performance has improved because we've since developed other non-BK matching strategies. Thus, the authors cannot infer any evolution with respect to BK matching strategies from the OAEI results.

- 7. Agenda for open research
The addition of this section was, in terms of organization and comprehensiveness, a great idea. Unfortunately, the execution will require substantially more work, as it is by far the worst portion of the manuscript in terms of quality of writing and substantiation of arguments. I'll delve into each subsection individually.

- 7.1. Automatic identification of BK resources domain
The authors state that "Novel methods should be designed and implemented to define automatically a BK resources domain for a given ontology matching problem" but do not provide a single reason as to why this should be the case.
It is not at all clear to me that it should. The reason why there is a need for automated ontology matching is that it is an extremely time-consuming task to do manually. However, users of automated ontology matching tools can generally be expected to know the domain of the ontologies they want to match, and thus either to preselect potential BK resources or at least to choose a category/domain that the matching tool should use. I see no need for this preselection to be fully automated, and I feel that the authors were more focused on the OAEI than on real ontology matching scenarios in putting forth this agenda item. If the authors have any compelling arguments to the contrary, then they should detail them in the manuscript.
I will say that the authors' idea of indexing existing ontologies and BK resources by domain is interesting as a means to enable users to preselect them.
Last but not least, LogMapBio's usage of BioPortal is in essence a strategy that enables the automatic identification of the domain of the BK resources. While it doesn't strictly qualify as identification of the "BK resources domain" (as, according to the authors' definition, BioPortal would be the "resources domain" for LogMapBio), one could argue that BioPortal is akin to the whole universe of ontologies. Thus, this is an example that merits discussion in this context.

- 7.2. More effective automatic BK resources selection
I'm afraid I don't understand or agree with any of the points made in this section, starting with the word "effective" in the title when most of the section is seemingly about efficiency. Like the previous section, this one lacks clear explanations, and may mislead inexperienced readers.
1) The base argument that "only small fragments from the selected resources are really useful" is not generically true. When you use a BK ontology of the same domain as the input ontologies, you don't expect it to be true (e.g., UBERON in the Anatomy track). When you use a small BK ontology that covers part of the larger input ontologies, you also don't expect it to be true (e.g., DOID in the SNOMED-NCI task).
2) Even in the cases where it is true, it is unclear that it "decreases the efficiency of the matching process" by comparison with "the the ability to select only the useful fragments from each element". I fail to see how you could select only useful fragments without having to analyze whole resources and to do something akin to anchoring. Even if you could, why would that fall into the category of BK selection rather than BK usage? It seems to me that BK selection ends with the identification of the useful BK resources, be they wholly or only partially useful, and that everything after that would be BK usage. Finally, in performing anchoring, matching tools already inherently select useful fragments of the BK resources, and are able to do so very efficiently (e.g., AML can do so in linear time; see the sketch after this list) - and if "proper" fragments were desired, blocking strategies would be an option.
3) The example of DBpedia is one of the few compelling cases I can think of for preselection of useful fragments that shouldn't fall under BK usage, as preselection could be done without extensive analysis by filtering by subject/domain. However, it is an exception rather than the rule, and therefore the authors shouldn't base a whole agenda item on it. If the authors have more compelling arguments for the need to select fragments rather than whole BK resources, then they should detail such arguments in the manuscript in a manner that is clear.
4) Finally, I don't see why "to combine the fragments extracted from different resources into a single BK resource" would be another challenge. I don't really see why you would want to do so in the first place - what are the advantages of doing so over simply using the fragments in parallel? But even if there were advantages, why would this combination need to happen during BK selection rather than BK usage? It seems to me that this would fall under the "combine then derive" strategy of BK usage that the authors describe in the manuscript. As in the previous point, if the authors have compelling arguments as to why combining fragments is relevant and should happen during BK selection, they should detail such arguments in the manuscript.
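
To illustrate point 2: here is a minimal sketch (in Python, and emphatically not AML's actual code) of exact-label anchoring. After a single indexing pass over the BK resource's labels, anchoring is linear in the total number of labels, and the set of anchored BK concepts is, implicitly, the "useful fragment" of the resource - no separate fragment-selection step is required. All identifiers are hypothetical.

    def build_label_index(bk_concepts):
        """Map each normalized label to the set of BK concepts carrying it."""
        index = {}
        for concept, labels in bk_concepts.items():
            for label in labels:
                index.setdefault(label.strip().lower(), set()).add(concept)
        return index

    def anchor(ontology_concepts, label_index):
        """Return (ontology concept, BK concept) anchors via exact label match."""
        anchors = set()
        for concept, labels in ontology_concepts.items():
            for label in labels:
                for bk_concept in label_index.get(label.strip().lower(), ()):
                    anchors.add((concept, bk_concept))
        return anchors

    # Example: the anchored BK concepts *are* the useful fragment.
    bk = {"bk:Heart": ["heart"], "bk:Aorta": ["aorta"], "bk:Leaf": ["leaf"]}
    src = {"src:Heart": ["Heart"], "src:Valve": ["valve"]}
    fragment = {b for (_, b) in anchor(src, build_label_index(bk))}  # {'bk:Heart'}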

- 7.3. A tradeoff between the matching quality and the run time
This section is less contentious than the previous ones, but it leaves much to be desired with regard to clarity and quality of writing as well as to substance. The title would be clearer as "The tradeoff between effectiveness and efficiency", as that is what is being discussed in the section. In essence, the authors mean to state that BK-based methods should aim to improve efficiency without sacrificing effectiveness. However, they do not substantiate this assertion (are current methods sub-optimal in computational complexity?) or give any suggestion as to how this improvement could be achieved. Given that the same assertion could be made about every single process, computational or otherwise, the authors must substantiate it if they wish it to have any meaning.

- 7.4. Final mapping selection
I wasn't expecting the authors to adopt the argument I made about the FMA-NCI task in my previous review almost verbatim. That argument was just to point out to the authors that the LargeBio test cases do not have proper gold standard reference alignments, and thus should not serve as premises for any argument with respect to loss of precision. The FMA-NCI example, in particular, should not be singled out as an example. If the authors do wish to reuse my argument, it would be better placed in the LargeBio evaluation section than here, and I would appreciate it if the authors rewrote the argument in their own words, as well as cited AML's OAEI 2013 paper, where the argument was first made.
I agree with the authors' premise that, in general, BK is used with the goal of increasing recall, and that even with high quality BK resources such as ontologies, some loss in precision is expectable. But going from that premise to the conclusion that "the selection of final mappings is more challenging in BK-based matching" takes a leap of faith.
In my view, adding a BK-based matching algorithm is no different than adding a non-BK-based matching algorithm, in what concerns mapping selection. Evidently, the greater the number and diversity of matching algorithms you want to combine, the more challenging it will be to combine them. That, however, does not mean that BK-based matching generally requires special consideration when it comes to mapping selection. Again, if the authors have compelling arguments or specific examples to the contrary, then they should detail them in the manuscript.
The finishing statement that "new final mappings selection strategies may be developed too" is bereft of substance - again, one could say this about virtually any type of strategy. It should be the role of a survey manuscript to provide actual and useful examples and/or propose concrete ideas for future development of the field in question.

-7.5. Extending the use to other BK resources types in OAEI tracks
I was surprised to notice that the authors adopted verbatim my comment from the previous review. Together with the argument in 7.4 and the definition of BK in 2.2, which I didn't mention previously, there are three counts of this conduct, which borders on plagiarism. While serving as a reviewer, I don't mind in the slightest that authors adopt my arguments to improve their paper if they so see fit - that is one of the functions of a reviewer after all. However, I fully expect that such adoptions be made in the words of the authors, not in mine, unless I have given express permission otherwise.
While I was OK with the OAEI-specific paragraph within the Discussion in the previous iteration of the manuscript, transforming it into one of the sections of an agenda for open research is taking the OAEI focus too far.
The authors state that the use of non-bio BK has already been the subject of a variety of research efforts, so there are no grounds for considering it at large as an agenda item for open research. I don't fully understand why the OAEI would merit special focus, or how the authors expect to extend the use of BK resources to other OAEI tracks. None of the OAEI tracks were designed with the use of BK in mind; it merely happened that for some tracks there were suitable BK resources readily available whereas for others there were not. To the best of my knowledge, the use of BK was never discouraged in any OAEI track (save for the particular case of the UMLS Metathesaurus in LargeBio, for the reasons previously stated). If BK resources aren't used in more tracks, then one can assume it is because no suitable BK resources have been found for such tracks.

- 7.6. Evaluation Benchmark
This section is, like 7.3., not contentious, but also not very clear or well substantiated. I fully agree with the authors that having benchmarks for evaluating non-equivalence mappings is a critical challenge for ontology matching, and that, as an independent point, BK-based matching is one of the most promising strategies for finding non-equivalence mappings. The authors should explain more thoroughly why these two points are so, and clarify that they are independent points.

- 8. "In which cases is the use of the BK resources relevant and necessary?"
The first three sentences are anything but answers to the question - are the conclusions really the place to debate the reasons behind YAM++'s success?
Also, my comment about "marriage of necessity and opportunity" that the authors adopted verbatim in 7.5. would fit better here than in that section.

-8. " Does the use of properly selected BK resources will lead to a simplification of the ontology matching methods?"
Apart from the poor grammar of the title, which remained unchanged despite specific criticism from reviewer 3, the text in this section needs to be rewritten for clarity.

Grammar and word-usage issues:
- Use vs. usage: the authors use "use" at least two times when "usage" is meant

- Compound nouns: combinations of nouns of the form "BK resource NOUN", where the last noun can be "selection", "domain", "types", "usage" (rather than use), etc., are compound nouns, and in general only the last noun in a compound noun can take the plural form - "resource" should therefore be singular.

- "one or several" -> "one or more"

- Subject-auxiliary inversion in interrogative statements: most of the statements where this wasn't observed, as mentioned in my previous review, have been corrected, but two still remain.

- Number agreement: there are multiple phrases with singular subjects but plural verbs or vice versa.

Review #4
By Pavel Shvaiko submitted on 28/Aug/2017
Suggestion:
Major Revision
Review Comment:

The topic is pertinent and is worth further investigation. The paper is well-written and organized. However, unfortunately, in its current form it does not deserve a journal publication.

Section 1 provides an effective introduction, explaining the problem tackled, and poses pertinent research questions.

Sections 2 and 3 bring nothing substantially new with respect to the state of the art. The main novelty appears to be a grouping of three sub-tasks, see Fig. 1 (already identified by other authors in the past, see e.g., [5, 33, 54]), under the heading of "BK resources use". Hence, Section 3, being a set-up for the forthcoming review of the literature, is far from inspiring, and rather rehashes what already exists, which is a major weakness for a survey paper.

Introducing an early motivating example from a selected application domain, later used as a running example to explain the reviewed methods (Sections 4 and 5), would improve the presentation of the paper. Sections 4 and 5 discuss some recently published works, e.g., [16, 43, 49], which were not surveyed previously elsewhere, which is good, though not sufficient on its own to justify a journal publication.

In Section 5, Figure 4, called "classification of BK resource use methods", appears to be rather ad hoc and is introduced with insufficient supporting arguments, e.g., what were the criteria used to create it? How does this classification scale? This represents another major weakness for a survey paper. Without a solid classification framework to serve as a reference, the review of the relevant literature does not have much value as such.

The evaluation in Section 6 is weak as it focuses only on one domain, involving anatomy and biomedical ontologies. Considering more test sets from more domains would render the comparisons more robust and strengthen the practical significance of the paper. The selection of systems used for the evaluation is too narrow with respect to the approaches reviewed in Sections 4 and 5, undermining the integrity of the whole work. Furthermore, the actual evaluation results rely largely on what has been produced by the OAEI campaigns in recent years; the contribution of the paper, being a meta-review of results produced elsewhere, is marginal. It should be made clear what exactly was borrowed and what is new at the technical level.

Section 7 attempts to draw up the open challenges for further research in the area. It is not the first or the only work attempting this exercise: it is not clear what the relation is with the work on ontology matching challenges in [47], e.g., does the submission provide only extensions of the previously identified challenges, or does it also identify new ones?
Section 8 does not provide sufficiently convincing answers to the questions originally posed in Section 1. For example, there is no direct answer to the question "in which cases is the use of the BK resources relevant and necessary?", and only some selected evidence is provided in the form of a discussion. Overall, the conclusions are rather straightforward, without providing substantial take-away insights.

Review #5
By Jérôme Euzenat submitted on 14/Sep/2017
Suggestion:
Major Revision
Review Comment:

This is not a review per se. I had to read the paper to form an opinion, and these are my notes. They are totally biased by comparison with the work in which I participated and which I know best. However, I think that this can help the authors to improve their paper.

The paper provides a new characterisation of what it calls 'background knowledge-based ontology matching' with some variation from previous attempts. It then concentrates on two of the rough tasks in the proposal (selection and use) and describes the solutions used by various systems. Then it proceeds to a comparison of systems based on OAEI campaigns along two OAEI tasks.

* General workflow

The paper provides a 'General work-flow of background knowledge-based ontology matching' in Section 3.
This workflow is poorly compared to the already published more detailed workflow of [5, 32]. Indeed, it groups:
- management, contextualisation and selection as selection,
- contextualisation as anchoring,
- local inference, global inference and composition (and maybe aggregation) as 'candidate mapping aggregation and selection'.
Thus, a major difference between the two workflows seems rather to be that [5] considers anchoring before selection while this paper considers anchoring after selection. There may be good reasons to do this, but this has to be discussed. Moreover, this seems to be contradicted in Section 4 by 'The selection is then controlled by the number of anchors that have to be found between the ontologies to be matched and a given candidate ontology': it seems that selection depends on anchoring, which is _not_ covered by the 'general framework'. So this should be clarified.

The difference between selection and use, which is said not to be present in [5], actually seems very clear by drawing a line after selection: once the ontologies are selected, they may be used.

There remains the difference that [5] concentrates on ontologies, whereas this paper aims at covering other resources, such as text, as well. In this sense, this is a generalisation.

Providing an alternative view is good, but it should be put in perspective of existing work, in particular explaining why it is needed and what is different.

Also problematic is that Section 3 is supposed to be a 'general framework', but it is unclear that it is able to cover what exists and is described afterwards (see the remark about Section 4 above), especially because it is not very precise. Obviously, this matter is very complex; this is why precision is paramount.

* Classification

Section 5 classifies systems in two ways:
'methods using specific algorithms and applying inference rules' and 'methods using SAT-solvers'. Since a SAT solver is a specific algorithm, this seems like a very artificial separation. It is possible that it is meant to separate specialised algorithms from general-purpose algorithms. In such a case, (a) it is better to do it in this way, not limiting it to SAT solvers, and (b) indeed other alternatives may be considered: [5] uses reasoning with algebras of relations, which is not specific, and anticipates using reasoners in networks of ontologies, which is also general purpose.
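
For the record, the 'general purpose' flavour of the SAT-based family comes from the standard reduction of relation testing to propositional (un)satisfiability. As I understand the S-Match-style check, with \Gamma the propositional encoding of the background axioms and \varphi_s, \varphi_t the concept formulas, subsumption is decided as:

    \Gamma \models (\varphi_s \rightarrow \varphi_t)
    \iff
    \Gamma \wedge \varphi_s \wedge \neg\varphi_t \text{ is unsatisfiable}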

Actually, it seems that 5.1.1 focuses more on the resources than on the algorithms, while 5.1.2 considers one specific algorithm. Why? Resources and algorithms should be independent. It is often difficult to take them apart, but an effort should be made in that direction (imagine that S-Match could use something else than WordNet).

What is stunning, indeed, is that the 'Derivation algorithms' box in Figure 4 is not discussed much, while this may be where algorithms differ most (and they may not). It is also not clear that what is described in [5] resorts to the combine-then-derive or derive-then-combine distinction. In principle, it will perform derivation within resources and across resources, then aggregate all correspondences. It feels that the aggregation corresponds to the 'combine' of this paper, so it is true that it is done at the end; however, the resources have already been used together during the 'derive' step.
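
As a trivial instance of what a derive step composes (assuming a standard composition table; other tables and longer paths give different results, which is exactly what [5] studies):

    \langle e_s, c, \equiv \rangle \circ \langle c, c', \sqsubseteq \rangle \circ \langle c', e_t, \equiv \rangle
    \;\Longrightarrow\;
    \langle e_s, e_t, \sqsubseteq \rangle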

* Exhaustiveness

Some alternatives are neither discussed nor evaluated. For instance, [5] compares several ways to use the resources by studying the impact of different types of composition and path length. Since, like much work on context-based matching, it has not participated in OAEI, it is difficult to compare them in Section 6, but it would be worth discussing them in Section 5.

* Bias of the results

It is good to provide comparison of approaches and this is very difficult in survey papers.

This one does it by relying on OAEI evaluation results. This is a good idea, however, it has important limitations that are not stressed enough:
- not all presented tools, by far, are compared, due to this restriction to OAEI: only 5 systems participated in OAEI (including two versions of LogMap) while 16 systems are reported in Table 3 (and it even seems that one of them, XMap, does not appear in the table),
- the relevant tracks of OAEI rely on only one domain, biomedicine, hence there is a high risk that the results cannot be generalised outside of this domain,
- most importantly, concerning Largebio, the results are very difficult to interpret, and this has already been raised in the context of OAEI, due to the use of MeSH both for establishing the reference alignment and as a background resource. [[Reading another review, I understand that this comment may not be fully accurate]]

* Comparison

The conclusion of Section 6 is not that of a real comparison between such systems. Actually, the comments seem to compare the best system without BK and the systems with BK rather globally. There is no interesting discussion about the benefits of such and such approach identified in Sections 4 and 5, for instance.

This is also visible in the agenda for future research. Section 7.4 is inconclusive: some mappings are missing, but this does not mean that they are incorrect. A serious work on the topic would have looked into these results and tried to compare the behaviour of the systems individually. Here we have a vague global statement stating that more research is needed.

The paper reads 'Currently, the identification of the BK resources is done manually'. If this were true, that would be a problem also for the proposed framework: suddenly half of the framework would never have been implemented... However, Table 1 shows that this statement is incorrect: there has been work on that for quite a while, and at least half of the systems used in the OAEI comparison performed automatic selection.

* Minor comments (including some plain mistakes)

- general: the term 'formalized' is used a bit too much and too freely.
- Since the paper is a survey, its title should rather be 'A survey and comparison of background knowledge-based ontology matching'; it is not as if this 'background knowledge-based ontology matching' were something new introduced by the paper.
- p2, Section 2: In this section / Section 2.1. In this section
- 2.1: Definitions are said to be taken from [32]. However, [32] does use the term correspondence instead of mapping and a matcher there is certainly not defined as combining similarity measures.
- 2.2: 'background knowledge': if this is the main topic of the paper, the dictionary is of little help. Just like WordNet is not good for the specialised vocabulary of medicine, 'the dictionary' is not good for the specialised vocabulary of semantic web research.
- The reference alignment is _not_ the set of true positives. The 'true positive' term is related to the output of a matcher (or whatever is evaluated with precision and recall) while the reference alignment is independent of any matcher.
- p3. It is funny that the text criticises [5] by saying that selection is a step and not a task, and then carries on with 'we propose a [...] BK-based ontology matching workflow by defining two main steps: (1) BK resource selection'. Actually, it is unclear what difference it makes to be a task or a step.
- Definition 1: the condition that F(A') - F(A) > 0 is useless. The definition is unclear because many of the terms used there are introduced before the definition. The process is also unclear because it is written that A is obtained without using the BKR, but nothing is said of SR. In principle, BKR should not be defined outside of the definition, because the definition is sufficient with 'a subset of SR'.
- Definition 2 does not seem to be a definition of BK resource selection, since it makes use of Definition 1, which relies on F-measure based on the reference alignments. I assume that the reference alignment is not available to the matchers, so they cannot perform the task this way. It rather seems like a definition of a measure of the quality or usefulness of the selection step.
- resources selection -> resource selection
- It is also unclear what the link is between the BK resource selection process of Definition 2 and the BK resource selection step that comes in the paragraph after, since they are described differently.
- end of 3.2: it is unclear how the described methods, which do not simply select resources but reduce and combine them, can simply be described as a 'selection' operation: that would be another difference with [5], but this also calls for a renaming of the selection step.
- p5: in ontology matching process -> in ontology matching or in the ontology matching process
- 'The use of BK resources in ontology matching is also called BK-based matching': where is it called this way? Could you add a reference or write: We call...
- 3.3 uses the term 'semantic' a lot (e.g., deducing semantic relationships). It would be good to be more precise about what a semantic relationship is and how it is defined. This becomes specifically relevant when it is written 'is inferred from the structure of the useful BK resources' and 'the term structure refers to the semantic relationships [...]'. This is very confusing. Composition is also not defined. It can be defined in various ways, leading to different results, hence it is useful to be precise. Figure 2 only shows a trivial derivation.
- 'indirect matching are sometimes represented as three-tuple': any reference to provide for where this is given? Moreover, it seems to be about mappings, and a three-tuple is a triple.
- p7: 'Indeed, the method prioritizes ontologies that have a rich structure': that seems interesting, but how do they do that?
- p8 'Other statistics are not used for selection even if they seem to be interesting criteria': what do you mean?
- 'validated semantics': what is this?
- 'BK resources use methods' is ugly. And, yes, please avoid acronyms
- 'poor semantics': please clarify
- p11: 'the explicitation phase can be compared to the anchoring step in the common workflow' maybe, but why isn't it compared in the paper?
- XMap is not in Table 1: this is strange
- LogMap ontology matching system -> the LogMap ontology matching system or LogMap
- UBERON ontology -> the UBERON ontology
- Fig 5 (and 7, etc.): the plot is difficult to read and does not show the continuity of the systems (how a system has evolved); this would have been better rendered as a graph (one curve per system/one curve for the performance of the best system).
- the several questions in the Conclusion are given quite unclear answers (the last one) or no answer at all (the first one).

