Review Comment:
Summary
The paper describes CANARD, a tool that creates complex mappings between populated ontologies based on the notion of Competency Questions
for Alignment.
The paper describes the approach in detail and presents an evaluation based on two different tasks: the complex alignment of the popular Conference ontologies, and of the Taxon dataset, which covers plant species.
CANARD has been previously published, so this review also focuses on the degree of novelty of this submission compared to the 2020 ISWC publication, as well as on the overall evaluation approach.
Motivation and Introduction
The paper presents a strong motivation for the problem. However, the wording of the two hypotheses in the introduction is not clear. In fact, neither point is a hypothesis. The first comes close to one but lacks an explicit proposition, while the second is simply not a hypothesis in any sense. While I can infer what the authors are referring to, this needs a complete rewording and a clear statement of what the hypotheses actually are.
It is not made sufficiently clear that this paper does not actually introduce CANARD, but rather expands on work previously published at ISWC 2020. Although this is alluded to later in the text, it should be stated more clearly up front.
Methodology
The methodology is presented in a detailed and well-structured fashion.
However, I am curious about a few aspects:
1. There is no clear definition of what a "support answer" is.
2. In Section 4.4, it is not clear to me why a threshold on the Levenshtein similarity is needed; only later is it mentioned that this is because of noise.
3. In Equation 4, the sum of labelSim and structureSim can add up to 1.5, since labelSim is in [0,1] and structureSim is set to 0.5 or 0. Is this correct? I was expecting similarity values in [0,1]. Why this unusual definition of similarity? The paper later states: "When putting the DL formula in a correspondence, if its similarity score is greater than 1, the correspondence confidence value is set to 1." This means that a Levenshtein similarity of 0.5 combined with a structural similarity of 0.5 already reaches the maximum confidence of 1 (see the worked example at the end of this list), which seems counter-intuitive.
4. There is not a lot of detail on the computational complexity of CANARD. There is a limit on the length of paths, but this could be more clearly presented.
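To make the concern in point 3 concrete, here is a worked example under my reading of the definitions (labelSim in [0,1], structureSim in {0, 0.5}, and the final score being their sum); the names and ranges are my reading of Section 4.4, not a quotation from the paper:

  sim = labelSim + structureSim
  labelSim = 0.5, structureSim = 0.5  =>  sim = 1.0  =>  confidence = 1
  labelSim = 1.0, structureSim = 0.5  =>  sim = 1.5  =>  clipped to confidence = 1

If this reading is correct, any candidate with labelSim >= 0.5 and a structural match receives the maximum confidence, so identical and merely half-similar labels become indistinguishable.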
Evaluation
While CANARD has been evaluated in OAEI editions, which is a clear plus, many of those results do not make it into this paper. I think this really detracts from the paper, and I struggle to understand why they were left out.
1. The evaluation is based on parametrizing the many equations behind this approach with manually set values. I appreciate that some are varied in the evaluation; however, the DL formula threshold is fundamental to the final results obtained and is not considered. Is this because results do not vary considerably when altering it?
2. The different variants in Table 2 are not clearly described. The Table should include a short textual description of each variant.
3. In 5.3.2, did you limit the maximum number of answers per CQA to the threshold, or was it exactly that number?
4. The authors identify running time as a limitation of the exact label match approach. While I understand this is a limitation of their implementation, it is not a limitation of the approach in itself. This long running time is probably due to inefficient data structures and multiple calls to the SPARQL endpoint, and could be considerably reduced (see the sketch at the end of this list). Many OM systems (ALIN, LogMap, AML to name a few) perform exact label matching on ontologies with thousands of labels in a matter of seconds.
5. In 5.4 the authors describe the alignment data sets they are comparing their approach to. While it is easy to understand that Ritze and AMLC are the results of complex matching approaches, it is not clear what the query rewriting and ontology merging alignment sets are, and what characteristics they have. To make the paper more self-contained, it would be best to briefly introduce these.
6. The authors recognize that there is some circularity in the evaluation: the same CQAs used by CANARD are the basis of the coverage evaluation, and CANARD is the only system based on the CQAs. The OAEI 2020 campaign actually included more complex matching tasks on which CANARD was evaluated, Populated GeoLink and Populated Enslaved, yet this paper omits these results. Why? I believe they should be included and discussed. While I understand these two tasks do not come with pre-defined CQAs, these results could highlight the reliance of CANARD on manually defined CQAs vs automatically generated ones. In fact, it would be great to see an evaluation for Conference based on both the high-quality CQAs and automatically generated ones (which were made for CANARD's 2020 OM paper).
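To substantiate the running-time remark in point 4: below is a minimal, hypothetical sketch of hash-based exact label matching, not CANARD's actual implementation. It assumes the labels of each ontology have already been retrieved once (e.g. with a single SPARQL query over rdfs:label per endpoint); the function and variable names are my own. Indexing one side in a dictionary makes the matching itself linear in the number of labels, which is why systems such as ALIN, LogMap and AML handle thousands of labels in seconds.

from collections import defaultdict

def exact_label_matches(labels1, labels2):
    """Return (entity1, entity2) pairs whose normalised labels are identical.

    labels1, labels2: iterables of (entity_iri, label) pairs, e.g. fetched
    once per ontology with a single SPARQL query over rdfs:label.
    """
    def normalise(label):
        # Lower-case and collapse whitespace; a real system may also strip punctuation.
        return " ".join(label.lower().split())

    # Index the first ontology's labels once: O(n) time and memory.
    index = defaultdict(list)
    for entity, label in labels1:
        index[normalise(label)].append(entity)

    # Probe the index with the second ontology's labels: O(m) lookups overall.
    matches = []
    for entity2, label in labels2:
        for entity1 in index.get(normalise(label), []):
            matches.append((entity1, entity2))
    return matches

# Toy usage with made-up IRIs:
o1 = [("http://example.org/o1#AcceptedPaper", "Accepted Paper")]
o2 = [("http://example.org/o2#Accepted_Paper", "accepted paper")]
print(exact_label_matches(o1, o2))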
Related Work
1. I would have expected a stronger focus on the work of Zhou et al. (ref. 7 in the paper). This work was evaluated against CANARD in OAEI 2020.
Minor
p. 3: A competency questions --> A competency question
Which are the accepted paper? --> Which are the accepted papers?
p. 24: The ontologies are are populated --> The ontologies are populated
p. 28: CANARD relies common instances. --> CANARD relies on common instances.