Review Comment:
This manuscript was submitted as 'Tools and Systems Report' and should
be reviewed along the following dimensions:
(1) Quality, importance, and impact of the described tool or system
(convincing evidence must be provided).
The paper presents a capability overview and historical retrospective
on Ontop, a system for Ontology-Based Data Access (OBDA). Ontop builds
on experience with DL-Lite query rewriting and on earlier OBDA systems
such as Mastro. Ontop is currently the best example of an implemented
OBDA system. Ontop is available under the Apache open-source license,
which makes it particularly appealing for both academia and industry.
(2) Clarity, illustration, and readability of the describing paper,
which shall convey to the reader both the capabilities and the
limitations of the tool.
The paper is well written, and does a great job of presenting the
major characteristics and software APIs of Ontop.
Some additional discussion would make the paper more self-contained
and valuable:
1. The mappings in the examples are essentially GAV. Skolem symbols
(URIs) are conveniently generated from the values provided by the
data sources (e.g., :db1/{pid} in Example 2.2). However, when
integrating multiple sources, such conveniently shared ids may not
exist. What is Ontop's approach to dealing with more general schema
mappings (i.e., LAV and GLAV mappings, with existential variables in
the mapping consequent)? What happens when there are no shared ids
across sources, e.g., when one source identifies employees by
employee_id and another by SSN (see the first sketch below, after
point 3)?
2. How do these more general mapping rules interact with the compiled
T-mappings (see the second sketch below)?
A full discussion of 1 and 2 may be beyond the scope of the
paper. However, the authors should include a few sentences describing
precisely the limitations of their mapping language and algorithms,
without the reader needing to go to the references (e.g., [34]).
3. The industrial applications of Section 4 seem to be at a very
early stage, but they do raise some questions about the applicability
of the approach. In particular, both involve large sources. Can you
discuss your approach to generating the needed schema mappings? Would
you model all the source tables? Is there some (semi-)automatic
support for generating the schema mappings? Based on previous
experience, can you provide an estimate of the effort (manual and/or
software-aided) needed to model large integration use cases like
those in Section 4?
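
To make point 1 concrete, consider the following sketch in the style
of the paper's mapping examples (the table, column, and IRI names are
hypothetical): two sources describe the same employees, but one
identifies them by employee_id and the other by SSN.

    target  :src1/emp/{employee_id} a :Employee ; :hasName {name} .
    source  SELECT employee_id, name FROM hr_employees

    target  :src2/emp/{ssn} a :Employee ; :hasSalary {salary} .
    source  SELECT ssn, salary FROM payroll_records

Since the two IRI templates differ, the same real-world employee ends
up with two distinct IRIs, and the sources can only be linked if a
cross-reference table or an explicit SQL join is available to one of
the mappings. A GLAV-style rule with an existential variable in the
consequent would instead assert the existence of an employee without
committing to a concrete identifier; it would help to know whether
and to what extent Ontop supports this.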
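
For point 2, my understanding of T-mapping compilation over plain GAV
mappings is that ontology axioms are pushed into the mapping set,
roughly as in the following sketch (names again hypothetical):

    given the mapping
        target  :db1/{pid} a :Doctor .
        source  SELECT pid FROM tbl_staff WHERE role = 'doctor'
    and the axiom :Doctor rdfs:subClassOf :Staff,
    the compiled T-mapping additionally contains
        target  :db1/{pid} a :Staff .
        source  SELECT pid FROM tbl_staff WHERE role = 'doctor'

It is not obvious how such saturation would extend to mapping rules
whose consequent contains existential variables, which is another
reason to state the supported mapping fragment explicitly.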
Minor comments:
- In the mappings of Example 2.2 and on page 6, the stage attribute
is not needed in the SQL queries, since it is not used in the mapping
rule consequent.