Review Comment:
This paper presents a complexity analysis and algorithms for the logical separability problem in the context of OBDM: Given two sets of tuples interpreted as of positive and negative examples, one looks for a UCQ query q such that each positive example is contained in the set of certain answers for q under the given ontology and mappings, and none of the negative examples intersects with the set of certain answers.
The separation problem has previously been considered in the pure database context (when the goal is to find a database query from a particular class of queries that separates the positive and negative examples) and for OBDA. The key difference is that in OBDA, the vocabulary of an ontology directly contains the database schema, while in OBDM the mapping is given by GLAV schema mappings between ontologies and databases. These mappings make things significantly more complicated.
The key contributions of this paper are the tight complexity bounds for the separator existence and verification problems, and the introduction of two approximations of the notion of separability. A complete approximation requires that all positive examples are contained in the set of certain answers, but this set may also contain some negative examples; a sound approximation requires that the query returns none of the negative examples, but some positive examples may also be omitted.
Then the authors define the very natural notions of minimally complete and maximally correct approximations.
It turns out that the complexity of verifying if a given query is a separator is NPcomplete for complete separation, coNPcomplete for sound separation, and DPcomplete for proper separation. The complexity of checking if a proper separator exists varies from coNP to coNextTimecomplete depending on the underlying mapping language. Finally, both minimally sound and maximally correct approximations always exist, and exponentialtime algorithms are given.
The separation problem is of interest for the general KR audience, and is in scope of the Semantic Web Journal. The paper if reasonably well written, and the main concepts are illustrated with examples.
I recommend accepting the paper with minor revisions.
Some comments and suggestions.
P3, Lines 4351. A paragraph with results being listed is difficult to process. I would suggest summarising the main results of this paper in a table or a better structured list.
P6, L 10. “an Sdatabases” > “an Sdatabase” or “Sdatabases”
P6, L 13. “n is the arity q_s” > “n is the arity of q_s”
P6, L 17 and further in the text, should i=[1, p] be i\in [1,p]? If equality is intended, please introduce the notation.
P6, L26, I think the inclusion “q_S^D\subseteq \lambda^+” should be the other way, “\lambda^+\subseteq…”
P6, L31. h(C_1)\subseteq h(C_2) does not make sense. Should it be h(C_1)\subseteq C_2?
P7, Lines 2223. Please explain what PerfectRef does.
Section 4 presents a case of an empty ontology and nontrivial mappings. Could you please remind the reader what happens if the mappings are trivial, that is, there’s a onetoone correspondence between the ontology alphabet and the DB schema.
Def 3. Is a proper separation always minimally complete and maximally sounds? Please comment.
P10, L 6. “conference paper [46]”. Reference [46] is not a conference paper.
Def. 4 introduces a name clash between the case of positive/negative examples and positive only, which might be confusing.
The entire Section 4.1 is about the related work. There are no results and the definition is not used anywhere else. At least mention in the related work that a comparison with Abstraction is given in Section 4.1 as definitions are needed, and better move to the related work section.
Section 5. \sigma(x) for the size of x is unconventional. Why not use #x or x?
P14, lines 9 and 1114. Again there should be “\in” in place of “=”
P16 presents exponential algorithms for computing min complete/max sound approximations. What is known about the lower bounds on their sizes? Do you know of any particular OBDMs where separations are necessarily exponential in size? Are there any tractable cases where separations are guaranteed to be small (apparently yes, as they are considered further when analysing GAV/LAV cases)
P18 Corollary 3 states that either q_O is a proper separation or no proper separation exists. This statement immediately give a naïve algorithm for existence: compute q_O and check (Thm 2 and Corollary 1) if it is a separator. This is probably already in NEXTIME. Is your algorithm better?
The proof of Thm 8 would benefit from either moving some parts in an appendix or splitting the proof into several lemmas. Claims within claims are hard to read.
