A Closer Look at the Semantic Relationship between Datalog and Description Logics
Submission to RR2010 special issue.
Resubmission after "major revisions", now accepted for publication.
Second round reviews:
Solicited review by Riccardo Rosati:
The revised version of the paper seems to have addressed all the major comments by the reviewers. I believe that the paper should be accepted.
Solicited review by anonymous reviewer:
I think the authors have addressed the issues and have improved the paper. I suggest to accept it in its current form. Just one remark: I still find it strange that Section 7 builds on a technical report, i.e. material that has not been peer reviewed.
Solicited review by Marie-Christine Rousset:
The authors have taken into account the reviewers comments in a satisfactory manner.
First round reviews:
Solicited review by Riccardo Rosati:
General comments:
This paper presents a study on the relationship between Description Logics and Datalog.
In particular, the paper focuses on the problem of identifying "maximal" fragments of Description Logics that can be expressed in Datalog.
The paper takes inspiration from previous work on the description logic DLP, and presents the following contributions:
- The first part of the paper (Section 3 and Section 4) is devoted to the introduction of a formal notion of maximal DLP fragment of a given description logic.
- Then, in Section 5 the Description Logic ALC is considered, and the description logic DLP^{ALC} is defined. Then, it is shown that DLP^{ALC} constitutes the largest DLP fragment of ALC according to the definitions provided in the previous sections.
- Finally, in Section 6 the authors briefly discuss an analogous definition of the largest DLP fragment of the DL SROIQ^{free}.
The topic studied by the paper is relevant. I find the general approach presented in this paper original and quite interesting: however, some technical assumptions and definitions (e.g., the assumption of closure under variants, the choice of semantic emulation vs. syntactic emulation) are not very intuitive and somehow leave the impression that the approach needs some slightly artificial assumption to make it possible to identify largest DLP fragments of a description logic.
The paper presents a set of original results (to the best of my knowledge). I find the material in Section 3, Section 4 and Section 5 quite significant. On the other hand, I find the material in Section 6 less significant, and even not acceptable in its present form (see specific comments). Anyway, I believe that the results of Section 3-5 alone are sufficient to deserve acceptance.
The paper is generally well-written. Main related work in the area is cited (to the best of my knowledge).
Summarizing, I recommend to accept the paper after a minor revision in which the authors address the specific comments below.
Specific comments:
p.3: "data-complexity and combined complexity of Datalog, Horn-SHIQ, and Horn-SHOIQ agree": the difference between data-complexity and combined complexity should be described.
p.5: "It is easy to see that semantic emulation implies syntactic emulation", followed by the footnote: "The converse is not true in general, but this fact is not essential for our work so we do not discuss the details." But, right after that, the authors informally explain semantic emulation using the characterizing property of syntactic emulation. So, the reader might wonder why semantic emulation instead of syntactic emulation is considered in the rest of the paper.
p.5: A proof of Lemma 3.4 should be added.
p.8: A proof of Lemma 5.4 should be added.
p.8: A proof of Lemma 5.6 should be added.
p.9, Theorem 5.8: "...one can construct a Datalog program dlg(alpha) that emulates alpha": it should be specified that this is semantic emulation.
p.11, after Theorem 6.1: "Rather than in the concrete description of this fragment, we are interested here in the general insights that are obtained from proofs of such results." Conversely, I believe that most readers at this point would be very curious to see a concrete description of such a largest DLP fragment of SROIQ^{free}, so I recommend the authors to add such a description. Without more details on the fragment, the purpose of the whole section is not clear to me.
Solicited review by anonymous reviewer:
The paper presents some results on the relationship between Description Logics (DLs) and Datalog, and in particular on expressing DL ontologies as Datalog programs. This is certainly a relevant topic because Datalog reasoners are quite efficient.
In the paper the authors define the notion of a DLP fragment D of a DL L, which is a fragment of L that can be expressed in Datalog. The authors apply certain closure requirements (closure under renaming), which is a good idea (these requirements were not present in the original proposal of DLP). The authors then show that a unique maximal DLP fragment of DL ALC exists and can be defined by a context-free grammar.
Other 2 results are:
-Showing that there is exist no maximal DLP fragment if we replace the closure condition by the requirement of polynomial time recognizability of the fragment.
-Consistency of DLP KBs can be decided in polynomial time for KBs with bounded axiom size. (Note to authors: please clarify how you define the size of an axiom. Is it the number of symbols/constructs in an axiom, ignoring the size of the representation of concept/role names?)
The author also provide some hints on how the results can be extended to SROIQ.
I see two issues, which need to be resolved: correctness and presentation.
*Correctness*
I think there is a problem due to which the proof of the main Theorem 5.14 does not work. Take the concept C := neg A sqcup exists R. B. According to the grammar in Figure 1, C is not an L_H^{A} concept. Due to Definition 5.11, qe(C)=neg A sqcup B. According to the grammar in Figure 1, qe(C) is an L_H^{A} concept. This contradicts Lemma 5.13, which says C is an L_H^{A} concept iff qe(C) is an L_H^{A} concept.
*Presentation*
It was difficult to read the translation from axioms to Datalog programs. In particular, the quite central Figures 2 and 3 should be improved as it took the reviewer a lot of time to figure out how the figures should be read. Also, sometimes the authors use L_H^A as a concept and sometimes as a set of concepts.
Another issue is Section 6, where the authors discuss how their results extend to the DL SROIQ. The level of detail in this section is not sufficient for a journal publication. The authors in fact refer to technical report for the details, but this isn't appropriate for a journal publication. The section should be extended to make it self-sufficient.
Some minor comments:
-"Datalog rules are arguably the simplest form of a logical rule language." What about propositional Horn rules?
-"...in an ExpTime compilation process...". Please be more careful when choosing words: ExpTime is a class of problems.
-"A definition of pi may be found in [21,Figure 3.4.]" I would advise to define pi in the paper.
-your definition of a rule does not allow empty heads. Thus "we use bot to denote the empty head" sounds strange.
-in Definition 4.2.(2) I guess one should say that the transformation function is a computable function.
-the current Definition 5.1. actually rules out role assertions from LP_ALC.
-usually Datalog programs involve safety (every head variable occurs positively in the body). In fact, the current reasoners work only on safe programs. However, the presented translation into Datalog may lead to programs that are not safe. This is an important aspect that needs to be discussed in the paper.
Solicited review by Marie-Christine Rousset:
Datalog and Description Logics are two fragments of FOL that have been extensively studied separately and also in combination.
Some attempts have also been done to find the "intersection" of these two languages.
This paper clarifies this notion of intersection and more importantly charaterizes the DL fragments (called DLP) that can be expressed in Datalog.
The first contribution of this paper is to provide the novel notion of "semantic emulation" to capture the right notion of semantic
equivalence, and that is the key for a generic definition of DLP fragments of a description logic.
This definition intends to be "concrete" in the sense that it imposes a syntactic restriction that is the key for checking membership to the
fragment without complex reasoning. This point is far from being obvious: in particular it is shown in the paper that imposing
this membership to be polynomial would entail that no maximal DLP fragment exists.
The second contribution is to exhibit the largest DLP fragment for ALC. This requires the use of additional concepts to emulate some forms of Skolemisation.
It would have been interesting to show an example of an ALC GCI that is not in this fragment.
Typo: in the proof of Lemma 5.12, the indices $I$ of the two existantial and forall interpretation in $I-1$ should replaced by
$I_1$.
The paper is quite technical but very clearly written.