Review Comment:
Overall, the paper has been considerably improved and is easier to follow (in the technical parts, e.g., I like Example 5). There are still a few issues but I believe that none of them will constitute a major problem.
1. One issue that caused some misunderstandings in previous reviews and that still persists in the paper but also, e.g., in the some replies of the cover letter it seems is the matter of backward-chaining with and without pre-materialization. It seems to be agreed on the fact that backward-chaining can be done without pre-materialization (on the original rule set of course), while pre-materialization is introduced as a significant improvement for efficiency (which is why it is integral part of the implementation). Why not split Sects. 3 and 4 accordingly and adjust all related descriptions to avoid some confusion? I would suggest to restrict Sect. 3 to backward-chaining alone and join all material on pre-computation (including 3.4) into Sect. 4. Then the following text parts can be simplified/improved:
- The last paragraph in Sect. 2;
- The introductory paragraph on Sect. 3;
- The paragraph before the last in 3.3: you could say the same about termination and soundness, I would suggest to simply drop it to avoid any confusion, since you are not saying anything new/surprising anyway (backward-chaining is correct; if you introduce pre-materialization, then you have to ensure that it is correct as well, but that is completely irrelevant to the discussion/correctness of backward-chaining alone - sure, you are not saying it is relevant, but the way the phrase is put starting with "However", it confuses things without introducing anything beyond the obvious);
- The beginning of Sect. 4: the purpose of pre-materialization is certainly not what is claimed there;
- The beginning of Sect. 5: "pre-materialization procedure which ensures the completeness of our approach" - again, this is not true: the procedure improves efficiency but you could have programmed QueryPie entirely without it, and it would still be complete (but much slower, I guess).
2. While the proof structure is now fine in my opinion, there are still some details that have to be corrected:
- Lemma 1, item 1 apparently also shows p \circ Var(r) \subseteq V' but that is not mentioned in item 1;
- same Lemma, item 2, is proven by induction, however, if the induction hypothesis is that infer(Q',infer) where Q' \in \mathcal{Q}, then nothing needs to be proven (note that the second argument of infer is wrong);
- Lemma 3 is proven by induction without specifying on what;
- The hypothesis for Lemma 3, is written in a way such that it actually suffices already, i.e., it is that strong, that nothing else needs to be proven, which is certainly not the intention in a proof by induction, the problem seems to be related to spelling out how the induction proof actually proceeds;
- "As remarked below the definition of joint…" - I cannot find that, is that really meant to be there?
- The restriction on Datalog ("requires all variables of the head to be covered by some atom in the body") seems crucial to me and should be mentioned right away in 3.1 and not only in the proof of a Lemma!
3. Overall, I think the evaluation section is fine apart from one issue that I do not understand which is related to 6.2.2. QueryPie is written in Java, you have the source code, so I naively wonder, how it cannot be possible to pickup the source code, comment all calls to pre-materialization (and other optimizations based on that) away and use backward-chaining directly thus creating a correspondent to Table 5? You loose all the pre-materialization time, but query responses are most likely worse than there, but this way the comparison is much more meaningful than Table 6. Then, you could also show whether the idea on how to pre-materialize is better than the naive solution as described in the beginning of 4.1. Again, if this has been done in the past, then it is sufficient to cite, otherwise some experimental data would be really nice and most likely be really convincing.
Some further minor suggestions/mistakes/language-related issues (with the usual indicator page, column):
- 2, l: "Our method relies… to calculate the inference." - Which inference?
- 2, l: it still says that "OWL RL fragment is the most recently standardized OWL profile designed to work on a large scale". Now, does this mean the other profiles were developed earlier or do not work on (such a) large scale, or is the phrase simply misleading and should better be rephrased?
- 2, l: "We thus tackle several problems. The first is to show that our reasoning algorithm…" - I guess, you mean the adapted backward chaining algorithm, which should be spelled out explicitly. Then, no second or further problem is mentioned, so one wonders what happened to the several problems. Instead it is said that the overall method is applied (the one from a paragraph before) without mentioning any related problems. Then, however, "we have implemented these techniques…" and the reader may guess that you mean the two steps of pre-computing and then backward-chaining which was mentioned two paragraphs before (and not as techniques). Again, that is not a problem of content but of writing down a sound outline of what you do/are going to present.
- 2, r: Please add a pointer to Sect. 5 in footnote 1 regarding the OWL RL rules you do not cover.
- 3, l: Before, you establish that your approach combines queries and (pre-)materialization. Then, "…we use backward-chaining in both instances…" and further "…a backward-chaining algorithm which exploits…that some triple patterns are pre-materialized…" I know what you want to say, but currently, it might leave the impression of a loop - before backward-chaining you do backward-chaining - please rephrase.
- 4, r: You might consider adding a small example to joints of substitutions, similar to Example 2 and 3.
- 4, r: Better use R instead of R0 and R1 when talking about unifiers. If R0 and R1 are indeed different, then you cannot unify them anyway.
- 4, r: In the Datalog query definition, Ri(ti) is an atom for each i in {1,…,n}, not {0,…,n}.
- 4, r: Is it intended that an answer is first defined as a tuple, and 4 lines below as atoms?
- 5: Var(r) is defined twice, once on the left, and once on the right.
- 5, l: "T_P(I) is I extended by all facts that could be inferred…" - This is related to my previous request of unifying the tense, where I failed to explain properly what I meant. Sure, all facts could be inferred, but they still can, i.e. Simple Past should be used for describing things that happened in the past and are finished, I believe for such technical descriptions Present is more suited. Similar, p.7, l: "We introduced", "chose to"; p.10, l: "since \delta_h was the unifier"; p.11: "R'(b) was an answer"
- 5, l: It is o.k. to define graphs and then trees, but since all you need is trees, would it not be enough to only introduce trees?
- 5, r: "otherwise it the maximal height of all trees, where … plus 1." Please switch "plus 1" before the comma.
- 5, r: What is meant by arbitrary trees, for which the height is not defined - infinite ones?
- 5, r: Please avoid pointing to Algorithm 1 and line 13 when defining lookup: that is the occurrence of usage, but the function should be general, so it is fine to define it in general without a concrete pointer to some usage 3 pages later.
- 6, l: You introduced the convention in the beginning of 3.1 for variables, here "?a" reoccurs. I guess you want to unify notation here as well? Also page 18 contains numerous occurrences of variables with "?".
- 6, r: The two rule applications cax-sco and pro-spo1 are both wrong in Fig. 1: the children in the third line should, e.g., be T(s,rdfs:subClassOf,Person), T(x,ref:type,s), T(y,rdfs:subPropertyOf,rdf:type), and T(x,y,Person) (which needs to be adjusted in their children as well); by the way, since the caption says "Fig. 1" you might consider referring to it by Fig. 1 instead of Figure 1.
- 7, r: I cannot see any reason, why Lemma 1 is presented before explaining Alg. 1, please move it.
- 7, r: You certainly mean A to be all atoms occurring in P union {Q} not unified with {Q}.
- 8, l: \lookup (twice) and \Mat is probably not intended this way in Alg. 1.
- 8, l: Lemma 1 sounds rather general. What I mean is, it sounds like something that was already there for previous instances of QSQ. If so, then this should be mentioned.
- 9, r: "discuss the correctness of our algorithm, which are termination, soundness and completeness." - please rephrase; also "we will furnish further lemmas and definitions" sounds a bit arbitrary: I think it would be better to say what you introduce (and briefly with which purpose).
- 9, r: Minimal model in Lemma 3 has never been used before as a term.
- 10, l: Termination is fine, but I guess Lemma 1 and the fact that \mathcal{Q} is finite would suffice.
- 10, r: The phrase on "New in line 5" should probably appear further down in the paragraph and not as first statement regarding Soundness.
- 11, l: In Prop. 1, you mean Q defined above Lemma 1, not 2.
- 12, r: "rule set… different from standard OWL" - do you mean OWL RL or are you saying there is a rule set for full OWL 2? I cannot make the logical step to "Our approach thus relies on a pre-materalization phase…" - the two issues seem completely unrelated.
- 13, l: End of proof Prop.2, it should be "R(a) \not\in \bigcup …", right, because that is the contradiction to the initial assumption?
- 13, l: In Algorithm 2, line 7, the condition i>0 is missing, otherwise the head is also overwritten, which apparently is not intended.
- 15, r: It is probably better in the induction step to say "either R(a)\in T^{k}_P(I)" than T^0_P(I).
- 16, r: In the caption of Table 2, better use pre-materialized instead of precalculated.
- 17, r: "Using Proposition twice more" - please correct.
- 17, r and 18, l: "First type of duplicates source" - how about First source of duplicates? Similar the second one.
- 18, l: the INV rules are wrong: if S_{INV}(p, INV, q) (or vice versa) and T(x,q,y), then
T(y,p,x), and not T(x,p,y).
- 22, Table 7: I do not understand what "queries changing the rule set" means in the caption.
- 23, l: "relatively modest computer architecture" is maybe not the best description for the DAS-4 cluster?
- 23, l: "all the features of the OWL 2 language". I guess, you mean OWL 2 RL?
- 25: Inconsistent capitalization of names in the references (OWL, RDFS,…) - please check them all and unify (to the capitalized versions, I suppose).
Some typos:
- 3, r: algoritms;
- 5, r: "leads form" - from;
- 6, r: contruct;
- 7, l: redudancy; dept-first; "drawback that THEY are very difficult to BE implemented; QSQ require - requires; "termination soundness" - comma missing;
- 8, r: procedure; "that were inferred in the process, AND saved in the global…";
- 10, r: assingment (twice);
- 11, l: Queries - queries
- 11, r: "If there is a Datalog proof tree…, then" - comma missing before then
- 12, r: query (a TYPE, w) - should be "u" instead of "w" according to Ex. 5; "to the variable w so that in A subsequent call…" - article "a" missing;
- 13, l: harmeless;
- 13, r: "where I0 IS the database I…" - "is" missing
- 14, r: "satisfies these condition…" - conditions
- 15, r: "…it is the rule where Ri(ti) has been substituted…" - better "a rule"
|