QueryPIE: Hybrid Reasoning With The OWL RL Rules

Paper Title: 
QueryPIE: Hybrid Reasoning With The OWL RL Rules
Authors: 
Jacopo Urbani, Robert Piro, Frank van Harmelen, Henri Bal
Abstract: 
Both materialization and backward-chaining, as different modes of performing inference, have complementary advantages and disadvantages. Materialization enables very efficient responses at query time, but at the cost of an expensive up-front closure computation, which needs to be redone every time the knowledge base changes. Backward-chaining does not need such an expensive and change-sensitive precomputation, and is therefore suitable for more frequently changing knowledge bases, but has to perform more computation at query time. Materialization has been studied extensively in the recent semantic web literature, and is now available in industrial-strength systems. In this work, we focus instead on backward-chaining, and we present a general hybrid algorithm to perform efficient backward-chaining reasoning on very large RDF datasets. To this end, we analyze the correctness of our algorithm by proving its completeness using the theory developed in deductive databases, and we introduce a number of techniques that exploit the characteristics of our method to execute the OWL RL rules efficiently. These techniques reduce the computation, and hence improve the response time, by reducing the size of the generated proof tree and the number of duplicates produced in the derivation. We have implemented these techniques in an experimental prototype called QueryPIE and present an evaluation on both realistic and artificial datasets ranging in size from five to ten billion triples. The evaluation was performed using one machine with commodity hardware, and it shows that (i) with our approach the initial precomputation takes a few minutes, compared to the hours (or even days) necessary for a full materialization, and that (ii) the remaining overhead introduced by reasoning still allows single-pattern queries to be processed with an interactive response time. 
To the best of our knowledge, our method is the first to demonstrate complex rule-based reasoning at query time over an input of several billion triples, and it takes a step towards truly large-scale reasoning by showing that complex and large-scale OWL inference can be performed without an expensive distributed hardware architecture.
Full PDF Version: 
Submission type: 
Full Paper
Responsible editor: 
Pascal Hitzler
Decision/Status: 
Minor Revision
Reviews: 

Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-special-ca...

Revised resubmission, after an "accept pending minor revision", again accepted pending minor revisions. First round reviews are beneath the second round reviews.

Solicited review by Matthias Knorr:

The authors replied to all the questions raised in my first review, and the paper reads much better now. Unfortunately, there are still problems with Algorithm 1 (which I overlooked the first time), despite the implemented fix requested by another reviewer.

Most importantly, the join on substitutions in line 21 is not defined. Indeed, it is not clear what the join of two substitutions where one is empty would be: intuitively it would be empty as well, which would break the algorithm. So please either fix the algorithm or introduce the intended meaning of joins on substitutions properly.
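For concreteness, one plausible reading of line 21 (this is an assumption on my part, since the paper leaves the operation undefined) is the standard compatible-join on substitutions from the QSQ literature, under which the empty substitution acts as the identity element rather than annihilating the join. A minimal sketch:

```python
def join(theta1, theta2):
    """Join two substitutions (dicts mapping variable -> term).
    Defined iff they agree on shared variables; note the empty
    substitution {} is the identity, not an annihilator."""
    if any(theta1[v] != theta2[v] for v in theta1.keys() & theta2.keys()):
        return None  # incompatible bindings
    return {**theta1, **theta2}

def join_sets(S1, S2):
    """Lift the join to sets of substitutions, as line 21 of
    Algorithm 1 presumably intends."""
    return [j for t1 in S1 for t2 in S2
            if (j := join(t1, t2)) is not None]

# {} is the identity:
assert join({}, {"x": "P"}) == {"x": "P"}
# conflicting bindings are incompatible:
assert join({"x": "P"}, {"x": "Q"}) is None
```

Under this reading the algorithm does not break on empty substitutions; but the paper should state the definition explicitly rather than leave it to the reader.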

Example 4 does not help to clarify this issue. In fact, I believe it is a rather unfortunate choice for an example, because almost every term in the rule is a variable (in particular in the head), which means it can be applied quite a few times. Also, the example is short on details, just briefly mentioning some initial steps and then returning to only stating general phrases, such as "results will be joined with the previous ones" and "will repeat the execution until all triples are calculated" - these phrases are nice when describing the algorithm but not in an example, I think. Also note that the second query should be T(x,p,y). Moreover, I do not understand what "in this case the implementation is aware of the possible p" means, all the more since p is a variable (without any restrictions on it)?

Why not instead consider the following example (or something similar) that permits to make a (brief, yet) complete run through the entire algorithm, showing, e.g., that T(P, SPO,Q) can indeed be derived?

T(x,SPO,y) <- T(x,SPO,w), T(w,SPO,y)

with database

T(P,SPO,R)
T(R,SPO,Q)
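To make the suggestion concrete: the point of this example is that T(P,SPO,Q) is derivable only by one recursive application of the rule, so a complete run of the algorithm can be traced in a few steps. A minimal sketch (a naive fixpoint over the two facts above, as an illustration only, not the paper's Algorithm 1) confirms the expected derivation:

```python
# Rule:  T(x,SPO,y) <- T(x,SPO,w), T(w,SPO,y)
# Facts: T(P,SPO,R), T(R,SPO,Q)
facts = {("P", "SPO", "R"), ("R", "SPO", "Q")}

def closure(facts):
    """Naive fixpoint of the single transitivity rule above."""
    derived = set(facts)
    while True:
        new = {(x, "SPO", y)
               for (x, p1, w) in derived if p1 == "SPO"
               for (w2, p2, y) in derived if p2 == "SPO" and w2 == w}
        if new <= derived:          # no new triples: fixpoint reached
            return derived
        derived |= new

assert ("P", "SPO", "Q") in closure(facts)  # the expected derivation
```

A worked trace of Algorithm 1 on exactly this input would let the reader check every step, including the join of substitutions, against a known answer.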

The related proofs are not complete either, I am afraid. Termination is fine and spells out the argument.

Soundness should in my opinion be: whatever is returned by function main w.r.t. the query should indeed appear in P(J) w.r.t. database J and program P. That is stated (for function infer) in 3.1.2, but, given that reference [20] is 25 years old and difficult to obtain, plus the actual deviations from the algorithm there, the current proof argument for soundness seems to be insufficient: true, new facts can only be derived from rules in the for-loop, but that alone does not ensure that the right inferences are made, i.e., the fact that R(a1,..,an) is derived by infer does not mean that it is in fact in P(J). What is missing, most likely, is a reference to the original algorithm/proof and some (short) argument considering the differences in the algorithm in the paper.

Moreover, completeness should be: whatever is in P(J) (as instances of the given query Q) is returned by the algorithm. Proposition 2 only talks about proof trees instead, while the premise should be the appearance of a considered fact in P(J). So something is missing there. Apart from that, the completeness of Algorithm 1 alone does not depend at all on the pre-computation (see page 8), which as far as I understand literally only creates a different (but corresponding) set of Rules+Database - you prove that you can do these modifications without changing the consequences (in Sect. 4) but it should not affect Algorithm 1 as such, and, thus, should not appear in the argument of completeness of Algorithm 1. The same flaw in the argument appears in the beginning of Sect. 4 again: As far as I can see it, everything would work correctly without pre-materialization, it is only an optimization, even though a very important one: i.e., the algorithm should be able to infer all derivations even without pre-materialization contrary to the first paragraph in Sect. 4. (By the way, talking about the practical impact of pre-materialization in 3.2 before Section 4 is a bit odd, but of course not a real problem.)

I would also like to ask the authors to clarify the following issue related to Algorithm 2 and Section 4.2: Right now, Algorithm 2 is presented, then it is claimed/shown that it is terminating, sound, and correct. But it is not clear w.r.t. what. Indeed, Prop. 3 talks about J0 (as result of Alg. 2) and J (as input?), but the phrase before only mentions J. I guess S_R(t)(a1,..,an) is meant to be in J0 as well. However, if you indeed prove this correspondence between the output of the algorithm and tuples contained in J under P, then what is proven additionally in Sect. 4.2? It seems to me that first, Sect. 4.2 should provide the theory showing that we can substitute body atoms with the newly introduced predicates without altering the consequences, then present the algorithm, and then show that it matches the expectations in the theory. That is, I presume, what is intended, but the current way of presentation is rather confusing for the reader.

Some further general comments:

Please consider introducing some conventions effectively: variables non-capitalized, constants capitalized, or something alike?
It is still somewhat confusing to me to use different notation for examples, could you please unify or at least comment on that in a footnote (e.g., variables with and without leading "?", formatting of SPO and alike)?
Also, consider using consistently one tense (when talking about the paper), such as present tense, instead of mixtures, such as
- 2, r: "in Section 2 we present" and "in Section 3, we will describe";
- 3, l: Footnote 1 - "that we use(d) in this paper".

Besides that, there are a number of minor issues which should be easily fixable (indicated with page and column).

- 1, r: better "in the literature"

- 2, l: to calculate the entire derivation - better all derivations?; Thus "it" has been limited - dangling pronoun - better write background chaining explicitly; In this paper, - comma missing; "trade-off" - hyphen; what do you mean by the "latest" standard OWL profile designed to work on a large scale - the other profiles were developed earlier, or do not work on a large scale, or actually neither?

- 2, r: "In principle," - comma missing

- 3, l: "backward-chaining" several times hyphen missing, please unify notation; "To give an idea how this works" - please remove comma before how.

- 3, r: edb is introduced on page 3 and 10 - could be emphasized the first time (and the second could be dropped?)

- 4, l: T_P sometimes has an argument, sometimes not, should be unified
- 5, r: TERM is first used in sf then in it (probably normal math mode) - could you please unify?

- 6, r, Example 4: "a(n) head"; "the algorithm will launch the execute the second query" - please correct

- 8, l: "ruleset"; "in first place" - "the" is missing; "retrieves the same results than" - should be "same results as"; also around subsection 3.2 the authors refer to following/previous section while actually referring to the previous subsection, that is no real problem, just a bit confusing since the reader always stays in Section 3; please decide whether you want to use pre-materialize with or without hyphen and apply it consistently (also please check precomputed and recalculated).

- 9, l, footnote 3: The second phrase needs to be fixed.

- 9, r, Algorithm 2: I would rather call the input L a set instead of a constant

- 10, l: Formatting bug after the query - "base. under the program P." this should be removed, I guess.

- 11,l, definition l0: please, add parentheses to set the scope of "max" precisely; (here also "where the rule body was empty" - should be "is" - see above on tenses); should it not be {0} as well (instead of {1}) that is added in the first case when defining l0?

- 12, l: Please check positioning (before or after "." and ",") of footnotes and unify (also in the previous and following cases). "Those rules include the ones who" - better that instead of who

- 13, l: "with the purpose of construct such lists". Probably better "constructing"

- 13, r: "outside the scope of this chapter" - that comes probably from a different document (thesis)?

- 14, r: "On the other case" - better "In" plus a comma is missing after case

Solicited review by Peter Patel-Schneider:

This paper proposes a caching mechanism for improving the performance of
Datalog querying. The paper then goes on to evaluate this mechanism on two
large examples using a subset of the OWL 2 RL ruleset.

One problem with the paper is that the title says that it performs reasoning
with the OWL (2) RL rules. However, the mechanism and implementation does not
handle all of OWL 2 RL reasoning and the evaluation excludes some of the rules
that could have been handled. This disconnect between the up-front claims
and the actual results needs to be remedied.

Another problem with the paper is that it is essentially proposing an
optimized Datalog reasoning technique, but without comparing its results to
any of the optimized Datalog reasoning systems. This would be less
unacceptable if QueryPIE really did implement the full OWL 2 RL rules, as
they have side conditions that need to be considered. However, QueryPIE is
a Datalog reasoning engine (with a little extra to handle lists, which also
could be handled entirely in Datalog by using auxiliary predicates) and so a
comparison between its results and those of Datalog systems is called for.

The paper does not correctly characterise SLD-AL resolution (which is the
formal basis of the (fixed-up) QSQ algorithm [20]). SLD-AL resolution does
not need to remember all the previous queries. All that SLD-AL needs is an
acceptable precedence ordering in determining admissible nodes, and tree
ancestor ordering is acceptable, and is actually used in Algorithm 1.
SLD-AL resolution also does not depend on performing computations in a
particular order. All that is required is that SLD branches be constructed
in a certain way. The branches can be done in parallel so long as
non-admissible nodes can be identified. Algorithm 1 works in a way
consonant with this requirement.

SLD-AL does not require a loop at every non-admissible node. The basic
implementation strategy for SLD-AL constructs the proof tree using only the
lemmas at non-admissible nodes. When the proof tree is finished,
the lemmas are augmented. Then a new proof tree is constructed. This
repeats until no new lemmas are found.

SLD-AL resolution goes beyond the analysis in the paper. SLD-AL resolution
pre-identifies certain literals that do not need admissibility analysis.
SLD-AL resolution keeps track of lemmas and uses them only for
non-admissible goals, resulting in much less work than continually adding to
the database. SLD-AL also provides a memory-intensive implementation
technique that does not need to reconstruct the previous proof trees each
time. This solution appears to be quite suited for parallel distributed
implementations, but is not explored in the paper.

The main difference between Algorithm 1 and SLD-AL resolution is that
Algorithm 1 passes back multiple results at once (because lookup can produce
multiple matches). This is a significant difference, but this has already
been explored in various places, including by L. Vieille in his work on
QoSaQ [2nd Intern. Conf. on Exp. Data. Sys., 1988].

The paper needs to fix its description of SLD-AL resolution and better
compare the methods in the paper with SLD-AL resolution. The paper needs to
update its algorithm comparison from QSQ to QoSaQ. The paper needs to
further investigate the work on memoing in Datalog. Even in the mid-1990s,
there were many papers in this area, including a paper by D. Warren in CACM,
1992.

The paper needs to be checked carefully for correct grammar. In particular,
there are many places in the paper where independent clauses are used
instead of the required dependent clauses.

The paper suffers from a large number of un- or under-defined terms. Some
are noted below.

What is a join over sets of substitutions?
What is a blocked query?
What is the domain of the database?
What is an atomic query?
What is an unfoldable predicate?

The paper should define all these terms internally even if they are
described in the literature on Datalog.

The discussion of Algorithm 1 has several flaws.

The demonstration of termination is flawed because there is no demonstration
that no new terms are generated.

Soundness is not obvious at all. There is no demonstration that the program
actually correctly goes from rule conclusions to rule premises. There also
needs to be some sort of inductive demonstration of soundness when rules are
chained.

What is a fact? Is it just an element of Database? Can a fact contain
variables? It initially seems not, but Figure 1 allows variables in its
nodes, so it seems that facts can contain variables.

Why is MGU used in Algorithm 1 instead of simply matching? It is the
presence of MGU that makes some of the properties of Algorithm 1 be
non-obvious.

Algorithm 1 has some very obvious algorithmic inefficiencies. Tmp is
accumulated throughout the run and added into Mat over and over again. Work
is done over and over again between the calls to infer from main. These may
be just sloppy coding, but then why should a sloppy algorithm be published,
or they may be vital to the algorithm, in which case they should be
explained. Also not explained is why there needs to be the overall loop.
Without an explanation the reviewer cannot see why the various choices have
been made in Algorithm 1. Also it is not possible to come up with an
intuitive understanding of Algorithm 1 so problems with it, such as the
missing definition for join, have to be considered problems in the paper.

Algorithm 1 also has a fundamental efficiency problem. Consider trying to
find the extension of a property in RDFS or OWL 2 RL, as shown in Example 4.
Because of the subproperty inheritance rule a subquery for the entire
database will be generated, even if the property has no subproperties at
all. This cannot be efficient.

An obvious optimization to Algorithm 1 is to not consider further clauses in
a rule if the first produces no results. If the authors have not considered
even such simple and obvious optimizations then the work is probably not
mature enough for journal publication. Similar optimizations should be made
if the first clause produces only a few results. These sorts of
optimizations are already present in the literature on QSQ and related
Datalog proof systems.
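The optimization the review suggests is simple to state: since the answers to a rule body are a join over its atoms, the rule can be abandoned as soon as any body atom yields no substitutions. A minimal sketch, where `answer(atom, theta)` stands in for a hypothetical lookup-or-recurse step (not the paper's actual code):

```python
def eval_body(body, answer):
    """Evaluate a rule body left to right, short-circuiting as soon
    as some atom produces no substitutions (the overall join is then
    necessarily empty). 'answer' is a hypothetical stub that returns
    the substitutions answering one atom under a partial binding."""
    thetas = [{}]                  # start from the empty substitution
    for atom in body:
        new_thetas = []
        for theta in thetas:
            new_thetas.extend(answer(atom, theta))
        if not new_thetas:         # first atom with no results:
            return []              # skip the remaining atoms entirely
        thetas = new_thetas
    return thetas
```

With this structure, the subquery for the second body atom is never issued when the first produces nothing, which is exactly the case of a property with no subproperties discussed above.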

Proposition 1 does not make sense. Facts are not derived from queries in
prooftrees. Prooftrees in rule systems are not even unique for a particular
query. The first paragraph in the proof of Proposition 2 appears to admit
that there can be multiple proof trees for a particular query.

Perhaps the intent of Proposition 1 is that a "prooftree" is a trace of
Algorithm 1 and a fact is simply an intermediate query. But then what is
Proposition 1 trying to show? What is a *subsequent* non-blocked query?

Proposition 2 does not appear to show completeness. Without definitions for
the concepts in its statements, however, it is impossible to be sure about
anything related to Proposition 2.

There desperately needs to be some explanation as to just what is going on
in the formal development and good intuitions as to why the formal
development is done the way it is. Without such, the reviewer has no chance
to fill in the many missing pieces of the formal development and has to make
the assumption that any flaw is fatal.

The correct operation of Algorithm 1 is vital for the correct operation of
Algorithm 2 and the overall operation of QueryPIE. Before the paper
can be re-reviewed there has to be a believable demonstration that Algorithm
1 is doing the right thing. Because there are so many problems with
Algorithm 1 any analysis of the rest of the paper can at best be considered
to be preliminary. Therefore the rest of this review is not as
comprehensive as the preceding parts.

There are some problems with the description of Algorithm 2 that I have
noticed.

Why does pre-materialization take a list instead of a set?

What happens if the pre-materialized queries overlap?

Why is there a separate precomputation procedure? Why not just use
Algorithm 1 generalized to allow multiple queries at once.

Algorithm 2 can be very inefficient. Algorithm 1 will often derive
information that would be useful in later steps of Algorithm 2, but only the
query answer is retained.

Table 2 contains many incorrect items, including owl:maxCard and
DataTypeProp.

The excluded rules for OWL 2 RL include many important ones. The exclusion
of these rules means that QueryPIE is not performing complete OWL 2 RL
reasoning, counter to the claim in the title of the paper.

The exclusion of these rules also makes the performance results in the paper
very suspect. The performance of QueryPIE is also tested on only two
datasets, one artificial and one realistic. This is not an adequate number
of tests.

There also needs to be a realistic evaluation of the performance gains
obtained by pre-materialization. The evaluation in 6.2.2 is inadequate as
it is synthetic, only an estimate, and performed on a small dataset (whose
size is not reported). Why is it not possible to simply run QueryPIE
without pre-materialization? This would give a gross measure of the gains.
Of course, for this to be a reasonable measure of the gains that can be made
by pre-materialization, the inefficiencies in Algorithms 1 and 2 should be
first addressed.

First round reviews:

Solicited review by Aidan Hogan:

This paper discusses QueryPIE: a system that supports OWL 2 RL/RDF rules using hybrid reasoning methods, which combine some pre-runtime partial materialisation -- mainly on a schema level -- with runtime backward chaining. The core backward chaining algorithm is based on an old top-down evaluation technique called Query-Sub-Query (QSQ), where from the initial query, subqueries are expanded based on the program in question and answered against the database in question, and recursively against materialised inferences; subqueries and their answers are tracked in memory to avoid repetition/cycles, thus ensuring termination. QSQ is augmented with some partial pre-materialisation, which assigns extensional predicates to certain atoms in the rule bodies and pre-materialises their full extensions; thus they do not need to be unfolded at runtime, but can be answered with a direct lookup. Soundness and completeness results are provided for these Datalog-style algorithms. The paper then looks at applying these generic algorithms to OWL 2 RL/RDF reasoning, where schema triples are pre-materialised and schema atoms in rule bodies assigned extensional predicates such that they are not unfolded. Discussion is briefly provided on internal redundancy within the OWL 2 RL/RDF program, as well as on the redundancy caused by user-defined axiomatic triples (where RDFS/OWL terms are recursively defined using the RDFS/OWL language), where certain cases of redundancy are avoided using manual optimisations. Evaluation is then provided for 5 billion triples of Linked Life Data and 10 billion triples of LUBM data on one machine. Twelve queries are tested, each containing a single pattern selected manually from available example queries for LUBM and Linked Life Data. With hybrid reasoning enabled, results show response times from between 7--6641 ms warm cache and 163--9289 ms cold cache; one pattern involves inference of 2.4 million triples (takes ~5/6 seconds). 
Other results look at the reduction in proof sizes enabled through partial materialisation, the (lack of) expense of offline partial materialisation vs. offline full materialisation, comparison of runtimes for the RDFS, pD*, and OWL 2 RL/RDF rules, etc.

In general, the work is very promising in terms of practical impact. The main benefit of this work is the demonstration of scale and performance for backward chaining wrt. OWL 2 RL/RDF rules. Such backward chaining techniques have obvious benefits for scenarios involving large-scale, dynamic RDF data. The evaluation at 5--10 billion triples offers a good demonstration of the potential of QueryPIE in terms of scale and in terms of performance. These practical aspects suggest to me that the paper deserves to be published, and these results reported.

I do, however, have various concerns about the paper.

First and foremost of these, I am concerned that the paper only gives a shallow treatment of the challenges of applying OWL 2 RL/RDF rules, for which the paper claims to be complete. Though OWL 2 RL/RDF can, for the most part, be supported in a fashion similar to pD*, some parts cannot. As a quick list, the main differences moving from supporting pD* to OWL 2 RL/RDF are:

(1) support for fully generalised triples;
(2) support for inconsistency checks;
(3) support for arbitrary length rule bodies due to lists;
(4) support for datatype semantics;
(5) support for additional OWL 2 features.

From the paper:

(1) is not mentioned, but is trivial enough.

(2) is explicitly not supported, which is fine.

(3) is mentioned briefly, where rdf:first and rdf:rest triples are mentioned. However, these OWL 2 RL/RDF "list rules" cannot be written down as single Datalog-style rules over triples. They either require special support for lists or support for quaternary atoms, or similar. Similar discussion is provided for converting OWL 2 RL/RDF to RIF [a]. Rules like cls-uni, cls-oo, cls-int2, scm-int and scm-uni are easy to support since they apply immediate consequences for each element of a list. However, I'd be interested to hear how you support rules like cls-int1 (intersection) and prp-key (hasKey), which require "for all members of the list" logic; and prp-spo2 (property chains), which requires an ordered traversal of the list. In summary, support for list-rules is non-trivial, and I'd like to see discussion in the paper on how they are supported.
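To illustrate why a rule like cls-int1 resists a single fixed-arity Datalog rule: it needs "y is an instance of *every* class in the list" over an rdf:first/rdf:rest chain of unbounded length, which forces an iteration outside plain Datalog (or auxiliary predicates). A hypothetical sketch over a toy triple store, purely for illustration (the names and representation are mine, not the paper's):

```python
def list_members(db, node):
    """Unfold an rdf:first/rdf:rest collection into a Python list."""
    out = []
    while node != "rdf:nil":
        out.append(next(o for s, p, o in db
                        if s == node and p == "rdf:first"))
        node = next(o for s, p, o in db
                    if s == node and p == "rdf:rest")
    return out

def cls_int1(db):
    """cls-int1: derive (y, rdf:type, c) whenever c owl:intersectionOf L
    and y is typed with *all* members of L (universal over the list)."""
    derived = set()
    for c, p, l in db:
        if p == "owl:intersectionOf":
            members = list_members(db, l)
            candidates = {y for y, q, t in db
                          if q == "rdf:type" and t in members}
            for y in candidates:
                if all((y, "rdf:type", m) in db for m in members):
                    derived.add((y, "rdf:type", c))
    return derived
```

The `all(...)` over a list of statically unknown length is exactly the part that a single triple-pattern Datalog rule cannot express, hence my request for a discussion of how QueryPIE handles these rules.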

(4) I didn't see any mention of support for datatypes, which are perhaps the most painful part of implementing OWL 2 RL/RDF, and I would imagine would cause some serious difficulties in a top-down scenario. One way to support them would be to use canonicalisation techniques (enforce a bijection between literal syntax and literal value), but again, they are not mentioned. Are they supported? If so, how?

(5) Aside from the list stuff, most of the new OWL 2 features can be supported easily enough with standard Datalog-style techniques, and don't really need special treatment in the paper. However, these new OWL 2 features (esp. things like hasKey, propertyChainAxiom, etc.) do add significant practical expense to the reasoning process. However, your evaluation is over LUBM and Life-Science-Data. I know that LUBM does not have OWL 2 features: if I recall, aside from trivial syntactic inferences (x type rdfs:Resource/owl:Thing ; rdfs:Class/owl:Class, etc.), the only difference for LUBM between pD* and OWL 2 RL/RDF rules is the use of owl:intersectionOf (which is actually OWL 1, but which pD* does not support due to lists). Thus LUBM is not a great benchmark for testing out reasoning at the expressivity of OWL 2 (RL). I'm not familiar with Life-Science-Data, but playing around with their SPARQL endpoint [b], it also seems like they don't have any OWL 2 stuff in there, though they do have OWL 1 list features. So, in running OWL 2 reasoning over exclusively OWL 1 data, I question how much your results reflect the additional expressivity of OWL 2 RL. The minimal differences between pD* and OWL 2 RL results in Table 5 confirm this. On the other hand, I'm not aware of any good benchmark for OWL 2 that I could concretely recommend: perhaps you could have added some new OWL 2 axioms to the LUBM ontology, or such.

Also, one part of the paper that is specific to OWL 2 RL -- the discussion on eliminating redundancy within the ruleset to avoid duplicates -- seems to be incomplete. You take the examples of redundancy caused by the symmetry of prp-eqp1 / prp-eqp2, cax-eqc1 / cax-eqc2, prp-inv1 / prp-inv2. However, for example, *none* of the rules prp-eqp1 / prp-eqp2, cax-eqc1 / cax-eqc2 are needed if the rules scm-eqp1 or scm-eqc1 (which map equivalence to subsumption) are supported. Similarly, cls-int2 is not needed if scm-int is used, cls-uni is not needed if scm-uni is used. Going a bit deeper, scm-sco makes recursion of cax-sco unnecessary, etc. In general, I'm wondering on why the presented examples of redundancy are tackled and others are not.

My second (related) concern is that the paper does not necessarily reflect the difficulty of what is being done. For reasonable inputs and reasonable queries, (as the results show) the average case performance might be quite satisfactory, but in other cases, QueryPIE would hit performance issues well before corner cases were considered. This is, of course, unavoidable, and not a criticism of the approach. But I think that the conclusions need a bit more qualification. I don't see grounds for claims like: "In the average and best cases, the performance is in the order of dozens of milliseconds which is in the same range of the performance of current high performance triples stores..." and "However, on average the response time is still in the order of dozens of milliseconds...". I grant that the best case claim is fine, but if I average the times in Table 3, I get 1,192 ms warm cache and 2,857 ms cold cache: one or two orders of magnitude off what I'd expect a good triple store to give me for lookups without reasoning. Perhaps you picked challenging query patterns to evaluate that don't reflect the typical case, but either way, I don't see any justification for saying that average performance is the same with or without reasoning. I don't think this is possible, nor do I believe "from the user perspective, the difference is much less noticeable", esp. when you start looking at full SPARQL. Again, I don't find fault with the performance of the system; I find fault with some parts of the paper that I perceive as over-selling results, particularly when evaluation is only over OWL 1 data, when some parts of the OWL 2 RL/RDF standard (like datatypes) are seemingly not tackled, and when the maintenance of materialisations (including owl:sameAs) is not treated in detail.

My third concern is a bit subjective, in that I found Sections 3 and 4 very dry and difficult to understand. Some of the concepts and notation used are not introduced, but left to external reference. Having only a passing knowledge of Datalog and deductive databases, I definitely struggled in these sections, and felt that the paper could do more to improve readability. First, Algorithm 1 badly needs an example; concepts like unifiers and most general unifiers should also be introduced. In the scope of a journal paper, notation in general should "stand alone" and not be left to external reference. The wording of Proposition 2 makes it very difficult to understand. In general, I also wonder how the proofs in Section 3 would differ from the standard proofs for QSQ: you basically tackle the issues of range-restricted rules, chains of finite proofs, cycles, blocking, etc. Do the proofs here differ in a significant way from what has already been presented? If not, could they be shortened by use of external reference? I have similar concerns for Section 4.

Aside from these concerns, I feel that the core contribution is solid, and improves the state-of-the-art for scalable backward-chaining wrt. OWL 2 RL. In my opinion, the above concerns could be addressed as editorial changes:

(1) Make explicit what parts of the OWL 2 RL/RDF standard you tackle or don't tackle. Give more details on how you support all of the list rules. Give more details on how you support datatypes, or make clear that you don't. Either evaluate your methods over some OWL 2 benchmark, or make explicit that -- in the absence of any suitable OWL 2 datasets/benchmarks -- you run your methods over exclusively OWL 1 data. Qualify your conclusions accordingly: you do not support complete OWL 2 RL/RDF if you do not support inconsistency checks (or datatype semantics), and you do not test the full expressivity of OWL 2 RL.

(2) Relatedly, avoid overselling. Keep the conclusions grounded in the results you present. Either show that your average case performance is competitive with state-of-the-art triple stores without reasoning, or remove the claim. Either show that your reasoning performs well for OWL 2 ontologies, or remove the claim. And so forth.

(3) Add more examples for formal parts of the paper, esp. the algorithms. Add more intuitive discussion as to what the proofs show. Introduce all notation and concepts used, even if only briefly.

(4) The writing in parts could be improved. Also, double-check the following minor comments:

==========================
MINOR COMMENTS:

== Throughout:

* A nitpick: the profile (i.e., the language) is OWL (2) RL, a subset of OWL (2) DL. The ruleset is OWL (2) RL/RDF.

== Abstract:

* "ten billion of triples" -> "ten billion triples"

== Section 3:

* "Rules formulated in the SPARQL language like those in the OWL RL ruleset". OWL 2 RL/RDF rules are not formulated in the SPARQL language: they are formulated in Datalog style, and translated into first-order theories for correspondence proofs. Anyways, not sure what is being said here?

* Figure 1: the writing is too small. Should the top atom not read "T(x,type,Person)"?

* "guarantees the termination" -> "guarantees termination"

* "an avoiding" -> "and avoiding"

* "Instead of contructing the proof tree sequentially ... manner." How? I do not follow.

* "since that the" -> "since the"

* "are the abbreviation of" -> "are the abbreviations in"

* Algorithm 1: I'm confused about the "scope" of 'Tmp'. It's passed as an argument to 'infer(.)', but calls to 'infer(.)' never pass it. It's also defined as a global variable.

* Algorithm 1: Why is θ_P needed? An example is badly needed in the paper here, but as far as I see it, line 19 is not needed and actually breaks the algorithm. Letting a, b, c, d, x, y be variables and leaving the predicate implicit, take:
-- Q := (x, type, y)
-- r := (a, d, c) <- (b, SPO, d) (a, b, c)
Q is unifiable with r.HEAD. Now, for the MGU let's say:
-- θ_h = { a/x , d/type, c/y }
So now,
-- θ_h(Q) = (x, type, y)
Now take p := (b, SPO, d). This is not unifiable with θ_h(Q). So why is line 19 needed? On line 20, why is it not just
-- infer(θ_h(p) ...)
where
--- θ_h(p) = (b, SPO, type)
Perhaps I'm missing something, but this seems incorrect. Again, an example is badly needed for this algorithm in particular.
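To make the point concrete, here is a minimal sketch of unification over triple patterns (my own illustration code, not the authors' implementation; variable/constant naming follows the example above, with single letters a..d, x, y treated as variables). It shows that θ_h(p) already carries the head bindings, and that p is indeed not unifiable with θ_h(Q):

```python
# Reviewer-supplied sketch illustrating the comment on Algorithm 1,
# line 19. Single letters from the example are variables; everything
# else ("type", "SPO") is a constant.

VARS = set("abcdxyz")  # variables used in the example

def is_var(t):
    return t in VARS

def apply_subst(theta, atom):
    """Apply a substitution to a triple pattern."""
    return tuple(theta.get(t, t) for t in atom)

def mgu(p, q):
    """Most general unifier of two triple patterns, or None."""
    theta = {}
    for s, t in zip(p, q):
        s, t = theta.get(s, s), theta.get(t, t)
        if s == t:
            continue
        if is_var(s):
            theta = {k: (t if v == s else v) for k, v in theta.items()}
            theta[s] = t
        elif is_var(t):
            theta = {k: (s if v == t else v) for k, v in theta.items()}
            theta[t] = s
        else:
            return None  # clash of two constants
    return theta

Q    = ("x", "type", "y")
head = ("a", "d", "c")    # r.HEAD
body = ("b", "SPO", "d")  # first body atom p of r

theta_h = mgu(head, Q)                      # {a: x, d: type, c: y}
print(apply_subst(theta_h, Q))              # ('x', 'type', 'y')
print(apply_subst(theta_h, body))           # ('b', 'SPO', 'type')
print(mgu(body, apply_subst(theta_h, Q)))   # None: not unifiable
```

So, under this reading, recursing on θ_h(p) directly would suffice, while the extra unification on line 19 fails.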

* It would help to clarify what is different to standard QSQ in Algorithm 1. Where does parallelism come into play?

* I found the wording for Proposition 2 very confusing. Please rephrase. I also miss some remarks on why the propositions provide a proof of the completeness of Algorithm 1.

* "antecedent" -> "body" (keep consistent)

== Section 4:

* The meaning of subscripts and superscripts on relations such as S should be clarified directly.

* Proof 3: Where is program P' defined?

* The notion of "almost the same variable assignment" needs an example, though I get the idea that S should contain enough information to emulate the intensional predicate it replaces wrt. P(J).

* Proof 4: a quick primer on notation for the immediate consequence operator, least infinite ordinal, etc., would be welcome prior to this proof.
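For concreteness, the definitions I have in mind are the standard textbook ones from deductive databases (possibly differing in notation from what the authors intend), along the lines of:

```latex
% Immediate consequence operator of a Datalog program P on an
% instance I (standard definitions, not taken from the paper):
T_P(I) = I \cup \{\, \theta(H) \mid H \leftarrow B_1,\dots,B_n \in P,\;
                     \theta(B_i) \in I \text{ for all } i \,\}

% Its iteration from I, and the fixpoint reached at the least
% infinite ordinal \omega:
T_P^{0}(I) = I, \qquad
T_P^{n+1}(I) = T_P\!\left(T_P^{n}(I)\right), \qquad
T_P^{\omega}(I) = \bigcup_{n < \omega} T_P^{n}(I)
```

A brief recap of this kind, in the authors' own notation, would make the proof readable without outside references.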

== Section 5:

* "Detecting an avoiding" -> "Detecting and avoiding"

== Section 6:

* For Table 5, you should mention that these results are warm cache. How do you measure times at a resolution of 0.98 ms? Are you doing repeated runs? Micro-benchmarking in Java can be tricky.

* What kind of hard disks are you using? For query 2, I'm impressed that 205.5 MB are read in 1.2 seconds, though I guess this is within the bounds of SATA 3.0 or so.

* Are results decompressed/mapped back to full strings?

* I'm confused by what the purpose of "inserted"/"no insertion" is in Table 5. I don't follow the sentence "Since the number of results ...". In any case, there seems to be little difference between the two.

* "Since our approach deviates from standard practice in the field, we have formalized the computation using the theory of deductive databases and extensively analyzed and proved its correctness". I'm not sure that the formal aspects diverge from standard practices in the field. Again, OWL 2 RL/RDF is presented in Datalog style in the standard. I'm not sure what is being said here?

[a] http://www.w3.org/TR/rif-owl-rl/
[b] http://linkedlifedata.com/sparql

Solicited review by Matthias Knorr:

The paper presents a backward chaining algorithm for querying very large RDF data sets, focussing in particular on the OWL RL rules. The work is well motivated, the intuitive ideas are well explained, and the convincing evaluation supports the claims. Also, the technical results seem overall valid to me, although not too surprising. However, several details can still be significantly improved, and there are quite a few issues that make the paper more difficult to read.

First, the paper would surely benefit from introducing/recalling some notions related to backward chaining and proof trees, because later a number of notions are used in the paper which have never been well-defined. This would make the paper much more self-contained and avoid some confusion in the following sections.

Moreover, in the beginning of Section 4.2, it is claimed that replacing body atoms with auxiliary predicates that contain the full materialization of the body atom w.r.t. a given database yields the same full materialization of the database as under the original program. This seems like a perfectly valid claim to me; however, Prop. 4 does not show that, but rather an intermediate result towards it. And even that is not entirely convincing yet, since the correlation between Ri(ti) and S(t) for some arbitrary t is not well explained (for example, on p. 10, the conclusion that S(beta'(t)) in T^0_P(I) only works due to the assumption that q0^P(I) = q1^P(I), which is not pointed out).

Additionally, there are a number of minor issues listed below (indicated with page number, l/r for the left and right column):

- 1, r: please provide references for "Several approaches that perform…in the literature." and "State of the art methods";
- 2, l: the following (part of a) phrase is not clear: "to calculate the inference which exploits such partial materialization"; also unclear: "to exploit the advantages of the latest standard OWL language on a large scale";
- 2, r: "In Section 2 we introduce … to the problem of large scale reasoning…" - as far as I can see, that is not what is presented in Section 2, please adjust;
- 3, l.: in Example 1, the abbreviations EQC and SCO are announced, however the former is never used and the latter only on page 11;
- 4, l.: "generic input D" - should be a database but is never introduced as such; the entire second half of Example 2 is a bit hard to trace;
- 5, l.: why not define subseteq for atoms directly on identical predicates instead of using R and R' and then saying that R=R'?
- Algorithm 1 defines a ternary function infer, but uses a binary one in all calls, while Proofs 1 and 2 use the ternary one; please unify.
- 6, l.: proof 1, "Let Q(b1,…,bm) be the fact" - better use atom instead of fact?
- 6, r.: "Since Q(b1,…,bm) was an answer to Q" - answer is probably not the right term?
- 7, l.: the very end of proof 2 is confusing insofar as main returns New, so if R(a1,..,an) was returned by Qn in the n-th while loop pass, why talk about the n+1 while loop passes for look-up considering the initial intended claim in proof 2?
- 7, l.: "whether a rule can contribute to derive facts for the parent branch." - the term "parent branch" is not entirely clear;
- 8, l.: in proof 3, P' is used but never defined;
- 8, r.: "(in our specific case…OWL RL rule set)" - since OWL RL does not appear in this section, it is probably better to drop the content in parentheses;
- 9, l.: "where one body atom is replaced" - please point directly to Ri(ti); "S is merely required to contain the necessary information." - this necessary information needs to be further clarified;
- 9, r.: In Example 7, if I got the premises correctly, then the two queries do not deliver the same answer tuples: q0 only contains (a,c) while q1 contains all elements of R1^{P(I)}; Why not briefly recall Tp for the beginning of Proof 4?
- 10, l.: the definition of l0 as the max of lj does not work properly if there is no other atom in the body than the one which got substituted;
- 10, r.: "and the pre-materialization procedure which ensures the correctness of our approach." - that is not really correct because the approach would also be correct without the pre-materialization (but not as fast); reference 11 is surely a bit too general - why not point to the actual document containing these 78 rules?
- 12, l.: why not at least hint to that special algorithm?
- 14, l.: please add "in general" to the observation that cold runtime is significantly slower, since this is not true, e.g., for query 6 (6.7 vs. 6.6 seconds).

Finally, the language can also be improved significantly, please consider the following list of issues ordered by appearance in the paper:

- 1, abstract: article "a" in front of words starting with a vowel becomes "an" (e.g. an experimental prototype, but also later in the paper);
- check commas (e.g. page 1, To this end,… which is … underpinning logic, can …" - the comma before "can" only fits if there is one before "which" as well); also in general: If Condition, (then) Consequence - with comma, while Consequence if Condition without; moreover, commonly, the beginning of phrases such as "In Section 3" is followed by a comma;
- please avoid dangling pronouns/demonstrative determiners for which it is not entirely clear to what they are referring, e.g., page 2 "these techniques" - which ones?; page 2 " Also, it provides a theoretical analysis…" - you mean the paper, but the way it is written it refers to the community;
- 2, l.: "keeps the response often under the second" - better: below/under a/one second; "More specifically, it extends the initial version … to support the OWL rules, … by the community." - it should be "to one that supports the OWL rules"; "the community" could probably be a bit more specific.
- 2, r. (and further on): pre-computation, i.e., with hyphen, in fact, several words are used in different ways (with or without hyphen and in the latter case joint or separate throughout the paper); see "rule set", "in between", "proof tree", "loop pass", "use case";
- 3, l.: "in both instances" better in both cases; phrase before Example 1 misses the final "."; "that was first introduced in 1986…by applying to set…" - a set - "a" missing; "by using the modern architectures" - without "the"; rewrites the initial query in…" better "rewrites … into";
- 4, l.: "derivations steps" - derivation steps - and "example of such tree" - better "such a tree"; "already evaluated an avoiding…" - "and"; "more times" - better "several times"; "fix closure?";
- 4, r.: "in the proof tree is correctly…" - "are" (for all bindings); "modern multi-core architecture" misses the plural "s"; "since that"; "Therefore, we replace it…and storing" - better use the infinitive store; "the entire proof tree must be performed another time" - please rephrase;
- 5, l.: After this loop…, the algorithm returns…and returns the results." - please remove one of the returns
- page 6, l.: "for similar reasons as before there are,…" - please remove "there are"; databese - typo, and actually, the verb is missing in this phrase; "holds in particular true" - please remove true; "for all answers to Q" - Q has not been introduced properly;
- page 6, r.: "lenght"; also Prop. 2 is not correctly formulated ("," or "." after Database missing); "in at most n many" - better without "many" (same top of p.7,l.)
- page 7, l.: "in memory allows to" - allows us to
- page 7, r.: "triples patterns" - better "triple patterns; also "all the derivation" - maybe "the entire derivation"?;
- page 9, l.: "our program P" - in the general definitions it is probably better to drop "our";
- page 10, r.: "precomputing beforehand" - is o.k. but doubled, because precomputing afterwards is counterintuitive; also "which is a common practice among reasoners" - maybe remove "a" and what do you mean by among reasoners?; "in detail at [16]" - consider substituting "at" with "in";
- page 11, l.: "Detecting an" - and;
- page 11, r.: EQP is used in two different styles;
- page 12, r.: "a the";
- page 14, r.: "purposes like" - please use "such as", "for instance", or "e.g." instead of "like" for pointing out examples;
- page 17: in several references owl (2 el/ql/rl) should be capitalized;
- page 18: "A.List of tables" is probably not the best heading, please change it.

Solicited review by Raghava Mutharaju:

The authors present an approach where they use backward-chaining along with some amount of pre-materialization to execute single pattern queries on RDF triples with OWL 2 RL rules. This hybrid approach has already been presented by the authors at ISWC 2011. But in comparison to their previous work, in the current work they choose a different profile, namely OWL 2 RL, which has a much larger rule set, and present a theoretical analysis of the soundness and completeness of their approach. Another interesting aspect of this paper is that the query evaluation over a large number of triples (up to 10 billion) has been done on a single machine.

This is an interesting approach, but there are several spelling/grammatical mistakes in the paper. The paper can be accepted after these minor corrections.

1) Page 2, instead of "ante", it should be "at the"
2) Page 2, section 2, use of "ex-ante" should be fixed
3) Page 3, section 3, instead of "query in many subqueries" it should be "query into many subqueries"
4) Page 4, instead of "derivations steps", it should be "derivation steps"
5) Page 4, instead of "evaluated an avoiding", it should be "evaluated and avoiding"
6) Page 4, section 3.1, remove "that" from "that the"
7) Page 4, section 3.1 last line should be "rule applications"
8) Page 5, use of R(t) could be explained more.
9) In section 3.1.3, after proposition-2, it could be mentioned that the last part of the proof (proposition-3) will be explained in section 4. I thought it was incomplete until I came across the last part in section-4.
10) Can't Section 3.1.4 be merged with section 4?
11) Section 3.1.4, instead of "queries are maintained available in memory", use either "maintained" or "available".
12) Section 3.1.4, it is not entirely clear why the T(x, SPO, TYPE) and T(x, TYPE, SYM) patterns would be pre-calculated (it was mentioned as a matter of fact).
13) Page 7, Section 4.1 last line, R(t) in what? Something is missing after in
14) Page 9, where is Example 5 used? It doesn't seem to be referenced/explained anywhere.
15) Page 11, Example 8, instead of x in S_{sco}(x, SCO, z), shouldn't it be y?
16) Page 11, instead of "Detecting an avoiding", it should be "Detecting and avoiding"
17) Page 11, in the rule name "as cax-eqc1", there is a redundant "as"
18) Page 12, section 6.1, instead of "intended", "intend" can be used. In the same line, "closure against a the" - remove "a"
19) Page 15, use of the words "temporary storing" should be reworded.
20) Page 17, "such" is missing after "related techniques"
21) Reference 17, pages are missing. Either remove it or include the proper pages (page range).


Comments

I read with interest the submission on QueryPIE. Unfortunately, I found
some severe problems at the start of the paper, which prevented me from
understanding the approach taken in the paper.

There is no reference to any literature on query-subquery (QSQ), nor to any literature on backward chaining. These omissions are a severe problem in and of themselves.

Problems with Algorithm 1:

There is no pointer to any literature on the genesis of Algorithm 1, so the
only source of information on it is this paper.

Algorithm 1 has a number of local problems.

Line 10
- infer takes three arguments but is only given two
Line 16
- MGU is called without ensuring that the arguments are unifiable
Line 17
- probable typo - epsilon instead of h
Line 22
- join is not an operation that is normally done on substitutions

Algorithm 1 does not work.

Consider looking for x type y
using the rule x type y <- z SCO y & x type z
with data I type A
A SCO B

The first call to infer is done with x type y and {} (and {}?).
all_subst becomes { x->I, y->A }.
The sole rule is acceptable and thetaH is { x->x, y->y }.
subst becomes { { x->x, y->y } }.
The first body element is processed.
There is no MGU of z SCO y and x type y.
The algorithm cannot proceed.

Perhaps there is meant to be some application of thetaH to the body
elements, but there is no discussion of this part of the algorithm in the
paper, nor is any other published source of information on the algorithm
provided.

There appear to be other problems with Algorithm 1, but they cannot be
verified because of this early failure.

As this algorithm is a foundation of QueryPIE, there is no point in
reviewing the rest of the paper. For the paper to be acceptable, Algorithm
1 first needs to be fixed, and then the paper re-reviewed.