Review Comment:
The authors of this manuscript claim to present "a distributed SPARQL query engine that adopts novel techniques to [...] improve [Linked Data] query efficiency."
Frankly, I stopped reading this manuscript after the second paragraph in Section 3.1, at which point it became apparent that a thorough assessment of the proposed approach is impossible, because the authors fail to accurately introduce (let alone define) even the most basic concepts. This holds not only for concepts that readers who know SPARQL may be familiar with, but also for concepts specific to the proposed approach. Examples of the latter are as follows:
1) The authors neither explain what a "concrete node" is, nor what a "binding" is. Consequently, it is also not clear what the "number of bindings, or cardinality of a concrete node" is. Furthermore, there is no explanation of why edges/triple patterns are conceived of as active entities that might change such a cardinality.
2) While "shared variables" seem to have a cardinality too (as one can infer from the last sentence in the first paragraph of Section 3.1), the authors do not provide a description (let alone a definition) of what the cardinality of a shared variable is.
3) The authors claim to "introduce the notion of fixed cardinality node" but then go straight to a description of a property of such a node (without introducing/defining the concept before). If this description is meant to be the definition, this is not very helpful because the given description assumes an understanding of what "the execution of all connected edges" is, which has not been discussed before (moreover, the unspecified notion of "bindings" appears again in this description). Hence, after reading the description of this property (of fixed cardinality nodes), I do not know what a "fixed cardinality node" is (and I doubt the majority of the readers will).
4) The operation of "removing all fixed-cardinality nodes" (based on which the authors try to make a case for parallel processing opportunities that seem to be the key idea of the authors' approach) is unclear. Neither the "more precise description" in footnote 6 nor the given example provides a sufficient definition. More specifically, it is not clear what the "more precise description" in footnote 6 refers to and what it means, and the example is formally incorrect for the following reason: In graph theory, any sub-graph of some graph is a graph itself, that is, a pair consisting of a set of vertices and a set of edges; I do not see such pairs in the example (instead, the authors represent any sub-graph as a set). Moreover, I do not see why removing vertices B and C from the graph in Fig.1 results in three subgraphs; the result that I would expect (based on the graph theory that I was taught at university) is a _single_ (sub-)graph that consists of two vertices (A and D) and no edges.
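To make the expectation in point 4 concrete, the following minimal sketch removes a set of vertices from an undirected graph represented as a pair (V, E). The edge set used here is an assumption made purely for illustration and is not taken from Fig.1; the only property it relies on is that A and D are not adjacent. The function name remove_vertices is hypothetical.

```python
def remove_vertices(vertices, edges, removed):
    """Return the sub-graph (V', E') obtained by deleting `removed` from (V, E).

    An edge survives only if both of its endpoints survive.
    """
    kept_vertices = vertices - removed
    kept_edges = {edge for edge in edges if edge <= kept_vertices}
    return kept_vertices, kept_edges


# Assumed example graph (NOT the actual edges of Fig.1): A and D are not adjacent.
vertices = {"A", "B", "C", "D"}
edges = {
    frozenset({"A", "B"}),
    frozenset({"B", "C"}),
    frozenset({"C", "D"}),
}

sub_vertices, sub_edges = remove_vertices(vertices, edges, {"B", "C"})
print(sub_vertices, sub_edges)
```

Under this assumption the result is one pair: the vertex set {A, D} together with an empty edge set, that is, a single sub-graph rather than three.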
The demonstrated lack of a precise introduction of the basic concepts makes an in-depth understanding of the rest of the manuscript impossible. Therefore, the manuscript is unacceptable for publication in the journal.
Further aspects in which the part of the manuscript that I read lacks clarity are the following:
1) Given that the presented work focuses on SPARQL query distribution, I do not see what this work has to do with "observing the Web of Data" (as mentioned in the title).
2) The Abstract does not say what "the distribution challenge" is, even though the authors aim to achieve an understanding of it.
3) The first sentence in Section 1 does not make clear which era it refers to.
4) The authors must elaborate on why "machine-understandable, interoperable data and coordinated datasets [...] are contradictory to the properties of Big Data." (Section 1)
5) I would expect a reference for the claim that "many LD can be accessed via SPARQL endpoints." (Section 1)
6) The authors must elaborate on why "the latency of data transfer [...] becomes more challenging in the case of distributed SPARQL queries" (as opposed to other distributed database settings).
7) The related works section (Section 2) does not clearly identify the main conceptual difference (data shipping vs. (sub)query shipping) between the two categories of approaches mentioned.
8) In the last line on page 2, it is not clear what "they" refers to.
9) The authors must elaborate on why "statistics that are accurate enough [...] are unlikely to be available on a large scale." (Section 2)
10) The related works section should discuss how the proposed approach is related to the mentioned approaches.
Comments
Submitted in response to http://www.semantic-web-journal.net/blog/call-papers-special-issue-seman...