Review Comment:
The paper presents an approach to automatically enrich multidimensional RDF data compliant with the QB4OLAP vocabulary with triples that explicitly indicate the spatial relationship between members of the cube. These relationships are derived from the geometries that describe such members; the algorithms to extract spatial relationships are formalized and evaluated in terms of both effectiveness and efficiency.
The paper discusses an interesting research topic at the crossroads between the areas of semantic web and (S)OLAP analysis. The contribution of the paper is not groundbreaking, but it presents and evaluates a framework for spatial enrichment of RDF data which is worth considering for publication. The quality of the paper is good, both in terms of presentation and self-containment; related work is accurately discussed. Nonetheless, I found some issues that require revision by the authors.
First of all, Section 4.1 and Algorithms 3-4 seem to describe a relatively simple process in a quite complicated way. If my understanding is correct, Algorithm 3 simply retrieves the pairs of linked level members that are both described by geometries and verifies the spatial relationship between those geometries; Algorithm 4 does the same, with the only difference that the level members are not directly linked but belong to different levels within the same hierarchy. Therefore, 1) in both cases, a verbal description that explains the intuition behind the algorithms is missing; 2) wouldn't Algorithms 3 and 4 be better represented by a simple and concise SPARQL query rather than a notation-heavy process?
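To make the intuition I have in mind concrete, here is a minimal sketch of the two processes as I read them. It is my own simplification, not the authors' method: geometries are replaced by hypothetical axis-aligned boxes, "within" is assumed as the spatial relationship being verified, and all names (`within`, `linked_pairs`, `unlinked_pairs`, the toy members) are invented for illustration.

```python
from itertools import product

# Hypothetical stand-in for a geometry: an axis-aligned box (xmin, ymin, xmax, ymax).
def within(inner, outer):
    """Return True if box `inner` lies entirely inside box `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

def linked_pairs(links, geometry):
    """Algorithm 3 as I read it: test only pairs that are explicitly linked."""
    return [(c, p) for c, p in links
            if c in geometry and p in geometry
            and within(geometry[c], geometry[p])]

def unlinked_pairs(lower, upper, links, geometry):
    """Algorithm 4 as I read it: test pairs from two levels with no direct link."""
    linked = set(links)
    return [(c, p) for c, p in product(lower, upper)
            if (c, p) not in linked
            and c in geometry and p in geometry
            and within(geometry[c], geometry[p])]

# Toy data, invented for illustration only.
geometry = {
    "city_a": (1, 1, 2, 2),
    "region_x": (0, 0, 5, 5),
    "country_k": (0, 0, 10, 10),
}
links = [("city_a", "region_x"), ("region_x", "country_k")]

explicit = linked_pairs(links, geometry)
implicit = unlinked_pairs(["city_a"], ["country_k"], links, geometry)
print(explicit)
print(implicit)
```

If this reading is correct, each function is essentially a join followed by a spatial filter, which is why a declarative query seems a more natural presentation than the current notation.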
Second, I question the relevance of Section 5. Except for Section 5.5 (which makes interesting observations about the state of the art of spatial technologies for the semantic web), this section describes the implementation code in great detail. What is the scientific relevance of this part? Considering that the implementation carefully follows the algorithms presented in the previous section and that the code is available on GitHub, I don't see the point of such a discussion. I would advise the authors to 1) significantly reduce the discussion and keep only the aspects (if any) that are interesting from a scientific/research perspective; 2) possibly move the detailed discussion to an appendix if they believe it absolutely must be kept in the paper. Otherwise, please provide a solid motivation for discussing the implementation code in a core section of the paper.
Finally, I have some doubts about the soundness of the evaluation in Section 6.2.
In the comparison of Algorithms 3 and 5 (both of which are based on explicit relationships), I would have expected to see an execution time proportional to the number of relationships; the results clearly prove this expectation wrong. I suspect this is due to the different nature of the relationships considered in the two algorithms. Unless I missed it, the authors do not explain these results and should give more details.
But my main concern is about Table 6. It seems unfair to compare exact results on run time with (what appear to be) rough estimates of the development cost. The authors need to provide further details about these development costs and about how they have been obtained. Also, it is not clear whether the user's expertise has been taken into account. Please explain this part in more detail, highlighting the critical phases (conversion? loading?) and distinguishing the objective data (i.e., actual times) from the subjective aspects (i.e., user expertise).
Other remarks are indicated in the following:
- Fig.1: it is not clear what the semantics of the arrows is.
- Check the use of acronyms across the paper. For instance, MD is introduced in p.1 line 40 left and then re-introduced in p.2 line 22 left; similarly, SOLAP is introduced in p.1 line 51 right and again in p.2 line 24 right, but not used in p.2 line 5 right. This issue occurs throughout Sections 1, 2, and 7.
- p.3 Contributions: at this point, it is not clear what the meaning of "explicit" and "implicit" hierarchy steps (and fact-level relations) is. Either explain this before or change the contributions to make them more general.
- Fig.7: again, it is not clear what the semantics of the arrows is. The figure is introduced as representing a "process flow", but it looks more like an architectural view of the framework. Also, it appears from the figure that queries on the triplestore are never formulated by the user, not even through GeoSemOLAP. Is this correct? What is the meaning of the arrows outgoing from the "queries" module? If they are the responses to the incoming arrows, wouldn't it be better to have single double-ended arrows? What is the legend of the symbols? Please revise this figure.
- Maybe I missed it, but what are "p" and "k" in p.10 lines 17,22 right?
- p.11 line 45 left: "The output of the helper function (Vs(ac)) keeps the spatial attribute values of the child level member idI(lmc)". Isn't this an abuse of notation, since vs(a) has been defined in p.10 line 40 right as a set of literals?
- Section 6: please convert execution times from seconds to minutes where necessary for easier reading. Table 6 reports high run times in seconds (e.g., 2622) but development costs in minutes (e.g., 5 minutes = 300 seconds). Please present the results consistently.
Minor comments & typos:
- p.1 line 39 left: which allow
- p.2 line 12 left: for instance?
- p.3 line 4 left: should begin with "An illustration of..."