GeoSPARQL-Jena: Implementation and Benchmarking of a GeoSPARQL Graphstore

Tracking #: 2108-3321

Authors: 
Greg Albiston
Taha Osman
Haozhe Chen

Responsible editor: 
Ruben Verborgh

Submission type: 
Tool/System Report
Abstract: 
This work presents an RDF graphstore implementation of all six modules of the GeoSPARQL standard using the Apache Jena Semantic Web library. Previous implementations have provided only partial coverage of the GeoSPARQL standard. There is discussion of the design and development of on-demand indexes to improve query performance without incurring lengthy data preparation delays. A supporting benchmarking framework is also discussed for the evaluation of any SPARQL-compliant queries, with interfaces provided for integrating additional test systems. This benchmarking framework is utilised to examine the performance of the implementation against two existing GeoSPARQL systems using the Geographica benchmark. It is found that the implementation achieves comparable or faster query responses than the alternative systems while also providing much faster dataset loading and initialisation durations.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 03/Feb/2019
Suggestion:
Major Revision
Review Comment:

The article presents an extension of Apache Jena for GeoSPARQL called GeoSPARQL-Jena, which is claimed to be fully compliant. The authors furthermore adopt and refine the Geographica benchmark to compare the performance of GeoSPARQL-Jena against Parliament and Strabon.

It is indeed troublesome to find a fully compliant solution for GeoSPARQL, and the authors aim to address that. In that sense, the tool is important and, provided compelling evidence is given, will have an impact on the community.

The weakness of the paper is the evidence; while the authors claim to be fully compliant, their method of evaluating the tool is based on Geographica. Geographica mostly tests the availability of certain functions, and not the correctness thereof. The authors are right to recognize that (amongst others) the support of GML is limited, but their experimental setup does not demonstrate these gaps. The authors furthermore did not take Oracle Spatial and Graph into account (which is used by organizations; an evaluation would thus be welcome). Oracle provides an image that would allow the authors to compare results; while it is a virtual image, the authors could run the setup on the virtual instance to compare the results with an important commercial player.

In short, the paper can be improved by providing compelling evidence that GeoSPARQL-Jena is fully compliant (which, as I will argue below, it is not), and by taking into account the state of the art, which includes Oracle’s solution.

Additional, more detailed comments:

P1 (41-R): GeoSPARQL does not enhance but extends SPARQL. The standard furthermore provides a vocabulary.

P2 (20-L): I am not convinced that the results of processing string literals are discarded. What about the tools that use spatial data structures such as R-trees or R*-trees?

P2 Please define what you mean by “dimensionality”. The number of results on line 43 is also not correct, as some relations are mutually exclusive: if x is geof:sfWithin y, then x and y are not geof:sfDisjoint.
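
As a quick sketch of the mutual exclusivity (hypothetical literals, standard GeoSPARQL prefixes), the following ASK should return true on a compliant engine, since the two relations can never hold simultaneously for the same pair:

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
ASK {
BIND ("POINT(1 1)"^^geo:wktLiteral AS ?pt)
BIND ("POLYGON((0 0, 2 0, 2 2, 0 2, 0 0))"^^geo:wktLiteral AS ?poly)
# sfWithin implies not sfDisjoint, so counting both relations double-counts pairs.
FILTER (geof:sfWithin(?pt, ?poly) && !geof:sfDisjoint(?pt, ?poly))
}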

P2 The 3rd contribution was hinted at in a discussion, but not addressed in this paper. I believe this should thus not be considered a contribution.

P3 The authors state that, as far as they know, there has been no systematic research or compliance testing of various frameworks w.r.t. GeoSPARQL. Could the authors elaborate or clarify this statement? This was the goal of Geographica. It is true that the number of tools assessed by Geographica is limited, but this holds for this paper as well. The major drawback of Geographica is that it mainly tests the availability of functionality, and not necessarily the correctness thereof. The authors did identify cases where incorrect results would be returned, which is related to that particular problem.

P3 (34-L) Suggest replacing “obsolete” by “deprecated” or “dated”.
P3 Reference to uSeekM is missing

The authors could provide a brief related-work section on benchmarks (LUBM, BSBM, DBpedia SPARQL Benchmark, …). [5] proposes PoDiGG, which generates datasets with an explicit geospatial component. The authors could then also identify or scope the goal of their benchmark: testing functionality and performance, but not correctness per se.

P4 (26 -> 29) rephrase.

P5 (10-L) Reference 14 in the paper is not an accepted publication and a link to the manuscript is not available. I do not understand the added value of referring to the application domain in Section 2.4 whilst referring to 14. I suggest this section be omitted, or that more compelling use cases / applications of GeoSPARQL be mentioned. The authors could look into the results of http://geoknow.eu/Welcome.html (or [12]) or Linked Data initiatives in public administrations across the world.

P5 (10-R) potential errorS (typo)

P5 (Section 3.1) Features may have multiple geometries. But the authors could mention that this could lead to problems if those relations are topologically inconsistent (e.g., by merging different resolutions in one graph). For instance, if geom1 and geom2 at a certain resolution are disjoint, then their respective counterparts geom1’ and geom2’ at a different (e.g., simpler) resolution should be disjoint as well. However, the “disjointness” of geom1 and geom2’ is not guaranteed. [6] presents a use case where the use of different geometries, stored in different graphs, is motivated.
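
As a minimal sketch of the situation (hypothetical URIs and geometries), one feature carrying geometries at two resolutions might be recorded as:

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
INSERT DATA {
<http://example.org/featureA> geo:hasGeometry <http://example.org/geomA> , <http://example.org/geomAsimple> .
# Detailed geometry.
<http://example.org/geomA> geo:asWKT "POLYGON((0 0, 4 0, 4 4, 0 4, 0 0))"^^geo:wktLiteral .
# Simplified geometry: a third geometry disjoint from this one is not necessarily disjoint from the detailed one.
<http://example.org/geomAsimple> geo:asWKT "POLYGON((0 0, 4 0, 4 4, 0 0))"^^geo:wktLiteral .
}

A query that mixes the two resolutions can then return topologically inconsistent answers, which is why storing them in different graphs, as in [6], is motivated.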

The authors furthermore state that “Calculations are performed in the SRS of the first Geometry Literal.” I believe that this depends on which literal an engine encounters first, and which one is encountered first cannot be guaranteed to be the same across platforms (unless sorting instructions have been specified in the query).

Section 4. “GeoSPARQL Jena” -> Inconsistent use of hyphen.
Section 4 is the most difficult section to parse; it contains a description of GeoSPARQL-Jena, Geographica and the refined version of Geographica (the benchmarking framework). I believe that restructuring that section (into multiple sections) could improve the paper substantially. While Section 4.1 is dedicated to GeoSPARQL-Jena, Sections 5 and 6.1 are mostly related to the implementation. The conceptualization of the benchmark and its application in Section 6 serve to compare the system to Parliament and Strabon.

P7 (10-L) “… applying to any dataset.” -> applying what to any dataset?

P8 (5-L) What are the extra concepts that have been incorporated?
P8 (42-L) Burdensome in what sense?
P8 (47-L) There is a difference between relying on Internet access for obtaining functions to execute a process and that availability being a requirement for a successful Semantic Web. How would your framework go about retrieving CRS on demand? The availability of (most) common CRS, units and functions may be taken as a given in this context.

Converting existing geospatial data into RDF is considered future work but has been covered by the GeoKnow project with TripleGeo [7]. What exactly is missing from the state of the art? I’m also surprised that the authors have not adopted existing CSV2RDF [9] approaches such as RML [8] and TARQL [10]. TARQL is also based on Apache ARQ, and functions for converting CRS can be registered in a namespace to be used in the queries. Upon reflection, the loading of data is a separate concern and, I believe, not part of the GeoSPARQL engine. In other words: is this within the scope?

Section 4.2 should be rewritten: describe Geographica, address/mention its problems and/or challenges, and then describe the design of your workbench. It would make the motivation for “rewriting” and “refining” Geographica clear. I appreciate that the authors have identified a case where a query can return the wrong outcomes, providing a motivation for rewriting all the queries in the benchmark. However, the authors should be careful with the goal of Geographica and by extension their own framework; Geographica tests the availability of a function and times its performance, it does not check the correctness.

P10 (1-L) Minor suggestion: rephrase “external SPARQL query files”. E.g. along the lines of “Unlike Geographica, SPARQL queries are stored in files which are loaded…” Again, the support for adding additional data/files/… (mentioned as future work) seems out of scope.

P10 (16) What is a “consistent query”?

P10 (24-L) “other SPARQL standards” -> do the authors mean “other standardized SPARQL extensions”? The authors have rewritten and refined Geographica. Because of this, they do not assess (and, by extension, do not compare with different systems) all aspects that make GeoSPARQL-Jena fully compliant. None of the datasets in the benchmarks use GML (for instance).

Section 5. I suggest renaming it to “Implementation”. The introduction also mentions “GeoSPARQLJena” (missing hyphen). Place the GitHub references in this section rather than in the conclusions. As a reader, I was looking for the source code.

Section 5. I agree that the GeoSPARQL specification does not explicitly exclude the use of 3 or 4 dimensions, but it can be deduced from the spec. I, myself, have a problem with the silent disregard of the 3rd and 4th coordinate; two planes may not intersect in 3 dimensions, but will when “reduced” to 2 dimensions. A user may be confused by the “nonsensical” output if that user is aware of the dataset’s contents. Rejecting geometries that use more than 2 coordinates is, while conservative, safe. I hope a user will see disclaimers or warnings when ingesting such data; if not, a user should see those.
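
To make the concern concrete, a hypothetical sketch (standard GeoSPARQL prefixes): the two 3D line segments below never meet, as they lie at different heights, yet silently dropping the Z coordinate makes their 2D projections cross:

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT * WHERE {
BIND ("LINESTRING Z(0 0 0, 2 2 0)"^^geo:wktLiteral AS ?a)
BIND ("LINESTRING Z(0 2 5, 2 0 5)"^^geo:wktLiteral AS ?b)
# With the third coordinate silently disregarded, this filter passes; in 3D the lines are disjoint.
FILTER (geof:sfIntersects(?a, ?b))
}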

P11 (7->9) Rephrase. Also, this sentence seems to indicate that GeoSPARQL-Jena is not yet fully compliant w.r.t. that aspect.

Section 5.2 (or earlier) could look into prior efforts in converting GML such as [11] and the GeoKnow project.

Section 5.3. Reference to Java Measure API is missing.

Section 5.4. is interesting. Indeed, in the spec two points with the same coordinates are not equal, as dim(boundary(p1) ∩ boundary(p2)) = dim(∅) = -1. Most systems do indeed relax the requirement, but one can argue that systems then become less GeoSPARQL-compliant.
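
A sketch of the edge case (standard GeoSPARQL prefixes): under a strict reading of the spec’s TFFFTFFFT pattern for sfEquals, the query below returns no solution, because the boundary of a point is empty; a relaxed implementation returns one row.

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT * WHERE {
BIND ("POINT(1 1)"^^geo:wktLiteral AS ?p1)
BIND ("POINT(1 1)"^^geo:wktLiteral AS ?p2)
# Strict DE-9IM: dim(boundary(?p1) ∩ boundary(?p2)) = dim(∅) = -1, so the pattern fails.
FILTER (geof:sfEquals(?p1, ?p2))
}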

Section 5.4. Intersection is equivalent to within + contains, not the combination of the two (though it can be implemented as such).

Section 5.5. I cannot find the contradiction the authors are referring to. Requirement 5 is not related to the getSRID function. Requirement 20 and its documentation just define geof:getSRID as a function going from literals to URIs identifying the SRID. Simple Features, which GeoSPARQL adopts, might use integers to identify coordinate systems, but in GeoSPARQL they are identified with URIs. I’m not sure where the problem would arise, as getSRID is defined in the geof namespace. Could the authors please elaborate and exemplify?
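
For reference, a sketch of what I would expect (standard GeoSPARQL prefixes; the expected binding follows the spec’s default-CRS rule):

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?srid WHERE {
# The literal carries no explicit CRS URI, so the default applies: ?srid should be
# <http://www.opengis.net/def/crs/OGC/1.3/CRS84>, a URI rather than a Simple Features integer code.
BIND (geof:getSRID("POINT(1 1)"^^geo:wktLiteral) AS ?srid)
}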

Section 6. The introduction gives the impression GeoSPARQL-Jena will be compared against other systems, but 6.1 is about GeoSPARQL-Jena. As mentioned, the paper would benefit from a comparison with Oracle Spatial and Graph.

Section 6.1 Hyphen missing from GeoSPARQL-Jena.
Section 6.1 How was the dataset generated, and can it be made available?

P12 (27-R) Use of “likely” would require further investigation, maybe as future work.
P12 (38-R) What do you mean by “empty points”?
P13 (36-L) What was the limitation on RAM?

Section 6.2 misses details on the type of RAM and whether the HD was an SSD. What were the JVM settings?

P13 (35-R) “RDFS inferencing rules were enabled” Please state for readers whether these are stored.

P14 (17-L) Why 5 iterations each? Were there outliers? If possible, try running those at least 10 times and check whether there are outliers. The limited number of runs may have an impact on the conclusions you can draw from the means.

Parliament and Strabon provide endpoints, while GeoSPARQL-Jena uses the dataset directly. Would that provide an explanation of GeoSPARQL-Jena’s speed? Even though all endpoints run locally, there is still additional HTTP overhead. Some other details are missing from the experimental setup: which services were running; was the desktop disconnected from the network; … In other words: how controlled was the experiment?

P14 (12-R) “lengthy” is subjective; 80 seconds is not that big of a deal – it depends on the setting. I suggest also omitting the claim that Strabon is not suitable for environments where an application is frequently restarted; it all depends on the context. Strabon also offers much more than merely a triplestore and endpoint; it also supports displaying results on a map. This claim does not add value to the paper as providing evidence for this would require a different type of experimental setup.

Section 6.4. Tables are referred to, but not explained. What can we see from the tables? Would it be possible to provide tables with the figures? Some of the durations are so small that the reader cannot see the variances.

P16 (28) “Further investigation suggests…” I would investigate this further. I ran the following query on http://test.strabon.di.uoa.gr/NOA/Query

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT * WHERE {
BIND ("POLYGON((-77.089005 38.913574,-77.029953 38.913574,-77.029953 38.886321,-77.089005 38.886321,-77.089005 38.913574))"^^geo:wktLiteral as ?x)
BIND ("POLYGON((-77.089005 38.913574,-77.029953 38.913574,-77.029953 38.886321,-77.089005 38.886321,-77.089005 38.913574))"^^geo:wktLiteral as ?y)
FILTER (geof:sfEquals(?x,?y))
}

And it did not return a solution. The following query (notice the polygons) does return a result:

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX strdf: <http://strdf.di.uoa.gr/ontology#>
SELECT * WHERE {
BIND ("POLYGON((-77.089005 38.913574,-77.029953 38.913574,-77.029953 38.886321,-77.089005 38.886321,-77.089005 38.913574))"^^geo:wktLiteral as ?x)
BIND ("POLYGON((-77.029953 38.913574,-77.029953 38.886321,-77.089005 38.886321,-77.089005 38.913574,-77.029953 38.913574))"^^geo:wktLiteral as ?y)
FILTER (strdf:equals(?x,?y))
}

There might be a bug with the implementation of geof:sfEquals, but Strabon does seem to support spatial equality with strdf:equals.

P16 (48-L) Rephrase sentence and missing hyphen.
P16 (14-R) I suggest using “suggests” instead of “indicates”, unless you can provide evidence.

P18 (17-L) I suggest indicating the queries that were not resolved in a separate column rather than placing them in the ranking. While I agree that Parliament has the worst performance compared to the others, I would provide the data for the 11 queries in a table. Parliament did do well on 1 query and was in second place for 3 queries. Even though you explain why one query was omitted from the chart, the table could provide all the information. It would also be easier to compare whether the 3 queries in 2nd place (warm) are the same 3 queries in 2nd place (cold). This table could provide a starting point to investigate when and why certain systems perform well (or better).

Conclusions.
The authors claim they have developed a fully compliant tool, yet that compliance was not demonstrated. As stated above, there were some sentences or decisions that question the compliance: some aspects of GML being deemed future work and the semantics of equals.

P18 (21-R) “Java Measurement Harness” is that the same as the Java Measurement API mentioned earlier on?

P18 (39-R) This observation has also been made below and is related to query optimization. The geospatial literals may consume a lot of memory; shuffling triple/graph patterns and filter functions so that the search space is reduced prior to fetching (or processing, in the case of the query rewriting extension) can avoid unnecessary computational overhead.
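
As a hypothetical illustration (the ex: classes are invented), placing cheap, selective triple patterns before the expensive spatial filter shrinks the set of literal pairs the function must parse and compare:

PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX ex: <http://example.org/ns#>
SELECT ?river ?city WHERE {
# Cheap, selective patterns first...
?river a ex:River ; geo:hasGeometry/geo:asWKT ?w1 .
?city a ex:City ; geo:hasGeometry/geo:asWKT ?w2 .
# ...so the costly geometry parsing and comparison runs over far fewer pairs.
FILTER (geof:sfIntersects(?w1, ?w2))
}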

The authors mentioned challenges that have been mentioned elsewhere. [1] for instance proposed to reconsider the representation of geometries using RDF literals and encode those as resources using URIs. The argument was that people engaging with geospatial data are interested in the topological relations instead of the “raw” geometries. Those “raw” geometries can still be kept as people may request those in a query [2]. This is somewhat related to the problem highlighted by the authors; solutions “treat” geometries as literals. As a consequence, they may fail to recognize relations such as equivalence. It can also be incorporated in Section 2.3.

The authors also mention that tools often parse and process the literals for geospatial calculations. Even though in a different setting, [3] proposed an extension of Triple Pattern Fragments [4] for GeoSPARQL. The prototype developed in [3] computes those functions using JavaScript. In short, TPF [4] is mostly a specification for a TPF server, and it is up to the client to support SPARQL constructs. To the best of my knowledge, some filter functions and SPARQL 1.1 constructs are yet to be supported, but it could be interesting to extend the benchmarking framework to assess the compliance of different approaches to geospatial Linked Data. [3] also discussed “sensible” queries, which is related to rewriting queries to reduce their cost. The TPF architecture does have support for implementing strategies by having servers communicate metadata about a triple pattern fragment to the client.

[1] Blake Regalia, Krzysztof Janowicz, Grant McKenzie: Revisiting the Representation of and Need for Raw Geometries on the Linked Data Web. LDOW@WWW 2017
[2] Blake Regalia, Krzysztof Janowicz, Gengchen Mai, Dalia Varanka, E. Lynn Usery: GNIS-LD: Serving and Visualizing the Geographic Names Information System Gazetteer as Linked Data. ESWC 2018: 528-540
[3] Christophe Debruyne, Eamonn Clinton, Declan O'Sullivan: Client-side Processing of GeoSPARQL Functions with Triple Pattern Fragments. LDOW@WWW 2017
[4] Ruben Verborgh, Miel Vander Sande, Olaf Hartig, Joachim Van Herwegen, Laurens De Vocht, Ben De Meester, Gerald Haesendonck, Pieter Colpaert: Triple Pattern Fragments: A low-cost knowledge graph interface for the Web. J. Web Semant. 37-38: 184-206 (2016)
[5] Ruben Taelman, Pieter Colpaert, Erik Mannens, Ruben Verborgh: Generating Public Transport Data based on Population Distributions for RDF Benchmarking. Semantic Web (2018 – accepted for publication)
[6] Christophe Debruyne, Alan Meehan, Eamonn Clinton, Lorraine McNerney, Atul Nautiyal, Peter Lavin, Declan O'Sullivan: Ireland’s Authoritative Geospatial Linked Data. International Semantic Web Conference (2) 2017: 66-74
[7] Kostas Patroumpas, Michalis Alexakis, Giorgos Giannopoulos, Spiros Athanasiou: TripleGeo: an ETL Tool for Transforming Geospatial Data into RDF Triples. EDBT/ICDT Workshops 2014: 275-278
[8] Anastasia Dimou, Miel Vander Sande, Pieter Colpaert, Ruben Verborgh, Erik Mannens, Rik Van de Walle: RML: A Generic Language for Integrated RDF Mappings of Heterogeneous Data. LDOW 2014
[9] Generating RDF from Tabular Data on the Web. W3C Recommendation 17 December 2015. https://www.w3.org/TR/csv2rdf/
[10] TARQL: SPARQL for Tables: https://github.com/tarql/tarql
[11] Linda van den Brink, Paul Janssen, Wilko Quak, Jantien E. Stoter: Linking spatial data: automated conversion of geo-information models and GML data to RDF. IJSDIR 9: 59-85 (2014)
[12] Kostas Patroumpas, Nikos Georgomanolis, Thodoris Stratiotis, Michalis Alexakis, Spiros Athanasiou: Exposing INSPIRE on the Semantic Web. J. Web Semant. 35: 53-62 (2015)

Review #2
By Stasinos Konstantopoulos submitted on 23/Feb/2019
Suggestion:
Minor Revision
Review Comment:

The submission presents a query processor for GeoSPARQL queries, the
geospatial extension of SPARQL. The query processor uses Apache Jena
as the underlying SPARQL query processing infrastructure.

The submission is well-written and an interesting read, including a
good review of the GIS landscape from the Semantic Web perspective and
the challenges one faces to implement a GeoSPARQL query processor.

I feel that Section 5 "Challenges" is out of place and should come
before Section 4 "Implementation". Furthermore, it is recommended that
the "Implementation" sections refers back to the "Challenges" section
to explain whether and how each of the challenges has been overcome.

Section 1:

I do not understand the argument on p.2, left column, lines 15 ff.
The process for GeoSPARQL literals is essentially the same as with
all literal values: the literal is first mapped from the lexical space
to the value space and then the function required in the query is
applied. Admittedly GIS operations can be heavier than, say, an
integer comparison but you should acknowledge that this is not a
general property of SPARQL as you assert. Even without considering
complex user-defined functions, regex pattern matching (for instance)
as built into SPARQL can become as CPU-intensive as GIS operators.
I recommend phrasing this more moderately, along the lines of
GeoSPARQL operators being some of the most CPU intensive functions
over literals in common use.

Section 3:

On p. 3, left column, the authors make several assertions about
various GIS systems which are relevant and important, but software
changes quickly and software documentation is often voluminous. It
is recommended that the authors refer to the specific system version
investigated and cite publications or point to specific sections in
software documentation that support their statements.

Section 5.5:

I would add that the authors could also recommend that a linkset is
created between the two codelists, and maybe suggest a new datavalue
property that is a property of the GeoSPARQL entity SRID and has as
value the Simple Features SRID.
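
A minimal sketch of such a linkset, assuming a hypothetical
ex:sfSRID datavalue property:

PREFIX ex: <http://example.org/ns#>
INSERT DATA {
# Each GeoSPARQL CRS URI is linked to its Simple Features integer SRID.
<http://www.opengis.net/def/crs/EPSG/0/4326> ex:sfSRID 4326 .
<http://www.opengis.net/def/crs/EPSG/0/3857> ex:sfSRID 3857 .
}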

Section 6:

In the evaluation section, it is not clear what to attribute to
Jena/TDB and what to attribute to the authors' caching mechanism.
My understanding of the manuscript is that of (a) a presentation of a
GeoSPARQL query processor developed over Jena/TDB; and (b) a caching
method that speeds up the former. The former would exist and would be
a substantial contribution even without the latter, and the latter
should be evaluated wrt. its added value over the former as well as
wrt. prior systems.

We get a taste of what caching contributes to the underlying Jena/TDB
graph database in Fig. 3 by comparing index size 0 against the rest of
the points in the figure. But I recommend that the authors more
clearly separate the two contributions: (a) evaluate the GeoSPARQL
query processor by explaining how well it covers the requirements they
put forth in Section 3 and comparing their requirements coverage
against the coverage achieved by other systems; and (b) evaluate their
caching method by assuming as a baseline their system with cache size 0,
i.e., the basic Jena/TDB GeoSPARQL processor they have developed.
Then the performance of the cache size 0 system can appear in all
tables as the Jena/TDB baseline, so that the reader can get an idea of
the value added to Jena/TDB by the caching method. Naturally, this is
in addition to the extrinsic comparison with other systems.

Section 7:

On p. 18, right column, lines 39-46 the authors present query
optimization as if it were the query author's obligation, and not
something that the query processing engine should take care of.
I would have seized the opportunity offered by the dramatic execution
time improvement gained by re-arranging the query to present query
optimization as a promising direction for future research.

Minor editorial:

Sect 1.1, l. 51:
"quality of life": "usability" sounds more conmesurate with the
impact to one's life of the ease of writing queries.

Sect 1.2:
"The principle contributions..."
I think you mean the primary or the main contributions.

Sect 1.3, l.26:
"This section some..." -> "This section presents some..."

Review #3
Anonymous submitted on 26/Feb/2019
Suggestion:
Major Revision
Review Comment:

Summary
The authors present a GeoSPARQL implementation extending the Jena framework, covering all components of GeoSPARQL. The authors also describe a mechanism for fast transformation of geometries into different coordinate reference systems, and they build indices on the fly in order to improve query execution time.

Minor comments
p1,l15,l33: a RDF → an RDF
p1,l19-20: it is not clear whether the proposed benchmarking framework or the existing Geographica benchmark is used for the evaluation process. The authors probably mean that they are using the Geographica benchmark’s datasets and experiments?
p1,l23: too many “geospatial”-based keywords
p1,l34-38: It is unclear what this means. Does it refer to the second contribution of this work, which is a new benchmarking framework? Then the first (“implementation”) and third (“development”) elements of the list, to which “This work...features of” applies, are the same concept. It would also need a “This work also describes …” and a meaningful correction at the end, where it states that a comparison is performed of the proposed benchmarking framework with “.. other GeoSPARQL implementations”, by which one can assume “… stores” is meant (since a benchmark framework may test for GeoSPARQL conformance but is definitely not a GeoSPARQL implementation); and then how does someone compare benchmark frameworks with RDF stores?!
p1,l40-41: smartphones and in-vehicle navigation devices are part of the Internet of Things
p1,l42: “increasing” is used twice in the same sentence and does not sound very nice.
p1,l42-43: “datasets reliant upon geospatial data”. Applications can rely (which expresses need) upon geospatial data. Datasets do not rely upon, but can include or at most be linked with, geospatial data. I would propose to rephrase it as “applications reliant on geospatial datasets”.
p1l44: the GeoSPARQL → GeoSPARQL

p1c2l39: SQL stands for “Structured Query Language”, not “Server Query Language”!
p2l13: but is limiting
p3l33: However, Parliament
p3l36: Strabon uses an old version of RDF4J
p6c2p25-26: “been” is used twice in the same sentence. Please rephrase.

One positive contribution is that the benchmark checks for GeoSPARQL conformance.
A second positive contribution is that it records the initialization time for each test system, which is not measured in Geographica.

p9c2l7: JMH, with the [21] reference pointing to http://openjdk.java.net/projects/code-tools/jmh/, does not mean “Java Measurement Harness” but “Java Microbenchmark Harness”. There is a fork of the original project at https://github.com/ahoffer/java-measurement-harness which is actually titled “Java Measurement Harness”. Which one did the authors intend to use?

p9c2l9: “... by protecting against JVM optimisations …”. This is inaccurate. Reading the JMH site, one can understand that JMH by default uses 20 warm-up cycles (without measurement, giving the JVM the opportunity to optimize the code before the measurement starts) and 20 real measurement iterations. Therefore, what the authors probably need to say is “... by protecting against JVM unoptimised measurements …”

p9c2l27: The paper incorrectly claims that “… Geographica places filter functions in the SELECT part … rather than in the WHERE part”. Out of 29 MicroBenchmark queries in the 4 query categories of Geographica, only 6 (Non-Topological) and 2 (Aggregations) have functions in the SELECT clause.

p9c2l43: The DataLoad mode is run through Java, and I doubt that it would allow proper measurement of the capabilities of many RDF stores, which are well known for their dedicated bulk-loaders. Ontotext’s GraphDB, for example, has at least 2 bulk loaders (the LoadRDF and PreLoad tools), and OpenLink Virtuoso has the isql tool; these are best controlled through scripts or would require special handling through a Java application. See also p14l31-32, where it is made clear that although bulk methods are available, they are not easily integrated into the proposed benchmark framework.

p10l13-14: Conformance testing is advertised throughout the paper, and I agree that it is important; in fact, however, “it is an area of future work”.

Section 5.5 correctly identifies the issues with getSRID() and all SRS-related data in GeoSPARQL, which did not allow other systems in the past, nor the newly proposed system, to be GeoSPARQL-compliant!

p12l39-47: Why would someone who identifies data “with potential long-term re-use” store it in a registry which “is persistent and small”?

My major remarks regarding the paper are described as follows:

First, the innovation of the techniques described in this paper is marginal. All the techniques described in the paper are well-known, established techniques. For example, the way that the authors transform geometries into a different CRS is a well-established technique widely used in geospatial databases and even in geospatial RDF stores that have CRS support. I agree with the authors on the fact that transformation to a different CRS is an important task in the geospatial domain, and I fully agree with their decision to include it in their benchmark. However, the proposed solution implemented in the system seems decent but not innovative enough to be considered an important contribution.
Second, another issue is that the paper lacks a thorough study of the experimental evaluation. I am not convinced by the justification provided in some cases. For example, it is stated that Strabon does not actually perform a spatial equality but a string equality, and that this is the reason it performs better, but this could be a problem when the geometries are not expressed in the same CRS. Since Strabon uses PostGIS as a back-end and translates all GeoSPARQL functions into the respective PostGIS ones, it seems weird that it would deviate from this approach in the case of spatial equals. Please check the SQL query that is produced by Strabon. If the respective spatial-equals function is included in the query, then the assumption that you make in this part of the paper is not true. In general, I would like to see a more in-depth study of the performance of the systems in the benchmark. Provide the query plans when necessary and explain when and why the spatial indices are used, etc., in order to explain the performance of a system in each case.
Third, I was confused by the use of the term “index” in the evaluation of GeoSPARQL queries. It is stated that the proposed implementation builds indices on-the-fly. I would like this part to be clearer in the paper. An algorithm showing how the index is built and, especially, used in the query evaluation would be very helpful. What is described in this paper seems more like a caching mechanism than an index. The geometries are available just because they are in memory, minimising I/O tasks, but apart from that, I did not see the description of a new, dedicated index structure as I would expect. Maybe the description does not do justice to the work done, so please consider clarifying these issues in a future version of this paper.
Next, I think that significant background knowledge and related work is missing from the paper. There are more GeoSPARQL implementations than the ones mentioned (e.g., GraphDB and Oracle Spatial and Graph, which also have free versions). These implementations should also be considered for the evaluation. A more thorough description of what is offered by competitive systems (e.g., in the form of a table) would be very helpful.
Furthermore, the authors draw the conclusion that “Each of the reviewed implementations uses modified relational databases for persistent storage rather than an RDF graph store. These relational databases have been adapted for RDF usage rather than developed specifically for the purpose. Therefore, there is potential for graphstore implementations to have design optimisations which improve performance. The extension of relational databases can also present a setup and environment configuration burden that is justifiable in production scenarios but limiting for prototype development and research.” This is a general assumption that is not supported by the experiments, in my opinion. First of all, there are triple stores with geospatial support, like GraphDB, and hybrids, like Oracle Spatial and Graph, that were not included in the assessment. Secondly, I don’t think that the experiments are designed with the aim of answering this question (performance of hybrids vs. graphstores). For example, in the cases where materialized indices are needed (e.g., intermediate results do not fit in main memory), hybrid systems with spatial indices have an advantage in heavy geospatial operations (many geometries, large numbers of points per geometry, etc.). Needless to say, there is a whole lot of work done in the area of RDB2RDF systems that could participate in the hybrid-systems-vs-graph-stores debate, but I don’t think it should be in the scope of this paper to go in that direction.
The authors should replace Figure 4 with 3 separate figures which have appropriate scaling on the vertical axis; this will make it easier for the reader to see the performance differences between systems and also allow the error bars to be visible.

Thus, I believe that in order for this paper to be published, it should include the following major revisions:
Evaluation. I think that more systems should be added in the evaluation section. More specifically, the performance of GraphDB and Oracle Spatial and Graph should be compared to the performance of the proposed system.
Equal chances for all systems, i.e., every effort should be made to apply the standard optimisations proposed by each system (configuration parameters, fine-tuning) so that the performance comparison is fair.
Datasets. The benchmark should be extended with bigger datasets that correspond to real workloads nowadays. Currently, the small volume of the datasets is not sufficient to draw conclusions about the performance limitations of the systems under test. A scalability experiment should also be added, as the proposed approach for fast query execution seems to have limitations that have not been explored in the current version of the paper.
In order to be able to perform experiments with big datasets, bulk-loading mechanisms offered by all systems should be properly assessed and included (even in cases where they are used as external programs without a Java API).
Conformance testing. We agree that conformance testing is a very good idea and we fully encourage the authors to implement it in order to add to the novelty of the work described in this paper. Conformance tests should be implemented and included in the benchmark, and the respective results should be reported.
In-depth study of the experimental results. There should be a comprehensive analysis explaining why a system outperforms a competing system. Query plans should be studied so that conclusions can be drawn regarding each system’s optimisations, indices, etc.