Enabling the Geospatial Semantic Web with Parliament and GeoSPARQL
Revision accepted by editor.
The reviews below are from earlier versions of the manuscript; the PDF file is a resubmitted version.
Review 1 by Sven Schade
This revised version provides the improvements that are required to finally accept the contribution. I have no further requests and hope to see this important work for the geospatial community published soon.
Review 2 by Jens Lehmann
The authors addressed the issues I raised in the previous review, except the major remark - which unfortunately cannot easily be fixed. For this reason, I think the article should be either accepted as is or rejected. I do not think that any further revision should be made, because everything which can reasonably be addressed has been done by the authors in a very satisfactory manner. The only small suggestion would be to include "Parliament" in the title to make the article look less like a standardisation document. Otherwise, the article is easy to read, well-structured and provides a good overview.
Review 3 by Boyan Brodaric
This manuscript is ready for publication.
The reviews below are from earlier versions of the manuscript; the PDF file is a resubmitted version.
Review 1 by Boyan Brodaric
This paper is very much improved, and most (but not all) of my concerns have been adequately addressed. The paper is much clearer and more focused, and should be accepted pending some minor-moderate revisions:
1. The authors have not addressed the case where their approach results in a feature being specialized from multiple classes. What are the implications, if any, to reasoning, etc., if the required specialization from geo:feature results in a class having multiple parent types?
2. I think it would be clearer if all the usage and implementation parts were combined. They are now scattered in 3 parts (5.4, 6, 7). At the very least sections 5.4 and 6 should be combined, to separate description from use/implementation. Also, the beginning of the Parliament section should mention that it is an example representation, not required.
3. Overall, the use and implementation sections are short on showing results of queries (except for table 2). They describe how the query would be run, but do not show some results of the execution, e.g. listings 10 and 11.
4. There is still no evaluation. The authors argue that query 11 is more efficient than 10, but show no evidence this is the case. This evidence should be added or the section removed from the paper.
pg 12: "desirably" should be "desirable"
pg 13: missing a period at the end of the first full sentence "...convert them into a GeoSPARQL point[.]"
Review 2 by Sven Schade
This revised version provides a major improvement to the initially submitted document and widely addresses the primary comments of the reviewers. Especially the style of the presentation to the wider Semantic Web community has been improved. Accordingly, the article should be accepted for the journal.
However, only one major issue requires revision:
- Section 2.1 on page 2 explains the difference between features (entities in the real world) and geometries (geometric shapes that are used as representations of features' locations). This is taking parts of the feature definition in ISO and OGC, where sometimes a feature is even said to be a representation. However, the use of the two notions (feature and geometry) in section 5.1 does not follow these definitions, for example on page 7, column 1, paragraph 2, where it says 'In the real world it [a Feature] has a geometry that corresponds...'. According to section 2.1, a feature does not have a geometry but a location which is represented by a geometry. The authors should either revise their definitions in section 2.1 or align the use of these definitions throughout the paper. It might also be worth mentioning the specialization of resources into information resources, and the relation of both to representations.
Minor editing is required for resolving the following issues:
- Missing blank spaces before some of the references, e.g. before  on page 1, before [16,17] on page 4 and before  on page 12.
- Page 1, column 2, line 6 should mention only 'RDF' and not 'the RDF'.
- Page 2, paragraph 1 contains important information, but seems misplaced. The authors may consider providing this part as a footnote to the author names on page 1.
- Page 6, column 1, line 4 has one blank space too much (after footnote 13 and before the dot).
- Page 6, column 2, all three sentences in the enumeration miss dots at the end.
- Page 9, column 1, paragraph 3, last sentence ('Note that the bounding box...') could be replaced with 'Notably, the bounding box...'
- Page 10, section 6.2, sentence 1, API misses an explanation of the acronym.
- Page 12, section 7.2, lines 2-3 contain a sentence that is not understandable (at least not to me): '...extract more meaning from the relations between the data'.
- Considering all examples used in the document, I was not able to cross-check the pending OGC documents. However, if examples are re-used from the candidate GeoSPARQL standard, then that document should be properly referenced.
Review 3 by Jens Lehmann
The authors addressed my comments on the previous revision: They clarified their contribution to GeoSPARQL, provided a clearer focus of the paper and fixed all smaller issues. I enjoyed reading the substantially extended revision of the article. However, a main point of criticism remains: When looking at the GeoSPARQL documents, it is evident that many people, in particular from Oracle, contributed. The authors are amongst the contributors, but cannot represent the whole effort (e.g. the draft spec was edited by Matthew Perry and John Herring). Considering this, I still have doubts whether the contribution is sufficiently significant to warrant a semantic web journal publication. I want to emphasize, however, that there are other contributions apart from describing GeoSPARQL such as the integration in Parliament, the provided sample data and scenario as well as a state of the art overview.
- I tried the query in Listing 13 on the provided public Parliament store. There seems to be a typo in it (qep:within => geo:within). Please make sure to test all queries in the paper for such typos. After correcting the typo and using all namespaces as in the paper, I still get 0 results for the query. That could be my fault or could be related to the graph issues discussed in the paper, but please double check the results and make sure the provided example works online. Even if the query worked, it seems quite unintuitive: The buffer, which is used via LET (note that SPARQL 1.1 actually contains BIND), contains every object within a certain distance. That probably won't scale if you are looking for objects further away than a few meters, e.g. schools which are less than 10000 meters from an airport, since the buffer will be huge. Furthermore, those (very typical) queries should be simpler syntax-wise (e.g. they are simpler in Virtuoso as you can use bif:st_intersects with a 3rd distance parameter, although I am not implying that this is necessarily a better solution).
- "DBpedia contains information from GeoNames and goes so far as to include the latitude and longitude information for many entities." => DBpedia extracts latitude/longitude from Wikipedia, not from GeoNames! (You can check that those values are not always identical.)
- GeoVocab is criticised for using one RDF resource per point. It should be mentioned that one of the ideas behind GeoVocab is to use content negotiation to obtain different formats (GML, WKT, KML), i.e. large polygons can be stored in appropriate formats. Some people may argue that this is a cleaner way to mix spatial information and RDF structure compared to potentially very complex literals. For querying, SPARQL 1.1 makes it a bit easier to use RDF collections and, of course, using resources for points may also be beneficial in some use cases, e.g. for routing or data integration. Related to this, in Section 5.4 you argue that "without the CRS, another property would need to be added onto the Geometry which would increase storage requirements and make sharing data more cumbersome". In my opinion, proper modelling has higher importance than reducing required storage in those cases, where the additional storage requirements are reasonably low. In fact, it is not even clear whether there is a significant penalty given the various compression methods of triple stores (which usually do not apply to literals - you cannot even use namespaces there).
- "DBPedia" should always be written as "DBpedia"
- For LinkedGeoData, there is a very recent article in the Semantic Web journal (http://www.semantic-web-journal.net/content/linkedgeodata-core-web-spati...) which could be used as reference instead / in addition to the older ISWC paper. Similarly, for DBpedia there are also newer overview articles (DBpedia – A Crystallization Point for the Web of Data, Journal of Web Semantics and DBpedia: A Nucleus for a Web of Open Data, ISWC).
- "[...] running on the OpenLink Virtuoso platform with all the benefits and limitations that this platform provides." => A commercial edition of Virtuoso is used, which has some spatial inference capability (if that is what the limitation comment is directed at). We also provide a (PostGIS based) REST interface for LinkedGeoData.
- Abstract: "In this paper we" => "In this paper, we"
- Page 1: "for GeoSPARQL, the current state of the art .." => Ambiguous. It could either mean that you consider GeoSPARQL to be state of the art or that you will describe the state of the art after having described GeoSPARQL.
- Page 2: "locations are are"
- Page 5: "data of different system" => "systems"
- Page 5: "you do not what" => "you do not know what"
- Page 6: "Unfortunately this" => "Unfortunately, this"
- Page 7: "an geo:Feature" => "a geo:Feature" (several times)
The reviews below are from the initial submission; the PDF file is a resubmitted version.
Review 1 by Sven Schade
The contribution introduces a geospatial extension to the RDF query language SPARQL, which is currently under development within the Open Geospatial Consortium (OGC). The authors briefly motivate and introduce GeoSPARQL with examples; describe the challenge of indexing geospatial data including a state of play analysis of geospatial extensions to triple stores; promote their own GeoSPARQL extension to an Open Source triple store called Parliament; and reflect on the potential impacts of GeoSPARQL to Linked Open (Geo)Data.
This work addresses a highly relevant and up to date topic and provides significant contributions to next generation information infrastructures. However, the current version of this contribution presents a general overview of the work without directly addressing the intended audience of the Semantic Web Journal. There are two options to proceed, (1) accepting this paper after 'minor' revision as a report on tools and systems; or (2) re-submitting this paper after major revision as a full paper. I strongly encourage the authors to consider the second option. This work could provide a milestone in implementing a Geospatial Semantic Web.
- Obviously the authors are part of the GeoSPARQL work that is currently ongoing at OGC. However, the exact contribution of the authors should be clarified in the main text and in the acknowledgements (potential IPR issue).
- The GeoSPARQL approach is not more than a projection of the discussions about Spatial SQL and Object-Relational Data Bases, of almost two decades ago, to RDF. Of course this has some value on its own, but at least some central Spatial SQL work should be mentioned in the paper (as part of the history). See for example: http://www.spatial.maine.edu/~max/RJ14.html.
- The title does not reflect the content well, because it focuses on Linked Data while the paper addresses two issues (focusing on the first): querying geospatial data with (Geo)SPARQL and Linked (Geo)Data. I would expect a title such as 'GeoSPARQL: Enabling a Geospatial Semantic Web'.
- The abstract is too short. It should include more information about GeoSPARQL and a sentence about the conclusions and future work. In addition, why is 'context' mentioned here? 'triple store Parliament' should be 'Parliament triple store' (as later in the text).
- The keywords might be extended with 'geospatial data' and 'query language'.
- The introduction does not address the (wider) Semantic Web audience; some basics of geospatial sciences should be explained, especially the Web Service and SDI branch, and the role of standards such as those of OGC and ISO (TC211). The same holds for later sections, where at least geometries (point, line, multi-line, polygon etc.) and the topological relations (incl. 9 intersection model versus RCC8) should be introduced in more detail. Table 1 might be used much earlier. In addition, coordinate reference systems and spatial indexing (R-Tree etc) have to be explained.
- The introduction misses a clear statement about the overall purpose of this work.
- The introduction mentions that an analysis of the current state of the geospatial Semantic Web is included in this paper. Later this is kind of included in the section about indexing. Either this statement should be relaxed, or the state of play analysis should be separated from indexing issues, i.e. a state of the art section should precede the motivation for having an RDF spatial-querying standard and a section on particular indexing issues may follow later.
- The introduction might finish with a paragraph explaining the structure of the paper.
- Most figures should be converted into listings.
- Figure 8 is hard to read and should be re-formatted.
- The heading of section 5 is too general; it should be re-phrased to something like 'Extending Parliament with GeoSPARQL'.
- All abbreviations should be properly introduced.
- There are a few formatting errors, e.g. text running across margins and missing empty space before references that have to be eliminated. Also some code fragments are not highlighted in the text and 'etc' should not be followed by '...'.
- The actual status of the GeoSPARQL specification should be clearly mentioned in the paper (not only in the acknowledgments).
- Section 7.2.3, line 4, 'posed', shouldn't this be 'answered'?
Additional comments, if considering a full paper:
- GeoSPARQL is certainly OGC work, but it would be interesting to know how this relates to the ISO General Feature Model and the ISO Spatial Schema.
- ogc:asWKT and ogc:asGML are practical solutions for referring to different representations, but the use of MIME types might be discussed, too.
- Applying topological relations between Features is ambiguous, even if a 'default geometry' is specified. Especially in a Semantic Web or Linked Data environment, the default might vary from context to context. I would appreciate a deeper discussion on this. IMO this should not be implemented.
- The decision of offering the 9 intersection model and RCC8 might be discussed. Both are isomorphic anyway. Do they both hold for all types of geometries?
- The differences between GeoSPARQL and current solutions might be explained in more detail, e.g. ogc:Geometry versus virtrdf:Geometry.
- The need for an index becomes clear, but where should this be stored if we think about distributed information sources, e.g. in Linked (Open) Data.
- Figure 7 indicates one possibility to define a geometry at querying time. Possible options might be presented in more detail.
- Section 5.1 mentioned the requirement for temporal queries. This discussion should be elaborated more. Would it make sense to use the ISO Temporal Schema? Which more general (mainstream) models could be applied as alternatives?
- When it comes to topologic relations, issues of tolerances and implementing several topologic models based on the same geometry might be discussed. Different users might be interested in different models. Years ago, Radius Studio offered a similar solution for Oracle Spatial. This might be an item for future work.
- Section 6 and 7 address Linked Data as the second topic of the paper, which is basically an add-on to the RDF and spatial querying topic. This separation should become more explicit. In fact, both of these sections provide good arguments, why it would be ideal to have a standard way of querying geospatial data and that this should be harmonized with common RDF querying. It might be considered to re-factor these sections into a motivation for the general approach. This line of argumentation might help to connect to the audience. Introducing the use case earlier would give the overall work a clear purpose (as requested above), as well as it would illustrate functional requirements for GeoSPARQL.
- The first mention of Linked Data misses a reference.
- On page 8, the authors briefly introduce place names and their importance in Linked Data. Here it should be mentioned that place names might be sufficient for communicating geospatial information and that location is not always required. Trying to always associate a geometry to a place might not be a desired approach.
- The use (and danger) of owl:sameAs might be discussed, too.
- Section 7.1 implies the need of an EPSG ontology/vocabulary or of an OGC dictionary for coordinate reference systems. At least the latter has kind of already failed. This point could be discussed in more detail.
- Section 7.2 discusses many conversions. The feasibility of such an attempt should be discussed in comparison to system of system approaches with mediation (see for example GEOSS). Transformations could be done, but should this really be implemented in the suggested way?
- Section 7.2.3 mentions the need for loading required data sets into a single knowledge base, in order to perform reasoning. While this is a common approach, the issues of the volume of geospatial data and potential Linked Data sets should be discussed (or at least mentioned as an issue). How may users find the appropriate data sets and load them into a knowledge base in order to perform the reasoning that is required for answering an application specific question?
- Over the last few years, many works addressed the semantic annotation/enrichment/augmentation/enablement of OGC services. I would expect a few more references of such related works.
Additional comments, if considering a report on tools and systems:
- A running instance of the extended Parliament should be provided and referred to in the paper.
- Section 5.2 talks about optimization, a few figures should be added to explain the optimized characteristics. Large amounts of data should be considered.
- The Linked Data related sections (6 and 7) may be omitted in such a paper.
Review 2 by Jens Lehmann
The article describes the usage and current status of GeoSPARQL, a spatial extension to the SPARQL query language for geographic information currently developed by the Open Geospatial Consortium (OGC).
The authors motivate GeoSPARQL and show several example queries demonstrating its features. They continue with a state of the art overview of spatial indexing support in triple stores. I agree that there is a need for such a standard and, in fact, there have been several initiatives over the past years to establish it. What remains unclear for me when reading the paper is how much the authors are/were actually involved in the creation of GeoSPARQL. As the description of GeoSPARQL takes up a significant part of the article, it would have been very important to point out the specific contributions of the authors (I believe Dave Kolas was/is co-chair of the corresponding OGC group, but this should be made explicit). Furthermore, the relation to http://geovocab.org is not sufficiently explained in the article.
In Section 5, the article presents the Parliament triple store. The section is well-written, but if the authors want to establish Parliament as a highly scalable spatial triple store, it is inevitable that they should compare its performance with other stores or geographical information systems such as PostGIS. In my opinion, it would be better to have a stronger focus in the article. Either it should focus on GeoSPARQL (with the main contributors to GeoSPARQL as authors) or dedicate an article to the Parliament triple store including benchmark results and detailed feature comparisons.
In Section 6, the authors say that spatial queries cannot be run over DBpedia, because such queries are not supported by the triple stores. However, queries for WGS84 points (lat/long) are supported in triple stores like OpenLink Virtuoso commercial edition as the authors themselves mention in Section 4. This part of the section should be rephrased. Furthermore, Section 6 is quite incomplete. More complete spatial dataset lists can be found at http://geovocab.org/doc/survey.html and http://www.semantic-web-journal.net/content/linkedgeodata-core-web-spati... (Section 10.1). The statements in Section 7.1 are obvious and should not be in a separate section.
Overall, the article is well-written and I enjoyed reading it, but in my opinion the contributions do not warrant a semantic web journal publication.
page 3: "topoligical"
page 7: "is is analysed"
page 7: "it's bounding"
page 7: "it's boundary"
page 8: DBPedia => DBpedia
page 8: Please cite our Journal of Web Semantics DBpedia article, instead of only pointing to the website (see http://wiki.dbpedia.org/Publications).
page 8: "all of relations"
overall: It seems that information about GeoSPARQL is spread in several places (PDF files, some presentations) and it is difficult to adopt it without a proper openly available specification (in HTML). Until that changes, it is difficult for triple store vendors and spatial data publishers to adopt it. The authors should explain more clearly why they are (apparently) certain that it will be widely used soon.
Review 3 by Boyan Brodaric
Review of "Linking Geospatial Data With GeoSPARQL"
This paper describes the GeoSPARQL specification, an emerging standard for querying geospatial data in linked data environments from the Open Geospatial Consortium. GeoSPARQL is designed to enable queries involving geospatial location, such as 'hotels at an airport' or 'museums in a city', using the SPARQL query language. It responds to two main obstacles faced by linked environments: (1) the lack of geospatial position for many entities that are known to be located in geographical space (e.g. a hotel or airport), and (2) the lack of a standard geospatial ontology that can be used to query geospatial entities (e.g. hotels at airports). To overcome these issues GeoSPARQL provides a small ontology for geographical features, geometric objects (e.g. polygons), and relations and functions (e.g. within). These enable geospatial locations to be represented and connected to features, and allow query heterogeneity to be overcome through mapping to a standard geospatial ontology and standard query functions. A side effect could be the uptake of the ontology in the storage or transmission of geospatial data via RDF. The initiative is significant, timely, and a very suitable topic for the journal.
The paper describes the motivation behind GeoSPARQL, GeoSPARQL itself, some triple store vendor implementations for spatial querying, and a GeoSPARQL example before concluding. It has good technical content, but should address the following issues, involving a major revision, before it is ready for publication:
1. The paper needs some re-structuring to consolidate sections for problem, related work, solution, implementation, evaluation, and conclusions. These bits are somewhat scattered throughout the paper.
Problem: move sections 6-7.1 into section 2 (Motivation) as motivating scenarios.
Related work: should include section 4, re-scoped, plus review other geospatial SPARQL query approaches (see below).
Solution: include section 3, without 3.1, and elaborate the description of GeoSPARQL (see below).
Implementation: includes query examples from 3.1 (possibly), 7.2, and implementation details from section 5. Could include an evaluation section discussing query performance and possible accuracy (gaps, false positives).
2. The unique contribution needs to be clarified: what are the advantages of GeoSPARQL over other efforts? Is it technically superior, or is it more significant due to having the weight of a standards body behind it, or both? This point does not come across in the paper, but should.
3. The review of related work is incomplete: the paper surveys spatial querying amongst triple store vendors, but neglects to review several efforts in the research community. A quick search turned up several, which are included in the list below. What is the relation of GeoSPARQL to these, and which, if any, are influential in its design?
4. The review of vendor solutions loses focus: is it on efficiency (i.e. indexing) or on query predicates (the ontology)? As it stands the efficiency angle dominates, when the paper seems to be about the query predicates; the priority needs to be switched.
5. The implementation section (Parliament) requires an evaluation: possibly in terms of performance (how much data, how fast) and accuracy (gaps, false positives). Scalability is mentioned as an issue, related to indexing, but without measures—some metrics would be useful here.
6. Lack of clarity and precision in descriptions: statements are often general or vague, and important terms are not explained adequately. For example, entity, object, feature, geometry, property are not clearly explained. These critical terms could have overlapping meanings, and should be explained early in the paper (does entity = object? is property a quality or a relation?) Also, many statements are not supported with examples and references. Here are some examples:
E.g. the following statement is very broad and needs at least some brief examples to support it, as well as references: "Often geospatial domains have complicated type hierarchies which cannot be fully expressed in current geospatial information systems. Also, geospatial domain problems often require marrying multiple data sources together to solve a particular problem."
E.g. this statement is not accurate, as the state of the Semantic Web is not analyzed in a general sense (e.g. missing similarity, data and service interoperability, sensors, geoprocessing, ontology design, etc.): "In this paper, we analyze the overall state of geospatial Semantic Web data."
E.g. what does "properly" refer to in "compliant RDF triple stores should be able to properly process the majority of spatial RDF data."
7. The GeoSPARQL description (section 3) needs elaboration. The paper sketches 3 components (ontology, functions, and transform rules), with some examples, but the reader is left wondering about the full scope of each. A diagram showing the ontology, a table of the list of functions and transform rules, would help, combined with concise discussion of each.
8. Need to show how GeoSPARQL can integrate with ontologies having existing class hierarchies. Do entities in the other ontologies, such as hotel or airport, need to specialize/instantiate ogc:Feature in order to be related to a geometry (position)? This will lead to tangled class hierarchies, which is not always desirable and has reasoning drawbacks. Could an entity be related to a geometry directly, without specializing/instantiating ogc:Feature? This needs clarification.
9. The language needs some polishing, as several sentences are awkward or even incomplete. E.g. "These spatially extended databases have given the combination of efficient, stable storage and retrieval of data with geospatial calculation and indexing." A thorough proofread, perhaps by a third party, would help clean this up.
Additional Possible References:
Brodt, A., Nicklas, D., Mitschang, B. 2010. Deep integration of spatial query processing into native RDF triple stores. GIS: Proceedings of the ACM International Symposium on Advances in Geographic Information Systems 2010, Pages 33-42
Manolis Koubarakis and Kostis Kyzirakos. 2010. Modeling and Querying Metadata in the Semantic Sensor Web: The Model stRDF and the Query Language stSPARQL. Proceedings of the Extended Semantic Web Conference 2010, vol.6088 of Lecture Notes in Computer Science, pages 425–439. Springer, 2010.
Jain, P., Yeh, P.Z., Verma, K., Henson, C.A., Sheth, A.P. SPARQL query re-writing using partonomy based transformation rules. Lecture Notes in Computer Science, Volume 5892 LNCS, 2009, Pages 140-158.
Xiao, Z., Huang, L., Zhai, X. Spatial information semantic query based on SPARQL. Proceedings of SPIE - The International Society for Optical Engineering. Volume 7492, 2009, Article number 74921P
Hu, H., Du, X. 2010. Linking open spatiotemporal data in the data clouds. Lecture Notes in Computer Science, Volume 6401 LNAI, 2010, Pages 304-309
Zhai, X., Huang, L., Xiao, Z. 2010. Geo-spatial query based on extended SPARQL. 2010 18th International Conference on Geoinformatics, Geoinformatics, 2010, Article number 5567605.