LinkedGeoData: A Core for a Web of Spatial Open Data
Review 1 by Simon Scheider
The revised version of the paper addresses most of the review critique in an appropriate way.
One could still ask for a better motivation and embedding into recent research on spatial information integration. What makes LinkedGeoData special with respect to common approaches, like gazetteers, illustrated by example?
However, since I consider the paper to be a report on tools and systems, this shortcoming does not seem to be a major obstacle to publication.
Review 2 by Prateek Jain
My recommendation is based on the following actions taken by the authors in response to my original comments.
Comment 1. The table explaining the conversion between the LGD and Geonames datasets is, in my opinion, extremely useful, considering the shallow ontology which Geonames provides. It would be interesting if the SPARQL endpoint for LGD could support queries over Geonames using the mappings that have been constructed. This would be useful to the community both as a way of accessing Geonames via SPARQL and as a way around the Geonames modeling issues.
Action taken: I haven't noticed anything done by the authors with respect to this point. However, this isn't a major issue and it was a suggestion to increase the usability of the work. Hence, this is a minor point.
Comment 2. It would be quite interesting if LGD could create links to other datasets beyond owl:sameAs links. There is a brief discussion of how part-of relationships create issues with respect to mapping. Geonames provides a property "parentFeature". Perhaps a technique can be incorporated in the overall architecture which uses the parentFeature link to map part-of relationships. While it is a straightforward extension, it would make LOD richer with relations beyond owl:sameAs.
Action taken: The authors have explained in detail their views and actions with respect to linking to other datasets such as MusicBrainz. They also briefly discuss issues with respect to modeling other relationships.
Comment 3. The evaluation with manual verification of 6526 is fairly comprehensive and, in the absence of an existing benchmark, probably the best the authors could have achieved.
Action taken: None required
Comment 4. This comment is more about the overall state of the datasets present in LOD than about the paper itself. The authors have given examples of applications which are using the dataset. However, the majority of these are academic research lab applications. I am eager to see LOD datasets used in applications beyond those constructed in academic labs. The only example I have seen is perhaps the use of DBpedia by Watson.
Action taken: I am happy to see a very detailed discussion and explanation of the different real life applications which are using Linked Geo Data. This is exciting overall for the LOD community itself.
Comment 5. A discussion of plans to link LGD to other LOD datasets would be worthwhile.
Action taken: Addressed as part of one of the comments above.
Review 3 by Dalia Varanka
Accept as is, some minor editorial corrections are suggested.
The reviews below address a previous version of the manuscript.
Review 1 by Simon Scheider
This paper describes a well-recognized contribution to the development of a spatial data web. It gives an overview of solutions that were developed to publish OSM data in the form of RDF, spanning from OSM-RDF mapping, ontology building, methods of data access, interlinking with Geonames and FAO data, live synchronizations, and tools built on LGD.
Although the paper is obviously not intended as a research paper (it may actually be listed as a ``report on tools and systems''), it is nevertheless required that the authors refer to and discuss the relevant state-of-the-art. And this is my main point of critique. The authors take a semantic web perspective on VGI, but fail in many parts to take existing research in GI Science into account. They take a tabula rasa approach to GeoInformation, ignoring work that could be valuable for comparison or reference. This can be seen already from the reference list: With few exceptions (, ), GI Science research does not really appear.
For a report on tools and systems with a demonstrable value, the paper may be acceptable provided the authors address the issues mentioned. So I recommend conditional accept.
These are the more specific points of critique:
1) Introduction: I also believe that LGD could be a valuable core for a spatial data web. But a claim like ``many real-life information integration and aggregation tasks are, however, impossible without comprehensive background knowledge related to spatial features...'' needs references. The tasks mentioned are treated in various research on location-based services and GI web services. In the conclusion, the terms ``geo-data syndication'' and ``semantic-spatial searches'' appear for the first time without explanation or reference.
2) Interlinking (6): ``Only LinkedGeodata nodes are used for matching [between Geonames and LGD] as they have names as well as positions'': Besides the fact that ways actually have positions that are regions, I wonder whether the authors are aware that there are more possibilities to calculate a similarity between two arbitrary spatial geometries than just the distance between two reference points. From the very beginning of GI research, complex spatial operators like point-in-polygon or topological relations (9-intersection) have been discussed. They are available in every PostGIS database. And the decision to leave more complex geometries out of consideration actually turns out to be a major problem: the big variance of the factor c on p. 9, the maximum distance by which two points describing the same object are reasonably expected to differ, is of course largely influenced by the complex geometry hidden underneath. For example, if we match Germany in both databases, then the DBpedia point may be located far away from a centroid of the respective OSM polygon, e.g. in Berlin, while a point-in-polygon test may nevertheless be able to correctly infer similarity. The same holds for matching roads. Since I can't see any arguable reason for this decision, it seems a rather ad-hoc approach. This might be acceptable if the authors had referred to any existing work for remedy. There are numerous papers from the last 10 years on matching gazetteer footprints, starting with Linda Hill: "Core Elements of Digital Gazetteers: Placenames, Categories, and Footprints", or Wu, Winter: "Inferring Relevant Gazetteer Instances to a Placename", or Janowicz, K. and Keßler, C. (2008): "The Role of Ontology in Improving Gazetteer Interaction". There is also work on combining spatial and thematic similarity measures that could be cited, e.g. Janowicz, K., Wilkes, M., and Lutz, M. (2008): "Similarity-based Information Retrieval and its Role within Spatial Data Infrastructure".
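To make the contrast in point 2 concrete, the following is a minimal sketch (not the authors' matching procedure) comparing a centroid-distance test with a point-in-polygon test. The rectangle, the reference point, and the threshold `max_distance` are hypothetical values chosen for illustration, and planar coordinates are assumed rather than geodesic ones.

```python
from math import hypot


def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies inside the simple `polygon`."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # x coordinate at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def centroid(polygon):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    area2 = cx = cy = 0.0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


# Hypothetical footprint from one dataset and reference point from the other.
footprint = [(0.0, 0.0), (10.0, 0.0), (10.0, 6.0), (0.0, 6.0)]
reference_point = (9.0, 5.0)   # e.g. a point placed at a capital, off-centre
max_distance = 1.0             # hypothetical distance threshold

cx, cy = centroid(footprint)
distance = hypot(reference_point[0] - cx, reference_point[1] - cy)

matches_by_distance = distance <= max_distance
matches_by_containment = point_in_polygon(reference_point, footprint)
```

In this toy case the reference point is rejected by the distance threshold but accepted by the containment test, which is the situation the Germany example describes.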
3) LGD Browser (9.1) and spatial query optimization: I do not understand the sentence ``This is due to the fact that the database can only use either the longitude or latitude index''. A spatial database like PostGIS or Oracle Spatial is able to handle any form of spatial index. And what is the authors' reason for choosing a "quadtile" index (a name obviously invented by the OSM community)? I can't see any difference from the well-known "quadtree index" (why then use a name not common in science?). Furthermore, how do they know that an R-tree is not better suited? There is also extensive research on spatial indices that may be cited; have a look into H. Samet: "Foundations of Multidimensional and Metric Data Structures".
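As background to point 3, here is a minimal sketch of how a quadtile-style key can be computed by interleaving the bits of discretised longitude and latitude. The bit layout, the default of 16 bits per axis, and the function names are assumptions made for illustration; the layout actually used by OSM or LGD is not stated in this review.

```python
def quadtile_key(lon, lat, bits=16):
    """Interleave the bits of the discretised lon/lat into one integer key."""
    cells = 1 << bits
    # Map lon in [-180, 180] and lat in [-90, 90] onto grid indices 0..cells-1.
    x = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    y = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    key = 0
    for i in range(bits - 1, -1, -1):
        key = (key << 1) | ((x >> i) & 1)  # one longitude bit
        key = (key << 1) | ((y >> i) & 1)  # one latitude bit
    return key


def tile_prefix(key, depth, bits=16):
    """Keep the first `depth` levels of a key (2 bits per level)."""
    return key >> (2 * (bits - depth))


# Hypothetical sample points: two close together, one far away.
near_a = quadtile_key(12.37, 51.34)
near_b = quadtile_key(12.38, 51.35)
far = quadtile_key(151.2, -33.87)

same_tile_at_depth_8 = tile_prefix(near_a, 8) == tile_prefix(near_b, 8)
different_tile_at_depth_1 = tile_prefix(near_a, 1) != tile_prefix(far, 1)
```

Each pair of bits in such a key selects one of four quadrants, so a key prefix is a path from the root of a quadtree. This is consistent with the observation that the scheme is hard to distinguish from a quadtree index; the practical difference is only that the key is a single integer that an ordinary one-dimensional index can store.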
Some minor suggestions:
- Figure numbers 2 and 5-9 seem wrong, since they do not quite match the text.
- There is more than one grammatical error, e.g. "all to points, as every point may at some point be connected to way" on p. 13
Review 2 by Prateek Jain
The work presents a description of the LinkedGeoData dataset and the methodology employed for its creation. LinkedGeoData is a geographical dataset constructed by converting data from OpenStreetMap (OSM) to RDF. The paper describes the methodology for the creation of these datasets and the applications which have been constructed using them. The dataset is extremely useful for the Semantic Web community, and the efforts put into creating the datasets are laudable. Due to the nature of the work and the details presented, the work has been evaluated as an Ontology Paper, as specified in the call for papers provided at http://www.semantic-web-journal.net/reviewers. My comments with respect to each of the criteria for ontology papers are as follows.
(a) Quality and relevance of the described ontology (convincing evidence must be provided): The paper describes in detail the applications which have been built using the ontology, so it definitely provides enough evidence and detail. Besides, the data set is one of the major and most prominent data sets about geographical information available on LOD. The authors seem to have followed a sound technique for the construction of the ontology, and having personally used it, I can vouch for the quality of the ontology as well.
(b) Illustration, clarity and readability of the describing paper, which shall convey to the reader the key aspects of the described ontology: The paper is very well written and describes in detail the key aspects and issues with respect to the construction of the data set.
I have a few comments with respect to the paper.
1. The table explaining the conversion between the LGD and Geonames datasets is, in my opinion, extremely useful, considering the shallow ontology which Geonames provides. It would be interesting if the SPARQL endpoint for LGD could support queries over Geonames using the mappings that have been constructed. This would be useful to the community both as a way of accessing Geonames via SPARQL and as a way around the Geonames modeling issues.
2. It would be quite interesting if LGD could create links to other datasets beyond owl:sameAs links. There is a brief discussion of how part-of relationships create issues with respect to mapping. Geonames provides a property "parentFeature". Perhaps a technique can be incorporated in the overall architecture which uses the parentFeature link to map part-of relationships. While it is a straightforward extension, it would make LOD richer with relations beyond owl:sameAs.
3. The evaluation with manual verification of 6526 is fairly comprehensive and, in the absence of an existing benchmark, probably the best the authors could have achieved.
4. This comment is more about the overall state of the datasets present in LOD than about the paper itself. The authors have given examples of applications which are using the dataset. However, the majority of these are academic research lab applications. I am eager to see LOD datasets used in applications beyond those constructed in academic labs. The only example I have seen is perhaps the use of DBpedia by Watson.
5. A discussion of plans to link LGD to other LOD datasets would be worthwhile.
1. On page 3, "as shown in Figure 2"→ "as shown in Figure 1"
2. Page 14, "omit approximately 20mio triples"→ (Probably) "20 million triples".
Overall, the work is a good description of a geographical dataset, its methodology, and the applications built using it. The dataset is already a valuable contribution to the community. The related paper will provide further benefits to developers and researchers in the community.
Review 3 by Dalia Varanka
The paper reports on extensive and advanced work creating the linkages between OpenStreetMap and the Semantic Web. A full life-cycle of multiple steps of the project is explained in detail. The paper is richly detailed, with adequate supporting documentation for specialist issues such as multi-lingual data and spatial dimensions. The application solutions are well-respected, highly interesting, and internationally valuable. The paper falls within the scope of the journal as described on the home page.
The only weakness of the paper is that the authors do not articulate a research framework or context for the project. For example, research issues of a broad scope are not identified or discussed and the work focuses narrowly on solutions to the specific application in question. Some evidence for this is that a section devoted to related or similar projects appears at the end. Other work is written descriptively and without analysis, thereby failing to draw the work of these authors into very much context. This special issue of the Semantic Web Journal, however, welcomes papers describing highly applied work.
I recommend accepting the paper with some revision: editing the paper for fewer technical details (the paper reads a bit like a technical user's guide) and expanding the discussion of the implications of the solution relative to the broader state of the Semantic Web and its current research topics.