Design and Development of Linked Data from The National Map

E. Lynn Usery and Dalia Varanka
The development of linked data on the World-Wide Web provides the opportunity for the U.S. Geological Survey (USGS) to supply its extensive volumes of geospatial data, information, and knowledge in a machine interpretable form and reach users and applications that heretofore have been unavailable. To pilot a process to take advantage of this opportunity, the USGS has selected data from The National Map for nine research test areas and provided these data in the Semantic Web format of Resource Description Framework (RDF) triples to support machine processing and linked data access. The provision of geospatial data on the linked data of the Web is problematic from several perspectives and the USGS is developing solutions to these problems. Specifically, the handling of coordinates for geospatial data in vector format and the identification of geospatial entities and objects in geospatial raster data and the handling of raster geometry (pixels) in a linked data format have proved difficult. It is the purpose of this paper to discuss the USGS approach to developing linked data for both vector and raster data from The National Map databases.
Submission type: 
Application Report
Responsible editor: 
Krzysztof Janowicz

New revision checked and accepted by editor.


The reviews below are from a previous version of the manuscript.

Review 1 by Carsten Kessler

The paper has improved significantly since the initial submission. There are just a few minor issues left to fix:

- Abstract: in my opinion, there is no such thing as a single "Semantic Web format"
- p.3, right column: there is a ";" missing after [53,17]
- same column, bottom: remove ")" at "[36])"
- Add "." at the end of column.
- Section 3: "themes domains" -> either "themes' domains" or "theme's domains", depending on what you want to say
- "Features terms" -> either "Feature terms" or "Features' terms"
- what is the "beginning of a logical axiom list"? Do you mean an initial version?
- p.4, right column: "semantic meaning" -> either "semantics" or "meaning"
- end of that paragraph: add "."
- p.7, bottom of left column: "coordinate stores" -> "coordinates stored" (?)
- first sentence of section 5 "computer server" -> just server should be fine for the SWJ audience
- check consistency of references; some have full first names, others don't

Review 2 by Rainer Simon

The paper reports on a pilot effort to expose sample vector and raster geometry data from the US National Map as Linked Data. Compared to the previous version, the paper has been improved regarding structure and related work.

While it can still be debated whether the paper has scientific value (as it lacks a scientific question, hypothesis, and evaluation), it is now positioned more explicitly as a case study. This, in my opinion, makes it a suitable read for practitioners in the field and justifies publication.

Review 3 by Claus Stadler

Here I summarize the improvements and corresponding open issues in regard to my previous review:

- The content has been extended, clarified, and the long lists of query results are gone

- Table 1 is now nicely formatted, and was extended with links to viewers/downloads of the source (non-RDF) data.
However, there is still no overview of how many triples were extracted from each of these datasets, and there is also no excerpt of the available classes and the corresponding number of instances. So a table summarizing (some of) the statistics of the conversion effort should be provided.

- There is now a "previous research" section, which acknowledges the efforts of Ordnance Survey and the GeoVocamps, and which gives pointers to approaches for semantic interoperability (such as by combining geospatial ontologies with foundational ones).

Some points in this section should still be clarified:
- "The impact of users' actions, called 'intentionality,'": This does not seem right, as [9] states: "Following Searle (1983), intentionality here refers to the purposes, intentions, motivations, needs, beliefs, and so on, of an observer or a user of the system."
To me, the impact of an action is its consequence - and not e.g. its motivation.
- "... a framework [...] that extends the range ... ": Rather than "extending the range", it seems better to write that they combine/integrate/arrange topographical categories with foundational/upper-level ontologies.
- Clarify that DOLCE is a foundational ontology and OntoClean a method for analyzing ontologies. (The sentence is somewhat scrambled.)
- "Crucial aspects of data integration require the ontology of content data characteristics" -> Should be clarified. I am also not sure how this sentence relates to the content of the citation, which seems to be mainly about so-called "derivation ontologies", which are flowcharts/recipes for deriving a database from a set of source databases.

- The discussion on raster data has been extended, and an additional example involving "Meteor Crater" is presented.
Unfortunately, I found the description confusing: What is the role of the ODP (and what does it look like)? And what is the final solution for raster data? Is the relation of the feature to the pixel represented only in GML? Or also in RDF? What is the tag/predicate? I think a small data example would help to clarify. And finally, Meteor Crater is given as an example, but would it be possible to extend the presentation to a use case (that is, some question that can be answered)?

- There are no applications mentioned that make use of the RDF data, and no outlook on such applications is given.
On the other hand, it is stated that the focus of further research is on avoiding the materialization of the relations of the 9-intersection model.
So maybe an outlook on the applications/questions that would become possible by doing this could be given here.

In conclusion, the revised version adds significant improvements (related work, clarifications, extended discussions) over the previous version.
The submission now positions itself as a case-study (as suggested by another reviewer), however some points still need more elaboration.
I think it is suitable for acceptance if the open issues are addressed.

- Points that should be clarified and minor issues:

- "Some categories require more resolution between the conceptual and database models than others, depending on the data designs of the themes domains." What are they?
- "The digital files form a vocabulary"... What digital files are being referenced here?
- "Other layers, such as transportation, are poorly matched to the conceptual ontology because they were not developed under feature-based system guidelines" - But rather what guidelines?
- "Complex features require spatial relations ... " - Rephrase the remainder of the sentence.
- "In these cases, the base vocabulary allows relating simple classes into complexes for ontology design patterns.": Can we have an example of such modeling?

- "Thus, the coordinates must be associated with the RDF triple" -> RDF resource
- First it is said that the data is converted to GML, then it is said that it is converted to GML and RDF. Should be rephrased.
- "A requirement ..." -> Better: A requirement is the capability to ask SPARQL queries whose results can be graphically displayed on a map.
- The whole paragraph starting with "To capture spatial relations that support semantic identity..." -> How does this relate to the presented work?

- "The objects are many and depend on the predicate." -> Concrete numbers?

- "...other geomorphic features are more difficult to identify and have indeterminate boundaries [5]" -> Here it would have been nice to have a small example (depiction) in addition to the citation, as this contributes to the understanding of the issues involved with raster data.
- "However, [25] have proposed methods to extend the Geographic Structured Query language (GSQL)" -> It seems better to clarify from the beginning: In the relational realm there exists GSQL, but an equivalent does not exist for the Semantic Web yet.
- "Unlike other approaches that extract the semantic objects from the raster data, our approach is to determine relevant objects and maintain the raster matrix as the geometric basis of the geographic features of interest." -> What does this mean (What does the data look like and/or how can I query it)?
- The example query contains an IP, a proper domain name would be better.

p9: computer server -> server

Review 4 by anonymous reviewer

I think the clarification of the focus as a case study helped a lot.


The reviews below are from a previous version of the manuscript.

Review 1 by Carsten Kessler

The paper describes the development of a Linked Data version for the US Geological Survey's National Map based on nine local sample datasets. It covers the corresponding ontologies and the approaches taken for conversion of geodata in vector and raster formats to RDF, and it discusses querying the generated Linked Data.

While the topic is relevant for SWJ and the paper is generally interesting and mostly well written, it needs some improvements before it can be published. Most importantly, the authors should add a section that briefly reviews relevant related work. Readers without any background in Linked Data for geographic information may get the impression that this is the first approach in this direction, which is clearly not the case. Especially GeoSPARQL should be explained earlier in the paper and in more detail, as it is used throughout the paper and clearly influenced the design decisions made for the conversion process. Moreover, efforts such as LinkedGeoData [1], GeoLinkedData [2], and the outcomes of the various GeoVoCamps [3] should be mentioned.

The last paragraph of section 3 is largely unclear, especially "the hypothesis is that linguistic terms reflect geometric data operations". The way the authors phrase it is wrong, as linguistic terms do not reflect data operations. I guess what the authors want to state is that (some) linguistic expressions can be mapped to data operations. Also, the reader wonders what happens to this hypothesis. It is not tested or evaluated in the remainder of the paper, so I would recommend rephrasing the abstract.

Section 4.3 remains a bit ambiguous. While the authors give an overview of related work here, examples would be extremely helpful to understand the overall approach. As far as I understand, the authors want to make single pixels in raster datasets dereferenceable (i.e., assign URIs for them). Is this really useful? It seems to me that this part of the conversion is still in an early stage and lacks some maturity. Are the objects described in this section extracted from the images or are they present somewhere else? Moreover, what does this raster-based representation buy us? If the object recognition is finished, it could also be provided based on the approach for point and vector data (sec. 4.1/4.2).

Finally, there are no attempts being made to evaluate the approach. What alternatives would have been feasible, with which effects?

Some smaller issues:
- Some explanations are not necessary for the SWJ audience, such as explanations of N3 or RDF in general.
- Section 3, "semantic meaning" -> "meaning" or "semantics"
- Table 1 seems to be messed up; e.g., "Energy" or "Water Data" is clearly not a data type like "Vector"
- A more compact version of Fig. 4 would be good, maybe even a link to the query on the SPARQL endpoint would be sufficient.


Review 2 by Rainer Simon

The paper reports on a pilot effort to expose sample vector and raster geometry data from the US National Map as Linked Data. Scope and topic of the paper are suitable for the call of the journal issue. However, my main point of criticism is this: I think it's problematic to consider this paper a research publication. There is no hypothesis the authors aim to verify; neither is there an evaluation of the approach. Discussion of related work and positioning within the field is rather minimal. The conclusions (i.e. mainly that raster data is problematic) are rather weak.

In its current state, I'd actually consider this a 'case study' rather than a research paper. It provides a narrative report on an application of Linked Data in a particular setting. This, of course, can also be of value to the reader! In my opinion, the authors should either:

(a) clearly label this paper as a case study, rather than trying to squeeze it into the frame of a research paper (treating the application of Linked Data principles as "the problem" and then deriving rather weak conclusions). They should then, however, spend more time discussing the details of their work: e.g. the authors report that they designed their ontology by combining top-down (existing standards) and bottom-up (experts' knowledge of existing data sets) approaches. How was this done exactly? Who was involved (experts, stakeholders inside/outside institution)? In Section 4, the authors merely state that they "require that the features be identified in the raster source". How was this achieved? By automatic means? Manually?

(b) or, alternatively, the authors should put clearer focus on individual aspects of their work which have more of a research character. E.g. in section 3, the authors mention (ongoing?) research on spatial prepositions in which the geospatial codes resemble natural language semantics. This sounds potentially very interesting. Unfortunately, the authors don't elaborate at all or give background information. Furthermore, the paper would benefit from a more comparative attitude: e.g. how does the authors' ontology compare with existing vocabularies in the field?

Review 3 by Claus Stadler

This paper describes the approach the authors have taken for converting and publishing datasets from the United States Geological Survey (USGS) as RDF.
The datasets were chosen from six subbasin and three suburban areas, comprising eight layers (such as structures, hydrography, and elevation). Depending on the layer, the data is represented as point, vector, or raster data. The conversion to RDF is discussed for each of these types.

I have evaluated the submission in regard to the category "Descriptions of ontologies".
This submission is relevant as it is about the approach of a major map provider to ontology engineering.

The strong point of the submission is the discussion of point, vector, and raster data.
Unfortunately, reading the paper did not leave me with an overall picture of the ontology: In general, many descriptions are very abstract, and overall statistics are missing.

Here are my major points of critique:
- There are about 5 full pages of queries and query results, spanning the 6 pages 7, 8, 9, 10, 12, 13. Especially the list of predicates for an individual resource, starting on page 8, seems superfluous. I would rather see a general overview of how many triples were extracted from each dataset in Table 1, together with an excerpt of the most important predicates and/or classes.

- The submission does not make "pointers to existing applications, or use-case experiments", as desired by the SWJ review standards. This is understandable to some degree, as there already exist applications which use the non-RDF source data formats. However, in my opinion, there should then be at least an outlook on future work, or even better: the description of the adaptation of one or more applications making use of the ontology. On second thought, the example of "Tributaries of Hunter Creek" actually already is a use case, but is not explicitly described as such, since it is only mentioned as an example for data access.

- There is no dedicated related work section, and no other ontologies are mentioned, much less compared to. For an overview of other ontologies the authors might want to have a look at

- The discussion of the raster data in section 4.3 is confusing. For example the sentence "Query and access to raster data on the Semantic Web poses unique problems since ontological objects are not defined in the structure of the data which is a grid of pixel values or digital numbers."

- What is meant by "ontological object"? To my understanding, "image" and "pixel" can be seen as "ontological objects". Also, from what is written in the section, I do not understand the relation between GSQL and ontology semantics.

In conclusion, while I think the work itself is highly relevant to the geospatial Semantic Web, I have the impression that the submission does not yet meet the SWJ's standards. I therefore vote for "reject and resubmit".

Minor things:
- "Thus, it is the feature name or other attribute or relation that connects the feature identifier with the actual instance or other object of the feature in the data." For my taste, this should be rephrased.
- "A requirement for the conversion is to maintain the ability to generate a graphic for any query result." I think you do not mean query result but: Resources corresponding to features that can be graphically displayed, should also be graphically displayable.
- The reference (Varanka 2010) is missing.
- The query texts are screen-shots, which makes it cumbersome to try them out.
- The IDs of the query result on page 13 could be used as labels on the corresponding map.
- The query in Figure 6 does not work with setting 'geosparqltest' as the default graph.
- "Figure 7. The set of URIs that result from the query in Figure 7."; Should be "query in Figure 6."
- Nothing is said about the spatial query capabilities of the SPARQL endpoint - is it possible to search for rivers within an area?
- Although not strictly required by SWJ formatting guidelines, the authors should consider using the standard template.

Review 4 by anonymous reviewer

I recognize that publishing The National Map following the Semantic Web guidelines is an initiative with great merit. Nevertheless, the paper does not go beyond a description of what was done. The manuscript lacks a scientific question regarding either the ontology or the data conversion, which I saw as being two of the possibilities.