Brazilian Cerrado: a case study of linked geographical and statistical data applied to Ecology

Tracking #: 642-1852

Adriano Souza
Oscar Corcho
Luis Vilches-Blázquez
Paulo Salles

Responsible editor: 
Guest Editors Semantics for Biodiversity

Submission type: 
Dataset Description
This paper describes an ontology network that will serve as the basis for publishing and linking data about woody plant communities of the Brazilian Cerrado biome, obtained from scientific studies, meteorological and environmental data, and geographical information (maps). The Data Cube, Meteorological, GeoSPARQL and Time ontologies were used as infrastructure ontologies. In addition, two domain ontologies, the Cerrado Concepts and Wood Plant Dynamics Ontology (Ccon) and the Fire Ontology (Fire), were developed to represent scientific knowledge about vegetation ecology, focusing on vegetation dynamics under different burning regimes. Datasets provided by Brazilian government agencies and those obtained from the scientific literature, available in different formats, were transformed into RDF using OpenRefine with its RDF extension or, for shapefiles, using the geometry2RDF tool. A web-based application was deployed using Map4RDF to provide a proper visualization of aggregated information, integrating map visualization via the Google Maps API with ontology-based facet browsing. Ongoing work investigates the possibility of integrating these data with qualitative reasoning models. This research has the potential to boost applications of linked geographical and statistical data technologies in ecological research and in biodiversity conservation.
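The abstract lists the Data Cube vocabulary among the infrastructure ontologies and reports converting heterogeneous datasets to RDF. As an illustration only, the following Python sketch shows how a single row of daily weather data could be written as a `qb:Observation` in N-Triples. The `qb:` namespace, the `qb:Observation` class and the `qb:dataSet` property come from the W3C RDF Data Cube vocabulary; the `http://example.org/cerrado/` namespace, the station, date and precipitation properties, and the values are hypothetical and are not taken from the authors' datasets or from the output of OpenRefine.

```python
# Illustrative sketch only: serialise one hypothetical weather record as an
# RDF Data Cube (qb:) observation in N-Triples, using only the standard library.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/cerrado/"  # hypothetical namespace, not the authors'


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str, datatype: str) -> str:
    """Build a typed literal such as "12.4"^^<...#decimal>."""
    return f'"{value}"^^<{datatype}>'


def observation_triples(station: str, date: str, precip_mm: float) -> list[str]:
    """Return N-Triples lines for one observation in a hypothetical dataset."""
    obs = iri(f"{EX}obs/{station}/{date}")
    triples = [
        (obs, iri(RDF_TYPE), iri(QB + "Observation")),
        (obs, iri(QB + "dataSet"), iri(EX + "dataset/weather")),
        # Dimensions: which station and which day (hypothetical properties).
        (obs, iri(EX + "station"), iri(f"{EX}station/{station}")),
        (obs, iri(EX + "refDate"), literal(date, XSD + "date")),
        # Measure: daily precipitation in millimetres (hypothetical property).
        (obs, iri(EX + "precipitation"), literal(str(precip_mm), XSD + "decimal")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    for line in observation_triples("station-01", "2013-01-15", 12.4):
        print(line)
```

In a real pipeline the dimension and measure properties would be declared in a `qb:DataStructureDefinition` and aligned with the Meteorological ontology; the sketch omits that step to stay short.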

Solicited Reviews:
Review #1
By Guanyang Zhang submitted on 25/Jun/2014
Major Revision
Review Comment:

This manuscript presents novel ontology developments for integrating data from a large world biodiversity hotspot, the Brazilian Cerrado. This work may enable a network for publishing and linking data about the Brazilian Cerrado biome, especially about the relationship between fire and wood plant communities. I particularly like that the authors have consulted 'domain experts' to evaluate the correctness and completeness of their ontologies. However, there are also certain aspects that I fail to comprehend or that the authors may have omitted. Below is a list of comments and questions.

(1) I could not find any information on how to access the RDF of the linked data sets. The authors collected data sets with rich information on aspects of the Brazilian Cerrado and claim to have linked them. These data sets should be deposited online and made publicly available.

(2) The project aims to link statistical and geographical data with ecological or biodiversity data. However, it is not clear to me what kind of product or output has been produced and how it can be accessed by readers. The authors describe this only very briefly in section 6.3. The authors also state that “A web-based application was deployed using Map4RDF, so as to provide a proper visualization of aggregated information, integrating map visualization using Google Maps API with ontology-based facet browsing”. It seems to me that the authors did not provide an actual ‘application’ or website for checking the Map4RDF deployment.

(3) This link is broken.

(4) There is 'wet season' in the Ccon ontology, but there is no 'dry season'. Is this a deliberate or a careless omission?

(5) Page 2, left column, second paragraph: “How the data was transformed is explained section 5” - Should be ‘in’ section 5.

(6) Page 4, right column, first paragraph: “Therefore, these variables where taken into account to build the ontology.” - ‘where’ should be ‘were’.

Review #2
Anonymous submitted on 07/Oct/2014
Review Comment:

The paper "Brazilian Cerrado: a case study of linked geographical and statistical data applied to Ecology" is submitted as a Dataset description. In the introduction, the paper clearly motivates the need for better semantic models to support community-based ecological research.

Section 2 is titled "Overview of our ontological model", and in it the authors describe an "ontology network" that combines existing ontologies with two new ontologies in order to better represent concepts related to ecological communities and the "dynamics of Brazilian Cerrado wood plants" specifically. The two ontologies that are new contributions are the Ccon ontology (Cerrado Concepts and Wood Plant Dynamics Ontology) and the Fire ontology. Figure 1, which describes the proposed ontology network, needs improvement. 1) Visually it is rather confusing, which makes it difficult to follow the connections; a simple spatial reorganization of the figure would help tremendously. 2) The meanings of the directional edges are not described either in the figure or in the figure caption. The authors claim in Section 2 that "current available ontologies do not directly cover issues regarding ecological communities", and this is largely true, but they should reference the Population and Community Ontology (PCO), as there is some overlap in conceptual content. That ontology comes from microbiology and so is less appropriate for ecologists, but there could be general concepts that bridge between it and the Ccon ontology that the authors propose.

Having read through the paper, however, I am still unclear as to how the ontologies listed in it are connected. The domain and infrastructure ontologies are linked only in Figure 1; the connection is not made clear in the text. The infrastructure ontologies are merely presented, without any description of how they are used. E.g., how are the Time ontology and GeoSPARQL (which is not an ontology) linked via the data that are described using the Ccon and Fire ontologies? The two new ontologies presented in section 4 have some value, I believe. There appears to have been quite a bit of thought put into their development (for the Cerrado use case at least). Although this is not mentioned in the text, I see in the ontology that its designers built on the OBOE spatial and temporal ontologies, so the concepts are linked to well-established projects.

With respect to the ontologies, I do have a few comments, though I am not an expert on community ecology. For example, in the Fire Ontology I wonder about the choice of modeling 'environmental condition' as a subclass of 'natural cause'. Environmental conditions can be anthropogenic in origin. And is Weather a subclass of environmental condition? I see members of the class Weather such as "High" or "Low", but that does not make sense to me. The choice of modeling here depends to some extent on the questions that might be asked of the data, so it would be good to go into more detail describing the modeling choices and the tasks that the model should support. Stylistically, there are inconsistencies in capitalization and minor typos throughout the ontologies, e.g., 'mortality rate', 'Plant Mortality', 'Plant resprout'.

It is nice to see an evaluation of the ontology, though I do not see how listing the number of pitfalls reported by OOPS! at each stage of development is very useful. I see that at some points in the development cycle there were 'critical' pitfalls. What were these critical pitfalls, how were they resolved, and what did they reflect in the overall design choices? The evaluation by domain experts is welcome, and I would like to see the questions used in the questionnaires as an appendix to the article. However, the number of domain experts consulted is too small to yield statistically meaningful results, so the percentages are not very informative. And in one case (sufficiency to describe dynamics), the spread was completely random. This is acceptable, as there are probably not enough domain experts available to form a large sample, so the open-ended questions are the most valuable part of the evaluation. The most interesting parts of sections 5.2.1 and 5.2.2 are the final paragraphs, where there is a brief discussion of how the domain experts think the ontologies could be improved. I would like to see a more detailed discussion of the changes that the experts would like to see, and specifically why those changes would require the authors "to reorganize and redo part [sic] of the ontologies".

The biggest flaw in the paper is that its most substantial component (section 4) concerns the development of the Ccon and Fire ontologies, while the proposed ontology network does not appear to be fully implemented; it is only vaguely described. Section 6, which concerns "Linked data generation and publishing", references a link to the resources, but that link is not available online, so although the authors claim to have published linked data, this does not appear to be the case. As a result, there is no way to evaluate the quality of the datasets. Likewise, I have been unable to find the web-based application described briefly in section 6.4. In that case, why was the paper not submitted as an ontology description? The authors could then have delved more deeply into the design choices for the two new ontologies.

As it stands, I think the paper has a lot of potential in describing a resource for ecologists wanting to model knowledge about ecological communities, but 1) the connection between this case study (that is, the specific ontologies built) and the larger problem of describing concepts of community ecology, alluded to in section 2, needs to be better articulated; and 2) without the dataset actually being published and linked to the described ontologies, I cannot recommend its inclusion in the special issue without significant reworking. If the authors can provide this, with a substantial accompanying explanation in the text, that would greatly improve the paper.

Review #3
By Mark Schildhauer submitted on 12/Oct/2014
Review Comment:

This manuscript was submitted as 'Data Description' and should be reviewed along the following dimensions: (1) Quality of the dataset. (2) Usefulness (or potential usefulness) of the dataset. (3) Clarity and completeness of the descriptions.

This paper describes potentially interesting work using ontologies to better confederate disparate natural science data, in order to better understand the biodiversity dynamics in the Brazilian Cerrado, a world biodiversity hotspot. While a number of existing ontologies are referenced, as well as data sets to which those concepts might apply, there is insufficient detail on any of these to evaluate whether the authors' approach, as a "Case Study", is effective.

The importance of the focal region is well justified, however, and several established vocabularies are identified as relevant for extension to this particular use case. Nevertheless, the discussion of the two domain ontologies remains far too abstract, and the URLs provided for obtaining the comprehensive vocabularies do not properly resolve. Rather, they lead to nicely constructed Web renditions of the vocabularies, whereas for SWJ purposes it is necessary to be able to evaluate these in full detail.

Partial depictions of these ontologies, both in Figs. 2 and 3 and on the Web, indicate that extensive thought and effort went into constructing them, so I strongly recommend that the authors provide access to both the Ccon and Fire ontologies so that a closer examination and informed critique can be made.

Quality of the dataset:
Unfortunately, no links are provided to a data product. Mention is made of formal ontologies created to link data that are then displayed in web visualizations, but direct access to these applications and products is not clearly provided.

Clarity and readability:

A highly similar paper appears to have been published recently in the 2014 Proceedings of the 7th Congress of iEMSs. Are the authors still interested in publishing this as a separate work? If so, I suggest that they go into much greater detail here about the OOPS! evaluation process, with more discussion and analysis of the iterative process that led to the reduction in "pitfalls". Specific examples showing the types of critical errors made, and how these and more minor errors were rectified, would be valuable.

Much greater detail could be provided in section 6 on "Linked data generation", and on how the datasets in Table 1 were "transformed and linked" in ways that enable more effective analyses. What was the value of transforming shapefiles to RDF? Examples of the use of "owl:sameAs" should be provided to support the claim that this enabled "enriching reference information (geometry) with data". It is not clear what is meant as stated.

In sec 6.4 allusions are made to "web-based visualizations of the aggregated information using Google Maps and Map4RDF". A URL is needed so reviewers can assess the extent of these accomplishments.

Some of the references seem inappropriate, e.g. in sec. 4.1, par. 2, why references [6, 9, 12]? Only 6 appears appropriate. Similarly, in sec. 4.2, the citation [25, 30] seems erroneous for 30; perhaps 31 is intended? Also, for the sentence referenced by [7, 30-31], perhaps [7, 31, 32] is correct?

There is great potential interest in learning more about the accomplishments of Souza et al. with regard to using semantic approaches to link both geospatial and observational data in the service of biodiversity assessment and conservation policy. I encourage the authors to review my comments and those of the other reviewers, consider how to address them, and resubmit.