Observational/Hydrographic data of the South Atlantic Ocean published as LOD

Tracking #: 2505-3719

Marcos Zárate
German Braun
Mirtha Lewis
Pablo R. Fillottrani

Responsible editor: 
Armin Haller

Submission type: 
Dataset Description
This article describes the publication of occurrences of Southern Elephant Seals Mirounga leonina (Linnaeus, 1758) as Linked Open Data in two environments (marine and coastal). The data constitutes hydrographic measurements of instrumented animals and observation data collected during census between 1990 and 2017. The data scheme is based on the previously developed ontology BiGe-Onto and the new version of the Semantic Sensor Network ontology (SSN). We introduce the network of ontologies used to organize the data and the transformation process to publish the dataset. In the use case, we develop an application to access and analyze the dataset. The linked open dataset and the related visualization tool turned data into a resource that can be located by the international community and thus increase the commitment to its sustainability. The data, coming from Peninsula Valdés (UNESCO World Heritage), is available for interdisciplinary studies of management and conservation of marine and coastal protected areas which demand reliable and updated data.

Major Revision

Solicited Reviews:
Review #1
Anonymous submitted on 20/Jul/2020
Review Comment:

This 10-page paper presents a dataset description about observations of elephant seals based on two ontologies, SSN/SOSA and BiGe-Onto.
It seems that the authors have not understood the design pattern behind SSN relating Feature Of Interest, Sample and Property.

Temperature is a Property. It is not a Feature of Interest. It seems that there are two Features of Interest related to temperature: temperature of the water (at the bottom of the dive) and temperature of the surface (is it the temperature of the air?).

Depth is also a Property, not a Feature of Interest. The final property used is average depth. A Feature of Interest, or maybe a Sample, should be associated with the observation. The Sample could be the specific dive of an identified animal. The identified animal is also the platform of the sensor. The specific dive of an identified animal is a Sample of the generic dive, which is the Feature of Interest.

Moreover, the dive location should be described using another observation on the same Sample: a specific dive of an identified animal. The result of this observation is the coordinate. The Property is location (or coordinate), depending on which vocabulary the authors want to reuse.
Note that in Figure 2 the location is not linked to the observation. The authors should explain which type of link is used between the geometry and the observation.

I find it strange that the sensor is identified only by its type (base:sensor/TDRMK3). That means that several sensors of the same type have the same IRI, so a single sensor appears to be deployed on several platforms (specific animals). Note that there are 9 platforms and only 3 sensors in the dataset, so there is a problem.
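
One quick way to confirm this in the data is to count how many platforms each sensor IRI is hosted by. The sketch below is only illustrative: `shared_sensors` is a hypothetical helper, its input is assumed to be (sensor, platform) pairs extracted from sosa:isHostedBy statements, and the pairings in the usage lines are made up rather than taken from the dataset.

```python
from collections import defaultdict


def shared_sensors(hosting):
    """Return sensors hosted by more than one platform.

    `hosting` is an iterable of (sensor_iri, platform_iri) pairs, assumed
    to come from sosa:isHostedBy statements.
    """
    platforms = defaultdict(set)
    for sensor, platform in hosting:
        platforms[sensor].add(platform)
    # Keep only sensor IRIs that are attached to several platforms.
    return {s: sorted(p) for s, p in platforms.items() if len(p) > 1}


# Illustrative pairs only; the real pairings are not given in this review.
pairs = [
    ("base:sensor/TDRMK3", "platform:SES_AAEU"),
    ("base:sensor/TDRMK3", "platform:SES_AMLJ"),
]
print(shared_sensors(pairs))
```

Any sensor IRI reported by such a check stands for a device type rather than an individual device, and would need its own IRI per physical instrument.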

I do not understand what the SES_AAEU code means. Is it a unique identifier of one specific animal or a code to identify the type of the animal?

Because the design pattern around Feature of Interest and Property was not understood by the authors, they decided to use another ontology to describe census observations. But census observations can be modelled with SSN/SOSA, which can also describe human observation. See the sensor definition “Sensor - Device, agent (including humans), or software (simulation) involved in, or implementing, a Procedure. Sensors respond to a Stimulus, e.g., a change in the environment, or Input data composed from the Results of prior Observations, and generate a Result. Sensors can be hosted by Platforms.”
In Figure 4, the SSN observation should have as its Sample the group of Elephant Seals observed. Its Property is the number of females at the stage JUVE. The sensor is the human observer, etc. All the descriptions related to the organism and taxon could be linked to the Sample.
As before, the coordinate (location) of the group would be stored in a new observation.
The authors should justify why they use two ontologies to represent observations when SSN/SOSA is not restricted to device measurements but covers any observation (sensor, human and simulation).

No explanation is given on the link used between coordinates (location) and geoname.

To conclude, this dataset could be a good example of SOSA/SSN usage for observations of specimens or organisms. Unfortunately, it seems that the design pattern of Feature Of Interest, Sample and Property is not well understood. I would recommend that the authors first validate their model through a publication in a workshop related to SSN/SOSA and then rebuild their dataset.

Review #2
By Simon Cox submitted on 03/Aug/2020
Major Revision
Review Comment:

The paper describes a dataset published as linked-data structured using a set of existing ontologies. Overall the treatment is appropriate and compelling. However, I found that the mapping of the data to the SSN/SOSA ontologies appears to have a significant flaw.

At the bottom of p3 col 1 the authors propose “sosa:FeatureOfInterest to specify the observed phenomena. In our case, temperature and depth.” This does not match the definition in the SOSA specification: “The thing whose property is being estimated or calculated in the course of an Observation” - see https://www.w3.org/TR/vocab-ssn/#SOSAFeatureOfInterest. Note that this use of the term 'feature' corresponds with common usage in geographic information standards.

For this application, the (ultimate) feature-of-interest is “the ocean”, and more specifically for each individual observation the (proximate) feature-of-interest is a sample of the water-column at a particular location. Temperature and depth, and also location, are properties of this feature-of-interest, whose complete description or state can be captured as the result of a set of observations.

The authors have chosen an indirect solution, arguably inconsistent with the SOSA/SSN model. Instead of recognising that there is a distinct feature-of-interest (i.e. a sample of the water column) for each observation, they use a common FoI for all depth, temperature, and location observations, designated
http://linkeddata.cenpat-conicet.gob.ar/resource/featureOfInterest/location respectively. This also conflates the feature-of-interest and observed-property, and undermines the separation of concerns in the SSN model.

So, while it should be possible to merge the results of observations of different properties for the same FoI to obtain a complete description of that FoI, this is not possible here, since in this application the same FoI is used for all observations of each observable-property. I spent some time exploring the linked-data end-point and could not figure out how to join together a set of results in the graph in order to describe the temperature of the water column at a particular time and place. It appears that results must be joined on information that is only presented within the rdfs:label for each observation – e.g. “dive number 17582 of platform AALE on temperature” and “dive number 17582 of platform AALE on surface temperature” and “dive number 17582 of platform AALE on max depth” etc. This is very indirect and inconsistent with the linked-data semantic model that the authors appear to have tried to construct.
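
To make the indirection concrete, here is a minimal sketch of the string parsing a client would currently need in order to join results. It assumes every label follows the pattern quoted above; `join_by_label` is a hypothetical helper, and the numeric values in the usage lines are invented for illustration.

```python
import re
from collections import defaultdict

# Pattern inferred from the three example labels quoted in this review.
LABEL = re.compile(r"dive number (\d+) of platform (\w+) on (.+)")


def join_by_label(labelled_results):
    """Group (label, value) pairs by the (platform, dive) parsed from the label."""
    dives = defaultdict(dict)
    for label, value in labelled_results:
        match = LABEL.fullmatch(label)
        if match is None:
            continue  # a label that deviates from the pattern is silently lost
        dive, platform, prop = match.groups()
        dives[(platform, int(dive))][prop] = value
    return dict(dives)


# Invented values; only the label texts come from the dataset.
rows = [
    ("dive number 17582 of platform AALE on temperature", 3.1),
    ("dive number 17582 of platform AALE on surface temperature", 11.4),
    ("dive number 17582 of platform AALE on max depth", 412.0),
]
print(join_by_label(rows))
```

The join key exists only as free text, so any change in label wording breaks the join without any error being raised.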

While there is some flexibility in the application of the SSN model to specific use-cases, this indirect approach appears to undermine the underlying utility of the data graph. This is a shame as the authors have otherwise done quite a nice job of using the technology to good effect.

An alternative approach, which I believe better matches the semantics of SOSA, would be as follows:
1. Define an individual feature-of-interest with a new URI **for each sampling location**
2. Encode all observations – min-depth, max-depth, avg-depth, bottom-temp, surface-temp, location – with respect to this FoI
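
The two steps above can be sketched as follows. This is an illustration under stated assumptions, not a proposal for the final IRI design: triples are plain Python tuples rather than objects from an RDF library, the `sample/`, `observation/` and `property/` path segments and the helper names are hypothetical, one FoI per dive stands in for one FoI per sampling location, and the result values are invented.

```python
# Base IRI as it appears in the dataset; the path layout below it is assumed.
BASE = "http://linkeddata.cenpat-conicet.gob.ar/resource/"


def foi_iri(platform, dive):
    """Step 1: mint one feature-of-interest IRI per sampling location."""
    return f"{BASE}sample/{platform}_{dive}"


def observation_triples(platform, dive, prop, value):
    """Step 2: encode an observation with respect to that FoI."""
    obs = f"{BASE}observation/{platform}_{dive}_{prop}"
    return [
        (obs, "rdf:type", "sosa:Observation"),
        (obs, "sosa:hasFeatureOfInterest", foi_iri(platform, dive)),
        (obs, "sosa:observedProperty", f"{BASE}property/{prop}"),
        (obs, "sosa:hasSimpleResult", value),
    ]


def describe(triples, foi):
    """Merge all results about one FoI into a {property: value} mapping."""
    observations = {
        s for s, p, o in triples
        if p == "sosa:hasFeatureOfInterest" and o == foi
    }
    props = {s: o for s, p, o in triples if p == "sosa:observedProperty"}
    results = {s: o for s, p, o in triples if p == "sosa:hasSimpleResult"}
    return {props[s].rsplit("/", 1)[-1]: results[s] for s in observations}


triples = []
for prop, value in [("max-depth", 412.0), ("bottom-temp", 3.1)]:
    triples += observation_triples("AALE", 17582, prop, value)
print(describe(triples, foi_iri("AALE", 17582)))
```

With a shared FoI per sampling location, the complete description of the water-column sample falls out of a join on the FoI IRI, with no need to parse labels.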

Note that in exploring the service, I also could not track down the results of any location observations.

Some other minor issues:
1. In Table 4 two class names are incorrectly not capitalized sosa:platform and sosa:sensor (should be sosa:Platform and sosa:Sensor)
2. The W3C Recommendation should be used for Reference [3], not the editor’s draft. The editors’ names should be given in the citation.
3. The editors’ names should be given in the citation for Reference [15] (OWL-Time).
4. Use of English is slightly incorrect in a few places – e.g. in the second paragraph in section 2, “The instrument is deploy when …” should be “The instrument is deployed when …”, and “The position is also registered when the seal ascents to the surface” should be “The position is also registered when the seal ascends to the surface”; in the beginning of the next paragraph “The censuses and the deploy of the instruments are carried out by the research team belongs to …” should be “The censuses and the deployments of the instruments are carried out by the research team belonging to…”. That’s just in ¼ of one page. This kind of minor error could be picked up by a grammar checker.

Review #3
By Nicholas Car submitted on 12/Aug/2020
Minor Revision
Review Comment:

This paper describes a Linked Data/Semantic Web dataset in use. It models a collection of real world observations as SOSA Observations and uses a series of other ontologies to bring in classes and properties for describing the sensor platform (elephant seals) and observed features.

The custom ontology and the dataset are well presented online in Linked Data form.

I do question some of the modelling in the custom ontology BiGe-Onto, for example, the class Region could just be a GeoSPARQL Geometry, but this is out of scope for this paper which is just about the dataset.

I find the paper uses the ontologies it lists sensibly except for elements of SOSA/GeoSPARQL. The paper declares Geometry objects, e.g. http://linkeddata.cenpat-conicet.gob.ar/page/geometry/point_-63.57_-42.773, with a time and date, but the time and date are of the Sensor visiting the geometry, not a property of the geometry itself. Better would be to associate the time with the Observation - where it was made. Geometries are also allocated to the sensor in an unordered list that can be temporally ordered by looking at the geometry time property. Better would be to order the geometries by using an rdf:Seq or other RDF collection (see SOSA's extension ontology for an OrderedCollection).

The paper also creates confusion in how it uses geometry in Figure 2 where the geometry example isn't linked to the rest of the example and should probably be in Figure 1.

This Platform/Observation geometry is really the only technical (modelling and data) problem I have with the paper. There are some small ontology errors listed below but these are minor.

The paper's references need work. Many don't include all reference parts, e.g. URIs, and some are to outdated resources, e.g. old versions of standards. I suspect some of the issues are related to LaTeX format issues. I have noted bad references that need fixing.

Below are a series of small points that should be easy to address. They need to be resolved but only the modelling issues above and references are holding this paper back from instant publication.

* Can the online location of the BiGe Ontology be quoted when the first reference to it is made on Page 1 (reference [4])? It is given later in the paper, Table 3, but it's not obvious to the reader that it is online until Table 3 is reached.

* Could a web front end be provided for the dataset's SPARQL endpoint: http://linkeddata.cenpat-conicet.gob.ar/sparql? This will make the data much easier to query. There are many easily installable ones to choose from, e.g. https://triply.cc/docs/yasgui/

* In the dataset, the Dublin Core class FileFormat is used incorrectly as a predicate, e.g.:

dcterms:FileFormat "PDF" .
dcterms:FileFormat "PDF" .

The correct RDF uses dcterms:format, which is a property, not dcterms:FileFormat, which is a class:

dcterms:format "PDF" .
dcterms:format "PDF" .

* BiGe-Onto Ontology: this ontology uses mixed forms of URI, e.g. bigeonto:belongsTo, bigeonto:has_location. Can this be standardised in a new ontology version?

* the reference given for QUDT, [13], is not to QUDT itself but to an extension and its URI is broken. Please just refer to QUDT proper. Refer to http://qudt.org.

* no persistent URI is provided for the GeoSPARQL reference, [14]. Should be http://www.opengis.net/doc/IS/geosparql/1.0

* reference to "SPARQL query language" is outdated - URI for 1.0 is given, should be for 1.1: https://www.w3.org/TR/sparql11-query/

* reference to OWL TIME, [15], does not quote persistent URI, should be https://www.w3.org/TR/owl-time/

* there is a space in the URI http://www.w3id.org/cenpat-gilia/bigeonto/ in Table 3 that needs to be removed for it to work (be able to be clicked on)

* Figure 1 has two spelling mistakes: sosa:host -> sosa:hosts, "average depht" -> "average depth"

* acronym TDR is not explained when first mentioned, page 5, column 1, line 30. Assume Temperature depth recorder?

* Figure 2 contains a geo:Geometry instance not linked to any other instances. It should be moved to Figure 1 where it may be linked to the Platform that the text describes it links to

* Incorrect SOSA class use - restatement of Platform/Geometry observation above
In the data I find this (turtle pseudo code):

PREFIX geom:
PREFIX platform:
PREFIX sensor:

sensor:AMLJ a sosa:Sensor ;
    rdfs:label "viaje_config #57" ;
    sosa:isHostedBy platform:SES_AMLJ ;
    sosa:resultTime "2005-11-30"^^xsd:date ;
    geo:hasGeometry geom:point_-63.659_-42.784 ,
        geom:point_-35.293_-43.747 ,
        geom:point_-63.874_-42.835 .

sensor:AMLJ is declared of type sosa:Sensor, which is fine for the predicate sosa:isHostedBy but not for sosa:resultTime. sosa:resultTime's documentation states its domain (via schema:domainIncludes) as sosa:Actuation, sosa:Observation, sosa:Sampling. While SOSA uses schema:domainIncludes, not rdfs:domain, and thus technically anything may be used as the subject of sosa:resultTime, the obvious intention is for a temporal thing, an activity, to use it. It makes no sense for the sensor sensor:AMLJ to have a sosa:resultTime. I understand what is being modelled here - all the observations have relative time starting at the sensor's sosa:resultTime, but different modelling must be used. Perhaps look into the use of an ObservationCollection (https://www.w3.org/TR/vocab-ssn-ext/#sosa:ObservationCollection) with a sosa:phenomenonTime to contain the current sosa:resultTime value.

If the multiple geometries given for the sensor indicate the location of observations, then they should be attached to each observation, not the sensor. Observations in the dataset do already indicate sosa:resultTime but not location.

* Since geometry is associated with the Platform (the seal), but in an unordered array of points (we don't know which is the first, last, next, etc. point), not a POLYLINE, how can individual observations be linked to their location?

* The reference for D2RQ, [18], is to an un-linked conference poster. A link must be provided, e.g. http://wifo5-03.informatik.uni-mannheim.de/bizer/pub/Bizer-Cyganiak-D2R-.... Better would be a reference - perhaps a footnote - to the tool's online documentation (https://www.csee.umbc.edu/courses/graduate/691/spring14/01/examples/d2rq... or http://d2rq.org/d2r-server)

* the URI is used in the dataset, e.g.


But it should be - with an s, "hosts"

Page 1

Column 1 Line 37 query -> queries
1 39 accessible for machines -> accessed by machines
2 33 remove 'the'
2 34 rephrase "To meet Linked Data requirements, datasets must be described with rich metadata such as controlled vocabularies in a particular form - RDF - and published as a findable resource with a unique identifier."

2 38 reference to SSN (ref [3]) should be (from specref.org):

Armin Haller; Krzysztof Janowicz; Simon Cox; Danh Le Phuoc; Kerry Taylor; Maxime Lefrançois. Semantic Sensor Network Ontology. 19 October 2017. W3C Recommendation. URL: https://www.w3.org/TR/vocab-ssn/ ED: https://w3c.github.io/sdw/ssn/

2 39 reference [4] has some funny numbers at the end that need reformatting

2 41 specie -> species

2 42 "collected along two decades" -> "collected over two decades"

2 43 SES is only defined later, needs to be defined here

2 46 You can't study the demography of non-humans. demography -> ecology

2 50 "and contribute" -> "and to contribute"

2 51 "species behind the changes" -> "species from changes"

Page 2
1 3 Reference [6] has an online accessed date but no URL

1 10 "During their terrestrial phase they are also characterized by high fidelity to the site where they have previously been" -> "During their terrestrial phase they frequently revisit previous years' sites"

ending of language checks

2 41 censuses -> census

Page 8

1 39 the link for the example FoI, SDN:P01::DEPTHC01, is broken. Is http://vocab.nerc.ac.uk/collection/P01/current/DEPTHC01/, should be http://vocab.nerc.ac.uk/collection/OG1/current/DEPTH/

2 21 class is foaf:Person, not foaf:Person

* the sentences between Page 8 and 9 seem to be broken. they read:

Page 8:
One crucial aspect is how to access and analyze data, and especially how to get only that part of data which is of interest for a given research question.

Page 9:
solves the access part, and SPARQL allows to query only a subset of the data.

I suspect a sentence is covered by Table 6.