Weather Data Publication on the LOD using SOSA/SSN Ontology

Tracking #: 2125-3338

Authors: 
Catherine Roussey
Stephan Bernard

Responsible editor: 
Guest Editors Sensors Observations 2018

Submission type: 
Dataset Description
Abstract: 
This paper presents an RDF dataset of meteorological measurements. The measurements come from one weather station of the Irstea experimental farm located at Montoldre. Some measurements produced during the year 2017 are transformed and published as Linked Open Data. The data schema is based on the new version of the Semantic Sensor Network ontology (SSN). This ontology version integrates the Sensor, Observation, Sample, and Actuator pattern (SOSA). We first present the ontology network used to organize the data. Then, the transformation process used to publish the dataset is detailed. To conclude, we present some querying use cases related to Irstea research projects.
Tags: 
Reviewed

Decision/Status: 
Major Revision

Solicited Reviews:
Review #1
By Sebastian Neumaier submitted on 12/Apr/2019
Suggestion:
Minor Revision
Review Comment:

The present paper describes a meteorological dataset and the ontologies used for publishing it. The dataset consists of measurements from a weather station (e.g., barometer, temperature and humidity sensors); the weather station is located at an experimental farm for testing agricultural equipment and machines. The paper further describes the URIs, external references, and the population process.

Strong points:
- The dataset description is clear and complete; the process of publishing data at the SPARQL endpoint is automated.
- There is a SPARQL endpoint available, a Snorql UI, and a Pubby frontend for the dataset.
- The dataset can be found at datahub.io
- There are external references to, e.g., GeoNames and DBpedia.

Weak points:
- The reviewing guidelines [1] state that a dataset description paper should detail the "usefulness of the dataset, which should be shown by corresponding third-party uses". Unfortunately, this is missing from the paper.
- There is no RDF dump available for download (although I understand that it would not be up to date; you could export a dump on a regular basis, or publish snapshots, e.g., on datahub.io).
- There seem to be no regular updates of the measurements yet.

Summary:
The dataset description is clear, complete, and easy to read. There are query endpoints available, and example queries in the paper. To accept this dataset description paper, I would ask the authors to provide the missing "usefulness" evidence.

Spelling and formatting:
p. 2: formatting: “sosa:madeBySensor” exceeds the line
p. 4: formatting: “time:hasBeginning” exceeds the line
p. 4: formatting: “sosa:FeatureOfInterest” exceeds the line, etc. Several figures and lines are too wide for printing and are partly not visible at all (e.g., the SPARQL query on p. 8)!
p.4: “the ‘/’ character is replaced by *the* ‘_’ character”
p. 7: The flow of data processing in this piece of software consists to -> reformulate
p. 8: DBpedia:a -> DBpedia: a
p. 8: linked to GeoName individuals -> GeoNames
p. 8: they need to access precised weather data -> precise
p. 9: from january 2017 to december 2017 -> January, December
p. 9: for french user that do not know semantic Web -> users that do not know Semantic Web
p. 9: the weather archive uptodate process -> reformulate

[1] http://www.semantic-web-journal.net/reviewers

Review #2
Anonymous submitted on 28/Apr/2019
Suggestion:
Major Revision
Review Comment:

This paper describes the creation of a linked dataset in RDF for weather data that adheres to the SOSA/SSN ontology. The authors provide clear and concise details on how the input data is received from the sensors, the ontologies reused for publishing the dataset, the URI policy for creating resources, and quantitative aspects of the dataset (for example, metadata, the VoID page, etc.).

The dataset itself is clearly described, but I find two weaknesses in the current paper: (1) no clear evidence of the usefulness of the dataset, and (2) stability issues, since some of the resources returned HTTP 404 at the time of reviewing and there is no guarantee regarding the process of maintaining and releasing new versions with the pipeline described in the paper.

Some issues found while assessing the dataset:
The link to the endpoint http://ontology.irstea.fr/weather2017/sparql returned HTTP 404, which was frustrating. However, I went to the webpage and found the link http://ontology.irstea.fr/weather2017/snorql/ for exploring the dataset. Could you please clarify which endpoint works for consuming the dataset?
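One way to check which URL actually serves SPARQL is to send a standard SPARQL 1.1 Protocol "query via GET" request and see whether it answers. The sketch below is illustrative only: it assumes the service sits at the `/sparql` path cited above (which returned 404 at review time) and accepts the standard `query` parameter and the `application/sparql-results+json` media type; the helper name `build_sparql_request` is hypothetical.

```python
import urllib.parse
import urllib.request

# Endpoint path cited in the review; it returned 404 at review time, so treat it as an assumption.
ENDPOINT = "http://ontology.irstea.fr/weather2017/sparql"


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol 'query via GET' request (hypothetical helper)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


# A cheap probe: does the service answer an ASK query at all?
probe = build_sparql_request(ENDPOINT, "ASK { ?s ?p ?o }")
# urllib.request.urlopen(probe, timeout=10) would send it; an HTTPError 404 means the path is wrong.
```

The request is only built here, not sent, so the snippet runs offline; a Snorql page usually wraps such an endpoint, so its configuration may reveal the working path.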

=== General impression ===
+ Quality and stability of the dataset.
The data is of good and trustworthy quality. However, there is no evidence regarding the stability of the dataset. The authors mention that there is a previous version of the dataset, but they give no evidence of how the new dataset relates to the previous one. Does this mean the previous one is deprecated? Should users now use only the new dataset? How can machines “infer” such a decision if there is no metadata relating the two datasets?

+ Usefulness of the dataset
It is clear that a meteorological dataset is useful. However, the evidence provided in Section 6 does not make clear how the dataset is actually used. It is also unclear whether the researchers mentioned are outside IRSTEA. It would be great to have concrete use cases and/or benefits of consuming the RDF dataset instead of the raw CSV files. Why is it useful for robotic experiments to consume wind speed in RDF instead of using the value directly from the sensor? Please elaborate in Section 6 to clearly demonstrate the usefulness of the dataset.

+ Clarity and completeness of the descriptions.
The authors clearly describe the ontology network reused for the creation of the dataset. However, a graphical view of the interactions/links between the ontologies of the network is missing. One question regarding IRSTEA's reuse of other namespaces is how far they “trust” those resources. Is there any study on the criteria for reusing external resources to ensure long-term access to the resources provided by the organization?

=== Technical reviews ===
Please add units (if any) for the columns in Table 1.
In “A simpler ontology dedicated to sensor and actuator…”, why use the term “simpler”? Do you mean a generic ontology?
Regarding the reuse of external ontologies, have you checked the ontologies/datasets published by the French IGN with respect to geometry (http://data.ign.fr/geometrie, http://data.ign.fr/ignf)?
In Figure 1, is it possible to infer from the data that irstea:commune/montoldre is located in France? For example, INSEE France describes “Montoldre” with this URI; it might be useful to link to it from your dataset.
You might also consider having mappings with datasets from INSEE at http://rdf.insee.fr/sparql for specific French divisions.
Could alignments with the geo:lat/geo:long properties make the dataset easier to discover?
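A minimal sketch of such an alignment, assuming the station location is stored as a GeoSPARQL WKT point in the default CRS84 (longitude first): the hypothetical helper `wkt_point_to_geo` derives W3C Basic Geo (WGS84) `geo:lat`/`geo:long` N-Triples from it. Literals carrying an explicit CRS IRI are rejected rather than guessed, since their axis order depends on that CRS.

```python
import re

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"

# Matches a bare 2D WKT point such as "POINT(1.5 45.25)"; GeoSPARQL's default CRS84 puts longitude first.
_POINT = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$", re.IGNORECASE
)


def wkt_point_to_geo(subject, wkt):
    """Return geo:lat and geo:long N-Triples derived from a CRS84 WKT point (hypothetical helper)."""
    match = _POINT.match(wkt)
    if match is None:
        # Also rejects literals prefixed with a CRS IRI, whose axis order may differ.
        raise ValueError(f"unsupported WKT literal: {wkt!r}")
    lon, lat = match.group(1), match.group(2)
    return [
        f'<{subject}> <{GEO}lat> "{lat}"^^<{XSD_DECIMAL}> .',
        f'<{subject}> <{GEO}long> "{lon}"^^<{XSD_DECIMAL}> .',
    ]


# Hypothetical subject and coordinates, for illustration only.
for triple in wkt_point_to_geo("http://example.org/station/1", "POINT(1.5 45.25)"):
    print(triple)
```

Emitting both vocabularies side by side keeps GeoSPARQL for spatial queries while letting clients that only understand the older WGS84 vocabulary place the station on a map.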
Regarding alignments with external datasets, it is not clear to me whether you are describing alignments at the schema level or at the instance level. On page 8: “The individuals, (..) are as much as possible linked to others datasets with the owl:sameAs property”. This is confusing because individuals are instances of classes and cannot be linked with property axioms. Could you clearly describe (1) the mappings done at the ontology level (classes, properties) and how many there are with external ontologies, and (2) the instance alignments (which properties are used), with metrics on the instances linked to those external datasets?
Please describe how you will add new versions of the dataset (e.g., when publishing data for 2018 and 2019), the versioning metadata, and any changes to the transformation process in the Python script.
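To make the two requested figures concrete, the sketch below (with a hypothetical helper `link_metrics`) separates ontology-level mappings (`owl:equivalentClass`, `rdfs:subClassOf`, and similar) from instance-level `owl:sameAs` links, then counts the latter per external dataset. It assumes the triples are already available as (subject, predicate, object) string tuples, for instance from a dump or a SPARQL result, and that GeoNames and DBpedia links can be recognized by their URI bases; other targets such as SWEET can be added to `TARGETS` once their URI base is confirmed.

```python
from collections import Counter

OWL = "http://www.w3.org/2002/07/owl#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Predicates that usually express ontology (schema) level mappings.
SCHEMA_PREDICATES = {
    OWL + "equivalentClass",
    OWL + "equivalentProperty",
    RDFS + "subClassOf",
    RDFS + "subPropertyOf",
}
# Predicate used for instance level identity links.
SAME_AS = OWL + "sameAs"

# URI bases of two external datasets named in the reviews (assumed).
TARGETS = {
    "GeoNames": "http://sws.geonames.org/",
    "DBpedia": "http://dbpedia.org/resource/",
}


def link_metrics(triples):
    """Separate schema mappings from owl:sameAs links and count the latter per target dataset."""
    schema_mappings = [t for t in triples if t[1] in SCHEMA_PREDICATES]
    instance_links = Counter()
    for _subject, predicate, obj in triples:
        if predicate != SAME_AS:
            continue
        for name, base in TARGETS.items():
            if obj.startswith(base):
                instance_links[name] += 1
    return schema_mappings, instance_links


# Hypothetical triples for illustration only; example.org stands in for the real URIs.
example = [
    ("http://example.org/commune/montoldre", SAME_AS, "http://dbpedia.org/resource/Montoldre"),
    ("http://example.org/ontology/Station", OWL + "equivalentClass", "http://example.org/ext/Station"),
]
schema, links = link_metrics(example)
```

The length of `schema` answers point (1) and the `links` counter gives the per-dataset figures for point (2).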
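One possible layout for such versioning metadata is sketched below: an umbrella VoID dataset whose yearly releases are declared with `void:subset`, each carrying a `dcterms:issued` date and, for a reprocessed release, a `dcterms:replaces` link to the release it supersedes. The URIs, the function name `yearly_release` and the dates are hypothetical; only the VoID and Dublin Core terms are standard.

```python
from datetime import date

# Hypothetical URI base; the published dataset may use different URIs.
BASE = "http://example.org/weather"

PREFIXES = (
    "@prefix void: <http://rdfs.org/ns/void#> .\n"
    "@prefix dcterms: <http://purl.org/dc/terms/> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def yearly_release(year, issued, replaces=None):
    """Turtle for one yearly release declared as a void:subset of an umbrella dataset."""
    lines = [
        f"<{BASE}> a void:Dataset ;",
        f"    void:subset <{BASE}/{year}> .",
        "",
        f"<{BASE}/{year}> a void:Dataset ;",
        f'    dcterms:title "Weather measurements {year}" ;',
        f'    dcterms:issued "{issued.isoformat()}"^^xsd:date',
    ]
    if replaces is not None:
        # A reprocessed release points to the release it supersedes.
        lines[-1] += " ;"
        lines.append(f"    dcterms:replaces <{replaces}>")
    lines[-1] += " ."
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


print(yearly_release(2018, date(2019, 1, 15)))
```

Keeping each year as a separate subset lets new years be appended without touching earlier releases, while `dcterms:replaces` covers the case raised in the general impression, where a reprocessed version supersedes an older one.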

=== Some typos/suggestions ===
“as shown in 1” correct to “as shown in [Table 1]”
“and their elements” correct to “as [the] elements”
“It also propose” correct to “it also [proposes]”
Reference [5] is not fully described in References, missing URL
“are as mush as possible” correct to “are as [much] as possible”
Page 7, the datahub name is truncated.

Review #3
Anonymous submitted on 11/May/2019
Suggestion:
Major Revision
Review Comment:

The paper describes a weather dataset that is originally generated by a particular station of the Irstea experimental farm in Montoldre. It will be interesting to see what kinds of applications are built on top of this dataset. However, the authors make it clear that the dataset is not regularly updated, and I hope that this will be addressed soon. In addition, I have a few comments regarding the completeness of the paper and the interlinking process.

Several related works are missing and should be mentioned: http://knoesis.org/ssn2014/paper_5.pdf in SSN 2014, http://ceur-ws.org/Vol-904/paper10.pdf in SSN 2012, and http://www.semantic-web-journal.net/sites/default/files/swj281_0.pdf in this journal (2011), among others.

The choice of modelling, in particular the reuse of the SSN/SOSA, GeoSPARQL and OWL-Time ontologies, is well grounded. However, it is not clear how the AWS ontology is used to model the data. Note that the AWS ontology only complies with the old SSN.

In Table 6, the authors should give more details about the dataset, e.g., the number of records for each observed property, the repository size of the dataset, etc.
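The per-property counts could be obtained directly from the endpoint with SPARQL 1.1 aggregates. The sketch below assumes, as in SOSA, that observations are typed `sosa:Observation` and linked to their property with `sosa:observedProperty`; the function name is hypothetical and the query has not been run against the Irstea endpoint.

```python
SOSA = "http://www.w3.org/ns/sosa/"


def observations_per_property_query():
    """SPARQL 1.1 aggregate query: number of sosa:Observation per observed property."""
    return f"""PREFIX sosa: <{SOSA}>
SELECT ?property (COUNT(?obs) AS ?records)
WHERE {{
  ?obs a sosa:Observation ;
       sosa:observedProperty ?property .
}}
GROUP BY ?property
ORDER BY DESC(?records)
"""


# Total number of triples (default graph only), a rough proxy for the repository size.
TRIPLE_COUNT_QUERY = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"

print(observations_per_property_query())
```

Publishing such queries alongside the table would also let readers recompute the figures as new measurements are added.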

In terms of the connections to other datasets, it would be clearer to have a summary table of the number of entities linked to SWEET, GeoNames and DBpedia. In addition, a short description of how these links were created should be provided.

Regarding the dataset querying use cases in Section 6, I would expect more potential use cases to be presented rather than a single simple query. The reader would benefit greatly if this section were expanded.

The writing of the paper must be revised. There are many readability problems that need improvement, far too many to list here.