Review Comment:
This paper describes the creation of a linked dataset in RDF for weather data which adheres to the SOSA/SSN ontology. The authors provide clear and concise descriptions of how the input data is received from the sensors, the ontologies reused for publishing the dataset, the URI policy for creating resources, and quantitative aspects of the dataset (for example metadata information, the VoID page, etc.).
The dataset itself is clearly described, but I find two weaknesses in the current paper: (1) there is no clear evidence of the usefulness of the dataset, and (2) there is a stability issue, since some of the resources return 404 (at the time of reviewing) and there is no guarantee about the process of maintaining and releasing new versions with the pipeline described in the paper.
Some issues found while assessing the dataset:
The link to the endpoint http://ontology.irstea.fr/weather2017/sparql returned a 404, which was frustrating. However, on the dataset webpage I found the link http://ontology.irstea.fr/weather2017/snorql/ for exploring the dataset. Could you please clarify which endpoint should be used for consuming the dataset?
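Whichever address is the canonical one, please state it explicitly in the paper and in the VoID description; as a reader I would expect a trivial sanity-check query such as the following (generic, nothing dataset-specific) to succeed against it:

    SELECT (COUNT(*) AS ?triples)
    WHERE { ?s ?p ?o }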
=== General impression ===
+ Quality and stability of the dataset.
The data is of good and trustworthy quality. However, there is no evidence regarding the stability of the dataset. The authors mention that there is a previous version of the dataset, but they do not explain how the new dataset relates to the previous one. Does that mean the previous one is deprecated? Should users now use only the new dataset? How can machines “infer” such a decision if there is no metadata relating the two datasets?
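To make that relation machine-readable, I would expect the metadata to expose statements that a query along these lines could retrieve (a sketch; the choice of vocabulary, e.g. dcterms:replaces or dcterms:isVersionOf, is of course up to the authors):

    PREFIX dcterms: <http://purl.org/dc/terms/>

    SELECT ?newer ?older
    WHERE {
      { ?newer dcterms:replaces    ?older }
      UNION
      { ?newer dcterms:isVersionOf ?older }
    }

At the moment, as noted above, no such metadata about the two versions seems to be available.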
+ Usefulness of the dataset
It is clear that a meteorological dataset is useful. However, the evidence provided in Section 6 does not make clear how the dataset is actually being used, nor whether the researchers mentioned are outside of IRSTEA. It would be great to have concrete use cases and/or benefits of consuming the RDF dataset instead of the raw CSV files. Why is it useful for robotic experiments to consume wind speed in RDF rather than using the value directly from the sensor? Please elaborate in Section 6 to clearly identify the usefulness of the dataset.
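As an illustration of the added value I would expect Section 6 to spell out, a SOSA-compliant consumer can select observations by property and time window with a generic query such as the one below (a sketch; I am assuming sosa:hasSimpleResult carries the values, and the wind-speed property URI of your dataset would have to be plugged in for ?prop):

    PREFIX sosa: <http://www.w3.org/ns/sosa/>
    PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

    SELECT ?obs ?prop ?result ?time
    WHERE {
      ?obs a sosa:Observation ;
           sosa:observedProperty ?prop ;     # e.g. the dataset's wind-speed property
           sosa:hasSimpleResult  ?result ;
           sosa:resultTime       ?time .
      FILTER (?time >= "2017-06-01T00:00:00Z"^^xsd:dateTime &&
              ?time <  "2017-06-02T00:00:00Z"^^xsd:dateTime)
    }
    ORDER BY ?time

The same query works unchanged against any other SOSA dataset, which is precisely the kind of argument over ad-hoc CSV parsing that Section 6 could make concrete.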
+ Clarity and completeness of the descriptions.
The authors clearly describe the ontology network reused for the creation of the dataset. A graphical view of the interactions/links between the ontologies of the network is missing. One question regarding IRSTEA's reuse of other namespaces is how far they “trust” those resources. Is there any study of the criteria for reusing external resources with respect to long-term access to the resources provided by the organization?
=== Technical reviews ===
Please add units (if any) for each column in Table 1.
In “A simpler ontology dedicated to sensor and actuator…”, why use the term “simpler”? Do you mean a generic ontology?
Regarding the reuse of external ontologies, have you checked the ontologies/datasets published by the French IGN with respect to geometry (http://data.ign.fr/geometrie, http://data.ign.fr/ignf)?
In Figure 1, is it possible to infer from the data that irstea:commune/montoldre is located in France? For example, INSEE describes “Montoldre” with its own URI, and it might be useful to have a link to it from your dataset.
You might also consider having mappings with the datasets from INSEE at http://rdf.insee.fr/sparql for the French administrative divisions.
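For instance, the INSEE URI for the commune could be looked up on their endpoint with something along these lines (a sketch; I am assuming the igeo vocabulary used by the INSEE geographic data, so the exact class and property names should be double-checked):

    PREFIX igeo: <http://rdf.insee.fr/def/geo#>

    SELECT ?commune
    WHERE {
      ?commune a igeo:Commune ;
               igeo:nom ?nom .
      FILTER (str(?nom) = "Montoldre")
    }

The URI returned could then serve as the target of an owl:sameAs (or a weaker rdfs:seeAlso) link from irstea:commune/montoldre.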
Maybe having some alignments with the geo:lat/long properties (W3C Basic Geo vocabulary) could lead to better discoverability of the dataset?
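Concretely, this would only require a couple of additional triples on the commune (or station) resource, for example (illustrative sketch only; the subject IRI and the coordinate values below are placeholders):

    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    INSERT DATA {
      # subject stands in for the irstea:commune/montoldre resource of Figure 1
      <http://example.org/commune/montoldre>
          geo:lat  "46.33"^^xsd:decimal ;   # illustrative coordinate, not a measured value
          geo:long "3.48"^^xsd:decimal .    # illustrative coordinate, not a measured value
    }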
Regarding alignments with external datasets, it is not clear to me whether you are describing the alignments at the schema level or at the instance level. On page 8, “The individuals, (..) are as much as possible linked to others datasets with the owl:sameAs property”. This is confusing, because individuals are instances of classes and cannot be linked with property axioms. Could you clearly describe (1) the mappings done at the ontology level (classes, properties) and how many of them target external ontologies, and (2) the alignments done at the instance level (which properties are used) and metrics on the instances linked to those external datasets?
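For point (2), even a simple count of outgoing owl:sameAs links grouped by target namespace would be informative; a query along these lines could produce it (a sketch, assuming the links are materialized in the default graph of the endpoint):

    PREFIX owl: <http://www.w3.org/2002/07/owl#>

    SELECT ?targetHost (COUNT(?target) AS ?links)
    WHERE {
      ?individual owl:sameAs ?target .
      FILTER isIRI(?target)
      BIND (REPLACE(STR(?target), "^(https?://[^/]+).*", "$1") AS ?targetHost)
    }
    GROUP BY ?targetHost
    ORDER BY DESC(?links)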
Please describe how you will add new versions of the dataset (say, when publishing data for 2018 or 2019), the versioning metadata, and the changes (if any) in the transformation process of the Python script.
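A few triples per release in the VoID description would already answer most of this, for example (a sketch with hypothetical release URIs and dates, not taken from the paper):

    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX void:    <http://rdfs.org/ns/void#>
    PREFIX xsd:     <http://www.w3.org/2001/XMLSchema#>

    INSERT DATA {
      <http://ontology.irstea.fr/weather2018/dataset>        # hypothetical URI for a 2018 release
          a void:Dataset ;
          dcterms:issued   "2019-01-15"^^xsd:date ;          # hypothetical publication date
          dcterms:replaces <http://ontology.irstea.fr/weather2017/dataset> .  # hypothetical URI
    }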
=== Some typos/suggestions ===
“as shown in 1” correct to “as shown in [Table 1]”
“and their elements” correct to “as [the] elements”
“It also propose” correct to “it also [proposes]”
Reference [5] is not fully described in the References; the URL is missing.
“are as mush as possible” correct to “are as [much] as possible”
On page 7, the datahub name is truncated.