Review Comment:
The paper describes a manually curated linked open dataset of energy efficiency measures and recommendations based on energy audits from Sweden and the US. The authors aim to make data that were previously disjoint and could only be compiled manually available as integrated RDF, supported by SPARQL, Snorql, and web-based search interfaces, in order to support policy research, application development, and future energy audits. Currently, however, the data is focused on use within a Swedish context and is available only in a mixture of English and Swedish.
Overall, the purpose and method of creation and the description of the vocabularies used are sufficient to enable exploration and use of the linked data. All data and endpoints were functional and available at the time of review.
The paper could be improved, however, by clearer descriptions of the quality, maintenance, and use (future or actual) of the data.
(1) The need to harmonize industry classifications led to the omission of some IAC data. This fact is made explicit in the paper, but it would be beneficial to elaborate further: is there any way to measure, identify, or otherwise characterize what fraction of the IAC data is lost or not represented in the integrated dataset, so that users can better interpret and (re)use the linked data?
(2) No license is specified in machine-readable form for either the data or the vocabulary. The paper states that the data is licensed under CC-BY 4.0; it would be good to make this explicit in the data itself, perhaps using CC REL http://creativecommons.org/ns. Presumably, the CC-BY 4.0 license is compatible with the licenses of all the underlying original datasets.
(3) It is not clear how this paper differs from the content alluded to in reference [1], which is not yet published. Does the current publication represent a legacy dataset that has already been superseded, with further data added and quality issues addressed in the 'forthcoming' publication? If so, one might question the utility of the data currently being described and made available. Because much of the discussion on quality and prospective use of the data is deferred to an unpublished future article, it is difficult to evaluate the current situation. For example, the authors should specify which version of the data the SPARQL, Snorql, and demo search interfaces use: is it always the most current version? How and where is any additional data incorporated, and how are data releases or changes managed and publicized to users other than via a change in the URI? If there are known shortcomings of, or ongoing improvements to, the current dataset (as the authors indicate in section 3.5), their type and nature should be made explicit in the current article, not merely alluded to, in order to benefit users.
(4) The discussion of dataset use cases describes how the uniform categorization of the Swedish data was beneficial (although one reference is unavailable/unpublished); however, no mention is made of the utility of the US data or the usefulness of the external linkages. Are there any cases where the US data or the links to geographic and SCB information have been beneficial? If so, please include them. If not, more concrete examples of how such additional information could be used in future to answer questions that are currently difficult would be informative.
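To illustrate point (2), the following is a minimal sketch of how a license statement could be expressed as triples, assuming a hypothetical dataset URI (http://example.org/... is a placeholder, not the authors' actual URI). It uses the cc:license property from CC REL and, as an optional addition not mentioned above, dcterms:license from Dublin Core; the authors may prefer a different modelling.

```python
# Hypothetical dataset URI, for illustration only; the real dataset URI would go here.
DATASET_URI = "http://example.org/dataset/energy-efficiency-measures"

# Existing vocabulary terms: cc:license (CC REL) and dcterms:license (Dublin Core).
CC_LICENSE = "http://creativecommons.org/ns#license"
DCTERMS_LICENSE = "http://purl.org/dc/terms/license"

# URI of the CC-BY 4.0 license named in the paper.
CC_BY_40 = "http://creativecommons.org/licenses/by/4.0/"


def license_triples(dataset_uri: str, license_uri: str) -> list[str]:
    """Return N-Triples lines that attach a license to a dataset."""
    return [
        f"<{dataset_uri}> <{predicate}> <{license_uri}> ."
        for predicate in (CC_LICENSE, DCTERMS_LICENSE)
    ]


if __name__ == "__main__":
    # Print the two statements so they can be loaded alongside the dataset.
    print("\n".join(license_triples(DATASET_URI, CC_BY_40)))
```

Statements of this kind could be added to the dataset description so that the license is discoverable through the SPARQL endpoint as well as in the paper.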
In general, the paper could benefit from being rewritten with this additional information so that references to future (as yet unpublished and inaccessible) articles are not required to supply the context or justification for data maintenance and quality and use cases.