Review Comment:
The paper describes a linked data set developed in the RÉPENER project based on three existing input data sets. The data set is well described, even though the language could be improved in places (see some suggestions below) and should be checked by a native English speaker. Although the authors present a number of possible uses and the services developed on top of the data set, how the data set is actually used remains unclear. It would be good to include actual usage figures or metrics if available.
I also feel that the potential of using a linked data approach has not been fully tapped. Much of the described work could also have been achieved using traditional (relational) database technology and data integration (ETL) methods. In particular, the reuse of existing vocabularies and the links to other external data sets could be improved.
Detailed comments:
- Section 1: "This requires having access to energy information at the different stages of the building life-cycle –from design, to construction, and operation– and not in separated sources." -- I agree that the information on the different stages is necessary. But why is it a problem to have that information in different sources? Please explain.
- Section 2: Why did you include only 202 of the 1800+ energy certifications of ICAEN? Even if the 202 chosen entries contain the most detailed information, the other entries may also be useful in some cases. Please explain.
- Section 3 / Figure 1: The distinction between properties and concepts is not clear in the figure. The legend states that black arrows represent object properties, but the arrows are not named. It is unclear whether the boxes represent concepts/classes or also properties. Please make this explicit in both the figure and the textual description.
- Section 3.1 (data transformation): "Finally, the values of the use of building (repener:mainBuildingUtilisation) (...) have been converted to the classification provided by the DATAMINE project [5], an international domain reference. In this way, third-parties, from other countries, are able to understand the data." -- I assume this means that the DATAMINE classification contains multilingual labels; is that correct? Is this the only multilingual code list/classification used in the data set? Are rdfs:label values provided in different languages? If not, what language is used for textual properties?
- Section 3.1 / Figure 2: Have you considered the benefits and drawbacks of loading all data sets into the same triple store (rather than establishing a separate triple store for each data set)? What are the implications if the data sets evolve? For example, does the ETL process have to be run again every time one of the source data sets changes, and how will the central triple store learn about such changes? On a related note, what does the notion of "data set" refer to in the paper? After the ETL, the entire content of the triple store could be considered a single data set (the RÉPENER Linked Dataset), i.e. all links between resources coming from different source data sets are now internal to the newly created merged data set. This could be discussed in the paper.
- Section 3.2 (data linking): "For instance, a climate zone resource such as C2 (see http://...) connects both sources through repener:hasCity and repener:hasBuilding properties." -- This is not very clear. Why are climate zones used with the hasCity and hasBuilding properties? A figure may help to explain how the link works.
- Section 4: It seems to me that all the described services could also be implemented on top of a conventional database; they do not illustrate any additional benefit of using a linked data approach. Please illustrate how the links (in particular to external data sets) benefit the presented services.
- Section 4.1: "It can be explored also graphically, in a heat map implemented on top of Google Maps." --> What does the heat map show: just the density of buildings in the dataset, or is it also related to the energy efficiency of the represented buildings?
- Section 4.2: How is the temporal aspect (which is needed for the "before-after-renovation" comparison) handled in the ontology?
- Section 5: "While Reegle and OpenEI platforms offer energy-related data at a country level policies, regulations, energy production or renewable resource RÉPENER's dataset collects data for specific buildings" -- This suggests that additional external links to the data provided by projects/platforms like Reegle or OpenEI (e.g. on policies, regulations, energy production or renewable resources) could be added to the data set.
Some suggestions to improve the language:
Title (and text)
"RÉPENER’s Linked Dataset" sounds strange to me (probably because it suggests RÉPENER to be a person rather than a project). Maybe consider using "the RÉPENER Linked Dataset" in the title and text instead.
Abstract:
- "The following of the Linked Data principles" --> "Following Linked Data principles"
- "The dataset is a Knowledge base for end-users" --> "The dataset is a knowledge base for end-users"
Section 1
- "the improvement of the energy-efficient of new and existing buildings" --> "the improvement of the energy-efficiency of new and existing buildings" or "improving the energy-efficiency of new and existing buildings"
- "Designing and building more efficient buildings become necessary to have a better knowledge of the relationship between design and performance and between the design objectives and the actual performance of the building." -- this sentence does not make sense. Do you mean "In order to [be able to] design and build more [energy-]efficient buildings, it is necessary to have a better knowledge of the relationship between design and performance and between the design objectives and the actual performance of the building."
- "can be found in Madrazo [1]" --> "can be found in [1]"
Section 2
- "simulations results" --> "simulation results"
- "Besides, the ICAEN owns more than 1800 energy certifications, 202 have been included in the dataset because of its simulation details." --> "The ICAEN owns more than 1800 energy certifications, of which 202 have been included in the dataset because of their simulation details"
- "It was thought to use GeoLinked dataset (.es), in the first place" --> "We initially considered using the GeoLinkedData.es dataset" (also change "GeoLinked dataset" to "GeoLinkedData.es dataset" later in the text, e.g. in section 3.2)
- "which stores the populated places of the Spanish territory including geographical data for each record such as population, areas, elevation, or Universal Transverse Mercator (UTM) coordinate." --> "which stores geographical data on the populated places of the Spanish territory including their population, area, elevation and geometry (specified in Universal Transverse Mercator (UTM) coordinates)."
Section 3
- "is provided by Nemirovskij [4]" --> "is provided in [4]"
- "including the links to external datasets. Data transformation" --> "Data transformation" should probably be a (2nd level?) heading
- "through an ETL (Extract, Transform and Load), a process which" --> "through an ETL (Extract, Transform and Load) process, which"
- "Paradox is an obsolete database" -- what do you mean by "obsolete"?
- "In addition, the data extracted from Paradox files have been aggregated from hourly to monthly values since its usage is foreseen in a kind analysis which does not require low level of data aggregation." --> what does "its usage" refer to (the Paradox database)? What do you mean by "a kind analysis"? What do you mean by "low level of data aggregation" (highly disaggregated / highly detailed)?
Section 4
- "to contribute with the improvement of the buildings' energy efficiency" --> "to contribute to the improvement of the buildings' energy efficiency"
- "users inform about the" / "users tell about" --> "users specify the"
- "It can be explored" --> "The results can be explored"
Submission in response to http://www.semantic-web-journal.net/blog/semantic-web-journal-call-2nd-s...