Semantic Prediction Assistant Approach applied to Energy Efficiency in Tertiary Buildings

Tracking #: 1609-2821

Iker Esnaola-Gonzalez
Jesús Bermúdez
Izaskun Fernandez
Aitor Arnaiz

Responsible editor: 
Guest Editors ST Built Environment 2017

Submission type: 
Full Paper
Fulfilling occupants' comfort whilst reducing energy consumption is still an unsolved problem in most of tertiary buildings. However, the expansion of the Internet of Things (IoT) and Knowledge Discovery in Databases (KDD) techniques lead to research this matter. In this paper the EEPSA (Energy Efficiency Prediction Semantic Assistant) process is presented, which takes leverage of the Semantic Web Technologies (SWT) to enhance the KDD process for achieving energy efficiency in tertiary buildings while maintaining comfort levels. This process guides the data analyst through the different KDD phases in a semi-automatic manner and supports prescriptive HVAC system activation strategies. That is, temperature of a space is predicted simulating the activation of HVAC systems at different time and intensities, so that the facility manager can choose the strategy that best fits both the user's comfort needs and energy efficiency. The proposed solution is abstract enough to reuse it in similar use-cases of the same domain and it has been proved that improves the accuracy of predictions.

Minor Revision

Solicited Reviews:
Review #1
By Hehua Zhang submitted on 17/May/2017
Minor Revision
Review Comment:

Energy efficiency prediction of buildings is an important research topic. Data mining techniques are widely researched and applied to fulfill this goal. However, it is not easy to understand and select the required variables appropriately. Domain-specific knowledge is often necessary to obtain ideal prediction results.

This paper presents a semantic prediction assistant approach that makes good use of a domain ontology, "EEPSA", to support variable selection and decision making in the KDD process for energy efficiency prediction. The idea of constructing the "EEPSA" ontology and applying it to improve prediction accuracy for this specific problem is novel and interesting. The experiments compare the performance of the method with and without EEPSA. The results show that, with the prior knowledge encoded in EEPSA, the method achieves better accuracy. In addition, the manuscript is well organized and well written.

However, since the EEPSA ontology plays the predominant role in the method, the authors should explain more about it. It is not clear how EEPSA was designed or how its usability was evaluated. It is also unclear how the correctness and completeness of EEPSA can be ensured or measured. Furthermore, the SPARQL rules are written manually, which makes the assistant hard to use for non-experts. Although the method is described as semi-automatic, I think its usability should be considered. I suggest providing a higher-level description of the rules and generating the SPARQL queries automatically.
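The suggestion above, a higher-level rule description from which the SPARQL is generated, could look roughly like the sketch below. It is only an illustration under stated assumptions, not the authors' tooling: the `RangeRule` structure, the `http://example.org/eepsa#` namespace, the `eepsa:TemperatureOutlier` and `eepsa:IndoorTemperature` names, and the thresholds are all hypothetical. Only the SOSA terms (`sosa:Observation`, `sosa:observedProperty`, `sosa:hasSimpleResult`) are taken from the W3C vocabulary.

```python
from dataclasses import dataclass

# Placeholder namespace: the real EEPSA IRI is not stated in this review.
PREFIXES = """PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX eepsa: <http://example.org/eepsa#>
"""


@dataclass(frozen=True)
class RangeRule:
    """High-level rule: a reading outside [low, high] is an outlier."""
    outlier_class: str      # hypothetical class, in prefixed form
    observed_property: str  # hypothetical property, in prefixed form
    low: float
    high: float


def to_sparql(rule: RangeRule) -> str:
    """Render the high-level rule as a SPARQL CONSTRUCT query string."""
    return (
        PREFIXES
        + f"CONSTRUCT {{ ?obs a {rule.outlier_class} . }}\n"
        + "WHERE {\n"
        + "  ?obs a sosa:Observation ;\n"
        + f"       sosa:observedProperty {rule.observed_property} ;\n"
        + "       sosa:hasSimpleResult ?value .\n"
        + f"  FILTER (?value < {rule.low} || ?value > {rule.high})\n"
        + "}\n"
    )


# Hypothetical rule: indoor temperatures outside -10..45 degrees are outliers.
rule = RangeRule("eepsa:TemperatureOutlier", "eepsa:IndoorTemperature", -10.0, 45.0)
query = to_sparql(rule)
print(query)
```

With such a layer, a non-expert would only fill in the four fields of a rule, and the query text would never need to be edited by hand.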

Review #2
By Pieter Pauwels submitted on 12/Jun/2017
Minor Revision
Review Comment:

This article focuses on improving both energy efficiency and user comfort in tertiary buildings. It aims to do this with a semantic prediction assistant, which combines the latest and greatest of a number of domains, including semantic web technologies (SWT), data mining, Internet of Things (IoT), Building Information Modelling (BIM), Knowledge Discovery in Databases (KDD). The approach is clearly documented, and tested in one building in Spain.

Overall impression:
(1) originality
In general, this is an innovative paper, as it deploys a whole array of technologies in its use case, while staying true to the nature of these technologies, AND to the use case. Real life tests are also made, making the approach also very tangible and realistic. It is clear from the text that the real-life test helped to improve the proposed approach.

(2) significance of the results
The resulting Semantic Prediction Assistant approach can likely be used quite widely in the facility management sector. Furthermore, several areas are mentioned in the paper where the approach can be further improved, including specific directions and initial proposals, thus leaving significant room for further research.

(3) quality of writing
The paper is well structured in general. There are some repetitive paragraphs / sentences (last paragraphs of section 3.2, second half of section 4), so I encourage the authors to make the paper more concise and stronger in these parts. There are a number of language 'errors', see bottom.

Detailed remarks:

1. Introduction
2. Related work[NO -S]
- This overview of related work is great and much appreciated!
- About ifcOWL (which is very well described, thank you), you indicate: "it is of secondary importance that an instance RDF file can be modelled from scratch using the ifcOWL ontology and an ontology editor." This is correct. Later on in the paper (it is probably too complex for that, like IFC in general), this is used as an argument to not use ifcOWL, and instead use classes and properties in a new EEPSA ontology. Note, however, that it IS possible to create RDF graphs from scratch using ifcOWL, especially if it is only building-storeys-spaces-elements, and less geometry and properties. So, your argument does not hold, in my opinion. In your case, it is probably entirely feasible to use (a part of) ifcOWL.
- It might be worthwhile to indicate that IFC is used in the construction industry and, as a result, is not that "space-oriented". It rather focuses on building objects (walls, doors, floors, ...), and their relations and geometries. For that reason alone, it is usually not the most comfortable ontology to start from.
- You refer to simplifications and extensions to ifcOWL, mainly pointing towards ifcWOD. It is correct that such simplifications (less extensions: IFC is quite extensive already) are being looked for. ifcWOD is such an example, but the ifcWOD ontology does not exist as far as I know (as you indicate). An alternative approach worthwhile mentioning is the simpleBIM approach documented in Also in this case, no ontology exists, unfortunately. An ontology IS available for the REALLY simple Building Topology Ontology (BOT -, which is probably quite well usable in your use case. The aim for BOT is to enable use cases such as yours, while keeping a link to full ifcOWL representations of buildings (properties and geometry). It consists only of 5 or so classes, and some basic properties that allow modelling the core building topology (building-storeys-spaces-elements). As such, it probably answers part of your needs.

3. EEPSA in KDD Support
- The transition from Section 2, ending with all the ontologies, into Section 3, is not obvious. Try to include some form of textual transition. Furthermore, none of the ontologies mentioned in Section 2 is really repeated after Section 2. Instead, all data is "linked to the EEPSA ontology" (second paragraph of Section 2). This comes as a surprise, as this EEPSA ontology has not been introduced by then. Some kind of impression is given of the EEPSA ontology in Section 3.1 (as a combination of a number of other ontologies), but also that is not so explicitly done. In my opinion, there is room to improve the paper by tackling all this in the beginning of Section 3.
- I am not that sure of the term "linking phase". Would it not be better to call this "semantic annotation phase"? Or "structuring data phase"? It is not really about linking, it is about making data available in RDF graphs following selected ontologies (many ontologies besides EEPSA). Right?
- Thanks for including links to ontology sources in footnotes. Much appreciated.
- Thanks for listing all SPARQL queries in the appendix. Much appreciated. I think you miss a namespace and prefix for "bif:" in Listing 3.
- 3.1: The DogOnt was not used because it aims at residential buildings, whereas you aim at tertiary buildings. This is very odd. Can you indicate what is so special about residential buildings that is not available in tertiary buildings (and the reverse) for your particular use case? How does this affect the EEPSA ontology, for example?
- 3.1, third paragraph: When mentioning "it has been decided to offer a simpler way to model a space", you probably arrived right into the domain of the BOT ontology (see above). This BOT ontology does not list huge numbers of building element types (in fact, it lists none), but it can easily be extended in this direction. In my opinion, this is worth considering. This approach will likely also form a way into ifcOWL that is better than "to translate all the building modelling concepts defined in the EEPSA ontology into the ifcOWL representation". BOT is intended to link quite directly into ifcOWL, as well as SAREF and other ontologies.
- 3.3: This is definitely not my main area of expertise, but I am somewhat confused why "pre-processing" is mainly about the topic of "outlier detection". Furthermore, as you indicate, outlier detection has never been really looked into by semantic web researchers. It is also not really explained how outlier detection is implemented here. If you ask me, a large part of this section deals with classifying outliers, rather than with finding outliers. The only thing I see is "The class eepsa:Outlier is defined as a subclass of ssn:Observation in the EEPSA ontology in order to represent observations that do not conform to the expected behaviour". Then what? There is no actual outlier detection involved here. Can you give an example (perhaps you can point it out in Listing 2?)? As a result, this section is more about classification of data using SPARQL queries, not really outlier detection? Why make it more complex than that?
- 3.3: For the classification of data, you consider SPARQL CONSTRUCT queries. Did you consider rule languages (SWRL) to encode the clear IF-THEN rules? That would seem to be the more obvious choice?
- 3.3: This section finally ends with a statement about 'data quality'. How is that related to the rule-checking and outlier detection that dominates this entire section?
- 3.4: "For those cases: an alternative is offered in the form of rules." => Can you please provide an example? Has this been tested in the evaluation use-case?
- 3.6: This interpretation step is very important. This is indicated as a potential area of further research, in which the EEPSA ontology can prove to be an invaluable resource. I somewhat doubt this. This interpretation step will likely need domain knowledge the most, which resides in the end user's mind, not in the ontology that has already been used. Well, it is not really in scope anyway, so ignore this comment as you wish.

4. EEPSA on the loop
- This section explains the entire EEPSA process again (good!). One step is only briefly explained, namely the 'preprocessing' stage. This is again the outlier detection step. The explanation of this step in Section 3.3 was already quite confusing (see above). This one paragraph in Section 4 ("The next step is ...") only adds to the confusion. Can you please expand on this step and clarify how it happens and what it is useful for? An example might help. You indicate that "results obtained after applying these rules are in section 5.3". Why not just place them here?
- Towards the end of page 10, you indicate that the data collected in the data selection phase has "low quality". The reason for that is only really mentioned in section 5.3 (namely the anomalous detection of the temperature sensor). Please already mention this in Section 4, so that it is clear from the start.
- Towards the end of the section, you indicate that a 6 month time span is chosen. This looks entirely random, mainly because the actual use case and experimental setup have not been introduced yet in their entirety. This only happens in Section 5.1. It seems to me that Sections 4 and 5 flow too much into each other. They both have aspects of the actual use case evaluation in Eibar; they both make claims/evaluations about the EEPSA process (cfr. comment above); they both explain systems and components used; the end of section 4 is somewhat repetitive. I recommend restructuring these two sections, separating the EEPSA on the loop process; the use case experimental setup; the evaluation; and the discussion of results. Restructuring will hopefully make the entire section more concise, direct, and stronger. The experimental setup will likely have to come entirely in the beginning of Section 4.

5. Evaluation and results discussion
- At the outset of 5.3, you indicate that there is [little] room for improvement in the experiment, as the baseline is already quite good. Would it not make sense to perform the experiment for another building instead / as well? How is the test on the BEC going? How is the baseline over there? Any preliminary results available that can show the effect of your proposal?

6. Conclusions
- "First of all, data is linked to the EEPSA ontology" -> As indicated above, I think it would be more worthwhile to phrase this as "data structuring" - "semantic enrichment", rather than "linking". Furthermore, I suggest to indicate the external ontologies that you use as well, instead of only using the EEPSA ontology.
- "the proposed process is reusable in similar use cases of the same domain due to its hgih abstraction level." => You don't have any proof to support this claim, as you only document one test in Eibar. If this is so, would this approach also be usable for non-tertiary buildings?
- "it should be translated into the ifcOWL ontology" => A translation will be difficult. I would rather suggest linking directly into an ifcOWL dataset. I would go through an in-between ontology, such as the BOT ontology (see above). Other than that: this is indeed a very good suggestion for future work!
- "[The] EEPSA ontology should be completed with more detailed information to reflect the effect of materials or building envelope sealing" => I agree, but rather than extending EEPSA, it will likely be a better approach to tap into IFC data, which has a lot of such detail available, both in terms of property sets (PSETs) and of geometry. This complicates matters quickly and considerably, which is why you probably want to keep it out of the core EEPSA (and why we keep it out of the core BOT ontology).
- Out of curiosity, as this is actually outside the paper scope: the end of the conclusion calls for an intuitive GUI. This will likely require a 2D drafting or 3D modelling package, or perhaps a BIM tool? Is this the intention? Or would you opt for an online web-tool with simplified 2D modelling GUI?

Textual remarks:

The word 'besides' is very often used, typically in the beginning of the sentence. This sounds very non-academic. Please use another word for each of these. "Furthermore" is often the better alternative.

1. Introduction
- "it is necessary a system" -> "a system is needed"
- "during the operation of [a] building"
- "leading [to] the extraction of useful knowledge"
- "can be summarized as [] follows"
- "a projection of the data to a form [with which] data mining algorithms can work"
- "and models derived, [in support of decision making processes]"
- "Figure1" -> "Figure 1"
- "[The] EEPSA process"
- consider copyright for Figure 1.

2. Related work[NO -S]
- "deals with the representation of [] functional and "
- "all these knowledge" -> "all this knowledge"

3. EEPSA in KDD Support
- 3.1: "data-analysts" -> "data analysts"
- 3.2: "subset[s] of variables"
- 3.2: "which [] additional knowledge [] can be extracted from the data []."
- 3.2: "in most cases, [] domain-specific knowledge is needed"
- 3.2: "and it is inferred that, apart from its indoor factors (such as temperature and humidity) might have its indoor temperature affected by the received solar radiation, as well as the elevation and direction of the sun" => something went wrong here, please reformulate.
- 3.2: "type of space is [] analysed"
- 3.2: "That is why[,] although a subset of variables is suggested, the data analyst is the one who must decide [the final data selection]."
- 3.2: "space-type" -> "space type"
- "Besides"...
- 3.2: "[The] EEPSA process uses OWL inferences"
- 3.2: "SPARQL queries [are] also provided"
- Why is there a section 3.3.1 if there is no section 3.3.2? Please remove this unnecessary section header.
- 3.3: "a very researched topic" -> "an often studied topic"
- 3.3: "which are [] essential component[s]"
- 3.3: "in which [it] takes place"
- 3.3: "Rules [have a corresponding] expression in natural language"
- 3.3: "In this stage, [the] EEPSA process"
- 3.4: "know which [alternative] data sources [are available for] that variable"
- 3.4: "[The] EEPSA process"
- 3.4: "[In addition], [a] data analyst [can also] use SPARQL rules to infer"
- 3.5: "with views" -> "in his aim"?

4. EEPSA on the loop
- "from now on referred [to] as"
- "and [in which] over 200 people work"
- "in [the] Tibucon project"
- "([] when [the] workday starts)"
- "to meet [the] facility manager's requirements"
- "to the EEPSA ontology, [the] Ontop tool has been used"
- "can be queried with [the] SPARQL language while staying [available] as relational DB"
- "is inferred from [the] EEPSA ontology"
- "During [the] preprocessing phase"
- "[This] information is also collected"
- "which weather station [he] chooses"
- "that affect[s] the open space"
- "so [the] first step is to check"
- "Figure[ ]2"
- "time[ ]span"
- "the RapidMiner Studio 7.1 version ha[s] been used"
- "[T]he Series extension [has also been used] in order to work with time series"

5. Evaluation and results discussion
- 5.1: "the Open Space[] comes from three Tibucon"
- 5.1: "the use of [the] EEPSA process"
- 5.1: "This baseline's result[]" or "[These] baseline's results"
- 5.2: "[A] baseline experiment"
- 5.2: "window-size" -> "window sizes"
- 5.2: "[the] available data pool became larger"
- 5.2: "and their window[ ]sizes [were] fine tuned"
- 5.2: "[The] most accurate model was built"
- Figure 2 should be inserted as a Table.
- 5.3: "the MAE [by] 0.16°C"
- 5.3: "Data selection suggested incorporating [variables] to the predictive model which a data analyst [not expert] in the domain may overlook"
- 5.3: "device located outdoors[] had 4,209 [anomalous] temperature observations[]."
- 5.3: "in [a] more adequate place"
- 5.3: "with the application of [the] EEPSA process"
- 5.3: "obtained and incorporated [into] the predictive model"
- 5.3: "[The] baseline model's MAE lowered"
- 5.3: "from 05:00 a.m. on[ward], even without activating [the] HVAC system"
- 5.3: "simulation is incorrect and[,] consequently[,] [the] facility manager's HVAC activation strategy [should not rely] on it."

6. Conclusions
- "[The] data analyst is guided"
- "in [the] form of new attributes"
- "should [] be translated"
- "[The] EEPSA ontology should be completed"
- "necessary knowledge regarding missing values imputation methods" -> please reformulate. Imputation methods??
- "[The] interpretation phase has a big potential"

Review #3
By Maxime Lefrançois submitted on 21/Jun/2017
Minor Revision
Review Comment:

This paper describes how Knowledge Discovery in Databases (KDD) can be augmented with Semantic Web Technologies (SWT). It focuses more precisely on augmenting the classical KDD phases (Data Selection, Preprocessing, Transformation, Mining, Interpretation/Evaluation) by associating data with semantics (ontology concepts and relations). This is done during an initial (or transversal) additional "Data Linking" phase during which (1) relational data is lifted to RDF conformant with relevant ontologies, and (2) new knowledge is inferred and materialized using OWL inference and SPARQL Update rules. The general approach is instantiated in the EEPSA (Energy Efficiency Prediction Semantic Assistant) process to enhance energy efficiency in tertiary buildings using environmental factor data measured by IoT devices and meteorological data available in external datasets. EEPSA is evaluated in a real-world setting involving data measured on one floor of the IK4-TEKNIKER building, with the goal of better predicting the temperature curves 24h in advance to better control the HVAC system. The core enablers of EEPSA are:
1. the EEPSA ontology, which imports relevant ontologies and augments them with new class and property declarations and axioms specifically tailored to describe the building, the IoT setting, and the environmental factor data, and to infer new knowledge;
2. a semantic representation of the building and the IoT setting defined with the EEPSA knowledge model, and RDB-to-RDF mappings that enable lifting sensor data to RDF;
3. SPARQL-UPDATE inference rules that allow generating new knowledge, including:
a. outlier detection during the KDD preprocessing phase, and
b. new variables computation during the KDD transformation phase.

The following aspects of the paper are strong points and would be reasons to recommend it for publication:

A1: The approach ingeniously combines OWL and rule-based inference with RDB-to-RDF techniques to enhance the results of data mining in the Knowledge Discovery in Databases approach. It reports on substantial work that has been carried out, including (1) Knowledge Engineering work to design an ontology, semantic rules, and RDF lifting mappings, and (2) deployment and evaluation on a real building with data acquisition during 6 months.

A2: The related work is comprehensive and well structured, with 68 citations for a 15-page paper, ranging from related work on KDD for energy efficiency in buildings, to Semantic Web Technologies for KDD, and also a review and description of existing ontologies in the field.

On the other hand, the paper suffers from two major problems that make me believe more work should be done to improve it before resubmission:

R1: The English and the writing should be drastically improved. Several sentences or paragraphs are written in approximate English, are too long or grammatically incorrect, or have missing words or punctuation. In some places, the structure could be improved to limit the number of repetitions and clarify the contributions. This makes the paper very hard to read in its current state. See the bottom of this review for some parts of the paper where I think the English could be improved. A native English speaker should be asked to check the paper. Section 4 is too long and repeats parts of section 3 and section 5. It should be restructured.

R2: Some of the important choices that have been made are not clearly stated: it is written in some places that: "we conducted a study and concluded that we would do this, or use that...". Such choices should be explained scientifically: the studies should be explained and/or scientific publications about them should be cited. The role of the Data Analyst and the way the Data Analyst interacts with the system are not clear. It is mentioned in some places in the paper that some important aspects of the solution are left for future work, but these are not summarized in the conclusion.

R3: The evaluation is globally weak, and it's hard for me to assess how an improvement of the root mean squared error in the temperature prediction of 0.16 °C would substantially enhance the energy efficiency in the building. To make the efficiency of the approach convincing, these figures should be translated into figures about energy savings and comfort preservation.
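The textual remarks of Review #2 quote the paper as reporting the MAE (mean absolute error), whereas the comment above speaks of a root mean squared error. The two metrics are not interchangeable, so it matters which one the 0.16 °C refers to. A minimal sketch with made-up temperature values (not data from the paper) shows how far they can diverge when a single prediction is badly off:

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: penalises large errors more heavily."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Illustrative values only: three perfect predictions and one 4-degree miss.
actual = [20.0, 21.0, 22.0, 23.0]
predicted = [20.0, 21.0, 22.0, 27.0]

print(mae(actual, predicted))   # 1.0
print(rmse(actual, predicted))  # 2.0
```

Neither metric translates directly into energy savings or comfort preservation, which is the point of the comment above.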

R4: In the current state of the paper, it is not possible to reproduce the results in another setting and I am left to trust what the authors state. In fact: (1) the online EEPSA ontology does not correspond to the description given in the paper, and there is room for improvement in its quality and its publication (see below); (2) the rules that are used should be made available online for evaluation; (3) the structure of the acquired data should be described, and some sample of the data should be made available (potentially anonymized); (4) the full KDD process and the algorithms that have been used in the different phases should be better described and made available; (5) the way the Data Analyst interacts with the system is not sufficiently well described.

English mistakes and some more precise comments leading to R1-R4 comments:

- indoor comfort is a must and ... -> a must have / important?
- it is necessary a system which .. -> + to have?
- a five steps process leading the extraction.. -> + to?

- takes leverage ... -> leverages?
- same paragraph -> unclear punctuation (missing commas?)
- in a real-world use-case -> on?
- Works -> Work (several places in the paper)
- BMS have generally failed -> generally fail
- Future information -> forecasted ?
- effects of humidity and solar radiation have resulted to be less significant -> rephrase, correct.

- But it has been proved that --> who? where?
- So far, it has been proved that --> who? where?

- LOD: expand acronym (first instance)
- Fig. 1 should include the additional Linking Phase

- What are the criteria to select the most relevant ontologies?
- FIEMSER datamodel is not referenced
- SAREF is not used in EEPSA? why?

p5. paragraph 2.3
- the ontology covers weather forecasting data over a time range that it is suitable to use within a smart home -> rephrase

- This previous phase -> this preliminary phase?
- ssn:Sensor, ssn:Observation do not exist in the new SSN namespace. Could be replaced by sosa:Sensor, sosa:Observation
- san:Actuator, san:Actuation could be replaced by sosa:Actuator, sosa:Actuation
- time:hasDateTime does not exist

- first paragraph of 3.2 is too long, should be rewritten and simplified
- existing works -> existing work (piece of work, uncountable)
- EEPSA ontology also describe different ... -> rewrite sentence with singular for EEPSA ontology
- has assigned -> is assigned (several places)
- how is the task of knowing which variables might be relevant for the target space semi-automatized?
- that is why -> this is why

- to the context in which takes place. -> a word is missing
- That is why each outlier class has assigned a -> check.
- provides the data analyst a set of -> with a set of

- why can Linking be transversal? what does it mean? does EEPSA do linking in a preliminary phase or in parallel to the phases? why?

- Variables [] in which actuators have an effect -> on which
- underslying semantics are not correctly -> is not
- in future stage of the research (several places) -> future work -> mention in the conclusion
- sometimes we have data analyst, sometimes Data Analyst, be consistent.
- the scenario in which -> on which

- acts as an office and over 200 people -> missing "where" ?
- it is time for --> rephrase (2 places)
- Other factors rather than -> other factors than
- Once the Data Analyst has decided which weather station chooses to -> he chooses to

- it is concluded that not all of these are being observed --> so what is the impact of this, the conclusion?
- what is the interface the Data Analyst uses?
- due to its low quality -> why? how? what impact?

- last paragraph of section 4 -> what study has been conducted? what are the raw results?
- footnote 13 -> what study has been conducted? what are the raw results?
- Collected data ranged --> rephrase this sentence.
- It was also proved that best results are -> what study has been conducted? what are the raw results?
- For the EEPSA-enabled model, first ... --> this is redundant with section 4!
- Evaluation seems weak

- last sentence of section 5.3: please qualify/quantify how much more accurate the simulation of different scenarios was. Be precise.
-> takes leverage -> leverages

- should to be translated -> check.

- reference 18 -> check conference name.
- reference 41 -> conference, editor, book, pages?

About the EEPSA ontology and its description in the paper:
- terms and axioms in EEPSA ontology are interesting, should be better explained in the paper, and/or some of them should be written directly in the paper for illustration (ex. eepsa:NaturallyEnlightenedSpace)
- the quality of the ontology should be enhanced (ontology and term metadata, axioms including domains and ranges when obvious from the name of the concept); datatype properties whose range is boolean could be replaced by classes,
- the import of the ObjectWithState ontology and the cpannotationschema ontology is not documented
- both the ontology and its description must be made clearer with respect to the Semantic Sensor Network ontology: the ontology inconsistently uses the old SSN ontology ( and the new SOSA/SSN ontology ( To be fair, SOSA/SSN has only been frozen in early June 2016, and there have probably been substantial changes to the new version during the writing of the paper: several terms now have the SOSA prefix, Actuations and Actuators are now defined within SOSA/SSN (so the SAN ontology need not be used anymore), ssn:Device and ssn:SensingDevice have been deprecated in favour of just sosa:Sensor, sosa:Actuator, sosa:Sampler, and ssn:System. The online ontology should:
- either import the old SSN (that may shortly itself import the new SOSA/SSN so that old SSN data will become valid new SSN data),
- or be updated to the new version following the change log in SOSA/SSN spec (in which case the paragraph in the related work should be updated and a reference to the new specification should be added)

About the rules in Appendix A:
- eepsa:openSpace is not defined in the ontology online, neither is eepsa:city, eepsa:hasDeployed, etc. This makes me believe the rules were invented during the last phase of the writing, and are not really used in the EEPSA process.
- (ssn:observes) does not exist. Use (sosa:observes). Same for ssn:observedBy -> sosa:isObservedBy, ssn:SensingDevice -> sosa:Sensor, eepsa:obsDate -> sosa:resultTime or sosa:phenomenonTime, etc.
- for better readability, please use a prefix to shorten the URIs of xsd datatypes!
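The term replacements and the prefix shortening requested in this review can be applied mechanically to the rule text. The sketch below is a minimal illustration, assuming the rules are available as plain strings; the function names and the sample query fragment are hypothetical, and the mapping contains only the replacements named in this review. `eepsa:obsDate` is left out on purpose, because the review offers two candidate targets (`sosa:resultTime` or `sosa:phenomenonTime`) and the choice needs a human decision.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Replacements taken from the comments in this review.
TERM_MAP = {
    "ssn:Sensor": "sosa:Sensor",
    "ssn:SensingDevice": "sosa:Sensor",
    "ssn:Observation": "sosa:Observation",
    "ssn:observes": "sosa:observes",
    "ssn:observedBy": "sosa:isObservedBy",
    "san:Actuator": "sosa:Actuator",
    "san:Actuation": "sosa:Actuation",
}


def migrate_terms(query: str) -> str:
    """Swap old SSN/SAN prefixed names for their SOSA counterparts."""
    return re.sub(
        r"\b(?:ssn|san):\w+",
        lambda m: TERM_MAP.get(m.group(0), m.group(0)),  # unknown terms stay as-is
        query,
    )


def shorten_xsd(query: str) -> str:
    """Rewrite full XSD datatype IRIs as xsd: prefixed names."""
    return re.sub(r"<" + re.escape(XSD) + r"(\w+)>", r"xsd:\1", query)


# Hypothetical fragment in the style criticised above.
old = (
    '?obs ssn:observedBy ?dev . ?dev a ssn:SensingDevice . '
    'FILTER (?t > "2017-01-01T00:00:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)'
)
new = shorten_xsd(migrate_terms(old))
print(new)
```

The rewritten rules would still need matching `PREFIX sosa:` and `PREFIX xsd:` declarations at the top of each query.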