Event-Based Clustering with Energy Data
KIERAN GREER and YAXIN BI,
School of Computing,
Faculty of Computing, Engineering and the Built Environment,
Ulster University, UK.
Abstract: This paper describes a stochastic clustering architecture that can be used for making predictions over energy data. The machine learning approach uses some new algorithms, called the hyper-grid and the frequency grid. The design is a set of discrete, localised optimisations based on similarity, followed by a global aggregating layer. The global layer is entropy-based, allowing for a comparison with recent distributed random neural network designs, for example. The topic relates to the IDEAS Smart Home Energy Project, where a client-side Artificial Intelligence component can predict energy consumption for appliances. The proposed data model is essentially a look-up table of the key energy bands that each appliance would use. Each band represents a level of consumption by the appliance, or the amount used in a time unit, and the table can replace more complicated methods, usually constructed from probability theory, for example. Results show that the table can accurately disaggregate a single source to a set of appliances, because each appliance has quite a unique energy footprint. As part of predicting energy consumption, the model could possibly reduce costs by 50%, and by more than that if proposed appliance schedules are also included. A second case study considers wind power patterns, where the grid optimises over the dataset columns in a self-similar way to the rows, allowing for some level of feature analysis.
Key-Words: stochastic clustering, energy prediction, disaggregation.
Received: July 27, 2021. Revised: April 20, 2022. Accepted: May 16, 2022. Published: June 1, 2022.
1 Introduction
This paper describes some methods for
clustering and making predictions over energy data,
using a stochastic Hyper-Grid [10][21]. The topic
relates to the IDEAS Smart Home Energy Project
[12], where a client-side Artificial Intelligence
component can predict energy consumption for
appliances. The proposed data model is essentially
a look-up table of the key energy bands that each
appliance would use. Each band represents a level
of consumption by the appliance in a time unit.
This table can replace disaggregation from more
complicated methods like probability theory, for
example. Disaggregation is the process of
estimating how much energy each appliance would
use from a single input amount. It is helpful to be able to estimate this, because it can make energy predictions more accurate and it can also help to improve user behaviour around energy consumption. The energy consumption bands can be learned by an AI model that can then make future predictions about similar appliance usage. Because the energy footprint for an appliance is quite unique, however, the bands generated from it are also quite unique, which may be enough to identify each appliance without relying on more complex methods. The energy provider is tasked with trying to predict how much energy will be required at a particular time. The provider sees this as a single problem over the whole set of input variables. The user side is more discrete, where there can be more than one independent entity, resulting in input that is more event-based. As such, it may be more appropriate to split the client-side model into separate parts, each modelling one of the independent entities. A stochastic method that can produce non-continuous solutions may therefore have some advantages over a functional model. A Hyper-Grid is therefore proposed for clustering the data, as it can accommodate this essential difference: not a single provider, but discrete consumers with unrelated events. Then, to aggregate the clustering results, the Frequency Grid [9][21] can be used. The process is stochastic in nature, randomly clustering only a subset of the input data each time. It is costly to run, but it can be configured with smaller-sized dataset batches, where the results from each batch are accumulated, so it can in theory be used with larger datasets over a longer period of time. The tests firstly consider creating a lookup table of unique energy bands to describe each household appliance. While this should be very quick to use in real time, it is also able to make the energy prediction more accurate. The tests are then extended to consider clustering
feature sets in the row clusters. As such, the Hyper-
Grid can cluster rows first and then features of
those rows, using the same method. Tests on
energy data show that the method can produce
consistent and logical results and has potential for a
wide range of applications.
The rest of the paper is organised as follows: Section 2 gives some related work in the field of energy prediction and describes what other methods are typically used. Section 3 then summarises the new
clustering algorithms. Sections 4 and 5 consider the
two case studies and describe the test results.
Finally, section 6 gives some conclusions on the
work.
2 Related Work
2.1 Stochastic Clustering
Stochastic clustering is not new and in fact it
may be the preferred method for clustering
something like electric vehicle charging [22].
While this paper is interested in a different field of
energy prediction, it has to solve a similar type of
problem. The authors of [22] state that coordinated charging of EVs can bring some benefits by itself, and that implementing this scenario without coordinated charging can impose a huge amount of excess load on the grid. They give a description of
stochastic scheduling, where they state that ‘since
there are a lot of uncertainties in a real-world
situation and specifically in EV energy scheduling
problems, such as EVs driving pattern, diverse
temporal and spatial EV charging pattern and so on,
in the literature in this field, deterministic
scheduling has been implemented much less than
stochastic one. A stochastic model has one or more
stochastic elements. The system having stochastic
element is generally not solved analytically and,
moreover, there are several cases for which it is
difficult to build an intuitive perspective.’
To make the Hyper-Grid practical, it is possible
to use a form of bootstrapping over randomised
datasets. Bootstrapping [8] removes some rows
from the dataset each time and solves the problem
for the remaining rows. It then averages over the
final solutions. The hyper-grid, by its nature, solves over localised parts of the whole problem, and so this is always part of constructing any solution. It would therefore be natural to include a bootstrapping process with the hyper-grid, which would allow it to use subsets of the whole dataset each time. It would also help a lot in practical terms, by reducing the size of the data grid, which is very time-costly to solve. This method is also the one proposed in [4], which suggests protecting linear classifiers from adversarial attack through the use of bagging and random subspaces. Aggregating the
subset results from the hyper-grid is therefore a
very similar idea. The architecture may also have
similarities with neural networks. Deep Learning
neural networks [6], for example, learn discrete
elements before summarising them through a
pooling layer. The stochastic and distributed nature
however may have more in common with random
neural network architectures [20]. Likelihood
estimators have been used to predict energy usage
before. One new version, associated with generative modelling such as Generative Adversarial Networks (GANs) and variational autoencoders (VAEs), is described in [19]. Part of their intuition states:
‘Since likelihood is the product of densities
evaluated at all data examples, the model density at
each data example should be high. Suppose we
don’t observe the model distribution directly, and
instead only observe independent and identically
distributed (i.i.d.) samples drawn from the model.
Because the density at data examples is high, more
samples are expected to lie near data examples than
elsewhere.’
It then defines a series of likelihood estimates
for random variables that contain distances between
x and the nearest sample and uses the minimum
solution as the best one. Another paper [5] uses
regression models to predict the appliance energy
usage in a house. It notes that the larger white
goods appliances (fridge, cooker, clothes washer,
freezer) consume the most energy and appears to
suggest that their best regression model can predict
the energy consumption to 57% accuracy. If that is
the case, then the results of the case study in section
4 are not too bad. Another paper, which uses genetic algorithms to predict energy usage [15], quotes a 29% saving, or 36% during peak time.
2.2 Energy Systems
Predicting household energy use at the level of appliances is quite a popular topic [1]. Modern systems are quite inefficient and so there is potential for a huge amount of saving. Because it is difficult to measure each appliance in real time, and it is also an intrusive and possibly expensive process, the current trend is to disaggregate the total power input to each appliance, to estimate their use from a behaviour model generated by AI. This is typically done by first using raw data to train an AI model of the appliances and then using that to estimate the disaggregated values for each appliance. The training phase would typically log on-off switching events for each appliance in the
house and then possibly train a Hidden Markov Model to recognise the hidden or unobserved
states, which are the individual appliances, from the
observed state, which would be the total power
usage per unit time [2][3][16][18]. After training,
the internal measurements do not need to be made,
but can be estimated by the model. While this
system is shown to work well, it is proposed in this
project that a much more direct approach could be
just as effective. For one thing, the Markov Model
is state-based and time-related, where switching
events are placed in order. The new method to be
proposed is set-based, where it will suggest an
amount of energy that may get used during the day,
but not the exact order in which it gets used. The
set-based approach has been looked at previously, where Gaussian equations are used as part of the model, and the Markov process [3] can be an extension of a Bayesian Network [17], for example.
But these calculations are also expensive and may
also include other probability measures, such as
Expectation-Maximisation or Likelihood. The
method in this project will make use of the idea that
most appliances have quite a unique footprint, in
terms of their unit energy use, and so a large lookup
table may be sufficient to allow disaggregated
appliance usage to be recognised from a single
input power measurement. More recently, deep
learning and its variations have been used. The
survey paper [14] notes that some of these methods
can be computationally expensive.
3 Summary of the Clustering
Algorithms
3.1 Hyper-Grid
The Hyper-Grid [10][21] creates solutions by
matching data rows that are closer together
according to some measurement, where one idea
is to evolve the matched pairs further, as in Genetic
Algorithms [11]. For this project however, it forms
the basis for some clustering phases that result in
energy bands, which can define the appliance usage
over the course of a day. The heuristic works as
follows: It reads a dataset of values, randomises the
row ordering and also keeps a record of the original
ordering for identification purposes. For the
problem of learning the appliance behaviour, the
dataset is a single column of values, representing
the appliance energy consumption every time unit,
but the dataset could have any number of columns.
Randomising the row ordering means that there
cannot be any bias in the ordering during the row
matching process. The heuristic then compares the
data rows and notes which pairs are most similar,
according to some metric, such as the Manhattan
distance. So that the heuristic does not result simply
in hill-climbing, a matching process is preferred to
one that selects the largest scores only. Also, if two
rows are selected, any rows between them are
removed from the solution, which means that it also
discriminates. All of the potential matches are
saved and then sets of row pairs are selected that
would optimise the total score. This optimisation
gives the heuristic some direction and helps to
ensure a better result. The algorithm is illustrated in
Figure 1, where the process would be as follows:
| Row Index | Column 1 | Column 2 |
| 1 | 3 | 5 |
| 2 | 2 | 3 |
| 3 | 4 | 4 |
| 4 | 5 | 3 |
| 5 | 5 | 4 |
| 6 | 6 | 3 |
| 7 | 2 | 4 |

Figure 1. Example matching process by the Hyper-Heuristic.
1) Compare every row with every other row and
save a difference score based on the Euclidean
difference between the cell values.
2) The rows that are closest to each other are:
a) 1 and 7 with difference 4
b) 2 and 7 with difference 3,
c) 3 and 5 with difference 1,
d) 4 and 6 with difference 1.
3) The sum score for rows 1 and 7 is 18 and for
rows 2 and 7, it is 15. But rows 2 and 7 are
more similar and so they are preferred.
a) The rows in-between can be removed, but
if they also have matches, then the matches
can be added first.
4) The similarity scores for rows 3 and 5, and for rows 4 and 6, are the same with a score of 1. The sum score for rows 4 and 6 is larger however, with a value of 27, and so rows 4 and 6 are preferred. Because this cuts across rows 3 and 5, the matching pair of 3 and 5 is not included in the solution and row 5 is removed from the solution.
5) This leads to a final solution with the row pairs
2 and 7, and 4 and 6.
6) These row pairs are then saved to a list that can
store all rows pairs for all of the test runs.
7) The final list of row pairs is then fed through the frequency grid, which clusters them into mini-clusters based on similar frequency counts.
8) The mini-clusters can then determine the
energy bands, for example.
For the appliance problem, and again to prevent hill-climbing from the unique footprint, row matches can be treated as equal if the difference falls inside a particular value band, rather than only the smallest difference being selected. Therefore, if the band similarity value is 2, then matching scores of 0 and 2 would both return a band value of 1. The use of bands or score ranges is interesting and might also work with other algorithms.
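As an illustration, a minimal Java sketch of the matching heuristic and the banded similarity might look as follows. It assumes single-column data, where the Manhattan and Euclidean measures both reduce to an absolute difference; the class and method names are illustrative rather than the project's actual implementation.

```java
import java.util.*;

public class HyperGridSketch {

    // Band index for a difference score: differences up to the band width
    // share band 1, so scores of 0 and 2 with a width of 2 both map to band 1.
    static int band(double diff, double width) {
        return diff == 0 ? 1 : (int) Math.ceil(diff / width);
    }

    // Match rows into pairs: most-similar band first, larger summed values
    // preferred on ties, with free rows between a selected pair removed.
    static List<int[]> matchRows(double[] values, double bandWidth) {
        int n = values.length;
        List<int[]> candidates = new ArrayList<>();
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                candidates.add(new int[]{i, j});
        candidates.sort((a, b) -> {
            int bandA = band(Math.abs(values[a[0]] - values[a[1]]), bandWidth);
            int bandB = band(Math.abs(values[b[0]] - values[b[1]]), bandWidth);
            if (bandA != bandB) return Integer.compare(bandA, bandB);
            // Tie-break: optimise for the larger summed value.
            return Double.compare(values[b[0]] + values[b[1]],
                                  values[a[0]] + values[a[1]]);
        });
        final int FREE = 0, PAIRED = 1, REMOVED = 2;
        int[] state = new int[n];                  // all rows start FREE
        List<int[]> solution = new ArrayList<>();
        for (int[] pair : candidates) {
            if (state[pair[0]] != FREE || state[pair[1]] != FREE) continue;
            solution.add(pair);
            state[pair[0]] = PAIRED;
            state[pair[1]] = PAIRED;
            // Discriminate: free rows between the pair leave the solution,
            // but rows already paired elsewhere are kept.
            for (int k = pair[0] + 1; k < pair[1]; k++)
                if (state[k] == FREE) state[k] = REMOVED;
        }
        return solution;                           // row pairs for this batch
    }
}
```

Row pairs returned from each randomised batch can then be appended to a single list and passed to the frequency grid, as described next.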
Due to combinatorial explosion, a complete
search over a larger dataset is not possible and so
the algorithm has to split a dataset up into smaller-
sized parts and solve the problem on each part
separately. Because the rows are randomised first,
this can still give a reliable result. For example, if
Figure 1 is a subset of the whole dataset and there
are another 7 rows that have been clustered during
a different batch run, the row pairs from the other
subset can be added to the row pairs from Figure 1
and the combined list can be clustered using the
frequency grid.
3.2 Frequency Grid
The Hyper-Grid therefore also requires an
aggregating layer, which can be the Frequency Grid
[9][21], for example. This reads the list of row pairs
and produces sets of count values that represent
which rows are more often paired together. It is
more entropy-based than local counts, where the aggregation from the frequency grid can produce a holistic view of the row pairs and so produce clusters for the whole dataset. The
following description is taken from [9]. Consider a
different set of data, but again for 7 rows, shown in
Figure 2.
|   | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| 1 | x | 4 | 4 | 4 | 2 | 1 | 1 |
| 2 | 4 | x | 4 | 4 | 1 | 0 | 0 |
| 3 | 4 | 4 | x | 4 | 1 | 0 | 0 |
| 4 | 4 | 4 | 4 | x | 1 | 0 | 0 |
| 5 | 2 | 1 | 1 | 1 | x | 3 | 3 |
| 6 | 1 | 0 | 0 | 0 | 3 | x | 3 |
| 7 | 1 | 0 | 0 | 0 | 3 | 3 | x |

Figure 2. Frequency counts would group rows 1-4 and rows 5-7 together.
It is clear from the data that rows 1-4 all
reinforce each other (pattern 1), as do rows 5-7
(pattern 2). With a grid format, the input is
represented by a single pattern group, where a
count is incremented for each row pair occurrence.
The grid format lists each variable, or in this case it
would be a row number, both as a row and a
column. Each time a pattern is presented, the
related cell value for both the row and the column
is incremented by 1. In row 1, for example, the
counts suggest that it should be clustered with rows
2-4, because they have higher counts with row 1.
The same conclusion can be made for rows 5-7. It
is probably not necessary to update a self-reference
in the grid, so the leading diagonal can be empty.
The grid result can then be read using the following
algorithm:
1) Each row displays count values representing a
key variable - the row name, and its relation to
the other variables.
2) All cells relating to variables in the input
pattern are updated each time.
a) Each row that starts with one of the
variables updates the count for every other
variable in the input pattern.
b) Because the variable is repeated in several
cells, this still leads to normal count values
for each cell.
3) To determine the best clusters then:
a) For a key variable (the row key value), scan across and select the other variables with
the largest count values. That variable then
considers those other variables to be part of
its cluster.
b) The other variables however may be more
associated with a different cluster, so their
rows can be checked for consistency. They
should similarly have a largest count value
for the other variables in the cluster. If any
have different (larger or smaller) count
values, then they probably belong to a
different cluster.
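A minimal Java sketch of the frequency grid follows, under the same illustrative naming as before. The fractional threshold used for reading a cluster is an assumed parameter; the description above instead selects the largest count values and then cross-checks each member.

```java
import java.util.*;

public class FrequencyGridSketch {

    final int[][] counts;   // symmetric count grid; leading diagonal unused

    FrequencyGridSketch(int numRows) {
        counts = new int[numRows][numRows];
    }

    // Each presented row pair increments the related cell for both the
    // row view and the column view of the grid.
    void addPair(int a, int b) {
        counts[a][b]++;
        counts[b][a]++;
    }

    // Read the cluster for one key row: scan across and keep the other
    // rows whose counts reach a fraction of the row's largest count.
    Set<Integer> clusterFor(int key, double threshold) {
        int max = 0;
        for (int j = 0; j < counts.length; j++)
            if (j != key) max = Math.max(max, counts[key][j]);
        Set<Integer> cluster = new TreeSet<>();
        cluster.add(key);
        for (int j = 0; j < counts.length; j++)
            if (j != key && counts[key][j] >= threshold * max)
                cluster.add(j);
        return cluster;    // members should be cross-checked, as in step 3b
    }
}
```

With the counts of Figure 2 (0-indexed), clusterFor(0, 0.75) returns rows {0, 1, 2, 3} and clusterFor(4, 0.75) returns rows {4, 5, 6}, matching the two groups described above.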
3.3 Feature Selection
A second use is suggested for the algorithm: a self-similar process of matching over the columns instead of the data rows. This could work
as follows and is the basis for the second case
study: The algorithm is run on the dataset to
produce sets of mini-clusters. These are subsets of
larger clusters that are realised from subsets of the
whole dataset. Then for each mini-cluster, the rows
and columns are transposed and the algorithm is
run again on that subset only, to produce another
set of mini-clusters that would define feature sets
instead. Some of the mini-clusters may share some of the feature sets, in which case they can be combined further, depending on some threshold criterion.
This process therefore also provides a basis for
analysing the column or feature sets in the data,
which is described more in section 5.
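A minimal sketch of the transposition step, assuming the dataset is a rows-by-columns matrix of doubles; the names are illustrative:

```java
import java.util.List;

public class FeatureStepSketch {
    // Take the rows of one mini-cluster, transpose them, and hand the result
    // back to the same matching routine, so that columns are matched as rows.
    static double[][] transposeCluster(double[][] data, List<Integer> clusterRows) {
        int numCols = data[0].length;
        double[][] t = new double[numCols][clusterRows.size()];
        for (int c = 0; c < numCols; c++)
            for (int r = 0; r < clusterRows.size(); r++)
                t[c][r] = data[clusterRows.get(r)][c];
        return t;   // each original column is now a row of the subset
    }
}
```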
4 Case Study 1: Disaggregating
Energy to Appliances
This was the main focus of the research on the IDEAS project [12][13]. It is the well-known problem of trying to predict how much energy is required, by measuring the energy consumption of a set of household appliances and using that as a model to make the prediction. If individual appliances can
be measured, then that can improve the accuracy of
the prediction and also help with the human
economic behaviour. A survey of this topic [1] notes that low-cost energy reductions can be made in the residential and commercial sectors, but that these savings have not been achievable to date. It also notes that billions of dollars are being spent to install smart meters, yet without careful consideration of the human element, the energy-saving and financial benefits of this infrastructure will not reach their full potential. Of particular interest here may be the scheduling of energy use [13], proposed as part of the IDEAS model. Even saving just a few percent of energy consumption is worthwhile.
The single input power source needs to be
disaggregated to each of the appliances, or an
estimate of how much would go to each appliance
needs to be made. This problem therefore requires a
training stage to learn the model and then a testing
stage to match that with the input source. The
proposed algorithm ran a number of bootstrapped
tests on raw data and generated row pairs that
resulted in sets of mini-clusters. As the clustering
involves a similarity measure, a more obvious
approach is to aggregate the raw data into the time
unit, hours for example, and then count the number
of occurrences of each aggregated value. If this is
done however, there is too much variability in the
aggregated values. They do not conform to a set of
values and so it is not possible to produce an
aggregated view. This may be because the
behaviour of any appliance is unpredictable and so
aggregated values will typically be different to each
other. Therefore, some form of clustering is
required to recognise patterns or structure in the
data and the hyper-grid was selected for this
project.
4.1 Second Clustering phase
The first clustering phase therefore selected time
slots during the day that were similar. A second
clustering stage then took the full list of mini time-
clusters and retrieved the energy values for each
row in a cluster and placed those in an energy
cluster for the time cluster. The values in the
energy clusters might then overlap, where this
would result in an energy band with an upper and a
lower limit, although, single values were more
typical. These energy bands or single-values, could
then be used to determine the user behaviour of the
system per time unit. For example, if there were 3
events of band A and 1 event of band B, the energy
supplier could expect band A, 3 times more often.
After band B occurred, it could expect only band A
until it had occurred 3 times, and so on. The bands
are most useful for reducing the complexity of the
system and recognising some inherent structure or
behaviour. If there is a range, then the upper limit would typically have to be accommodated by the system. But it is the fact that there is now some sort of model that can describe the appliance behaviour in a tractable way that makes it useful.
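As a minimal sketch of this second phase, assuming a time cluster is a set of row indexes into the hourly energy values (names are illustrative):

```java
import java.util.Set;

public class EnergyBandSketch {
    // Collect the hourly energy values for one time cluster and record them
    // as a band with a lower and upper limit. Most clusters in the tests
    // collapsed to a single value, so the two limits are usually equal.
    static double[] energyBand(double[] hourlyEnergy, Set<Integer> timeCluster) {
        double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
        for (int row : timeCluster) {
            lo = Math.min(lo, hourlyEnergy[row]);
            hi = Math.max(hi, hourlyEnergy[row]);
        }
        return new double[]{lo, hi};
    }
}
```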
4.2 Third Clustering Phase
A third clustering phase is also likely, not when
training the data model, but when reading it into the
energy network, to be used by the system during
runtime. This would help to reduce the complexity
even further, so that a lookup table can be
generated for an exhaustive combination search.
While the values varied, it could be by only a small fractional amount, and so this clustering created bands from values with the same 'integer' upper and lower-limit parts, resulting in only one energy band for the whole fractional range. See Table 1 in section 4.4.1 for an example.
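A minimal sketch of this merging step, assuming each band is a {lower, upper} pair keyed by the integer part of its lower limit (names are illustrative):

```java
import java.util.*;

public class BandMergeSketch {
    // Merge bands whose limits share the same integer part into one band
    // covering the whole fractional range. With the '1.xxxx' values of
    // Table 1, this would combine them into a single 1.2065 - 1.3989 band.
    static Map<Integer, double[]> mergeByIntegerPart(List<double[]> bands) {
        Map<Integer, double[]> merged = new TreeMap<>();
        for (double[] band : bands) {
            int key = (int) band[0];   // integer part of the lower limit
            merged.merge(key, band.clone(), (a, b) -> new double[]{
                Math.min(a[0], b[0]), Math.max(a[1], b[1])});
        }
        return merged;
    }
}
```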
4.3 Optimisation
The intention is to optimise some process that is
part of an energy system. As described earlier in
this section, the energy bands can be used to predict
how much energy the appliances are likely to use.
The system can therefore plan for the calculated
maximum energy requirement at any one time and
then when energy band events occur, they can be
removed and the prediction can be adjusted to the
remaining set. It is always important that there is
enough energy for any eventuality and so there
always needs to be additional energy in the system
that may not get used. This is one place where
energy savings can be made, if predictions can
minimise this additional amount. Other methods
may typically recognise on-off switching events for
appliances and then generate Markov Models [2][3]
or other probability measures, such as Likelihood evaluations [19], where disaggregation is also helpful. The model would be
trained to recognise on-off switching events for
appliances, which gives a state-based or a
probability model for each appliance. Then from a
single input value, the system would disaggregate
the input power source to each of the appliances, by
learning when the appliances are likely to be on or
off. A Markov Model can be used with time-based
events, or a Likelihood probability estimate can be
used with set-based events, for example. The
method of this paper does not have sequential
events, but is more set-based. It only defines that
this set of events may have occurred in a particular
time period. This is both good and bad, because it
may be possible to produce a simpler alternative to
other probability measures. On the down-side, it
would still benefit from the accuracy of learning
some amount of timing and so the future work for
this section describes how that might be achieved.
4.4 Test Program and Results
An algorithm has been implemented in the Java
programming language and tested on real data [7]
from households in Switzerland. This data has been
used before to generate Markov Models, for
example [2][3]. A number of houses were
monitored, where a smart plug was able to measure
the energy consumption for an appliance. This was
relayed to a central system and logged every
second. Therefore, each appliance was logged
every second for a period of approximately 8
months. For the hyper-grid, a measurement of
every second was too fine-grained and so the data
was converted into aggregated values for each
hour. That is, the average amount of energy used by
the appliance each hour. The data for each
appliance was then clustered using the Hyper and
Frequency Grids, as described earlier in this
section. The final set of energy bands would be
used to define the behaviour of the appliance over
the course of a day. For the 8-month period, there
was no large change in the appliance behaviour
from one month to the next, but this could certainly
be modelled as individual and seasonal sets of
bands.
The energy system was then able to guess how
much energy each house was likely to require
during the day. It would have to provide the upper
limit each time, so that the house does not run out
of energy. This can be achieved by returning the
largest energy band for each appliance in the house
each time. This would be an upper limit for the test
process, where the system would measure the
prediction accuracy by then removing an energy
band instance, at random, for each appliance. The
next prediction would then be calculated on the
remaining energy bands and the accuracy would be
the difference between the estimate and the
randomly selected set of used bands. The test
system therefore did not model exactly when an
appliance was used, but knowing when would have made it more accurate.
4.4.1 Lookup Table
It is proposed that a simple lookup table can be
used to good effect. The energy bands reduce the
combinatorial complexity enough to suggest that
for all appliances in a household, a lookup table of
under 1 million entries could be sufficient. The table needs to map every energy band event for each appliance with every other one, and so it has to provide a combination for every possibility. There
could of course be statistical methods to reduce the
number further, when the entries are very similar,
such as the third clustering phase of section 4.2. For
example, Table 1 is a set of energy bands that were
produced for a fridge in some household over the
course of a day. Each band represents an hour of
energy consumption by the appliance and the
frequency is how many times that occurred during
the day. Most bands contained a single value that
was the result of the first two phases of clustering.
In rarer occasions, for example: 31.0813 - 28.1024,
there was an overlap in the clusters leading to a
value range. When reading these into the model
however, the lower-values bands with the same
integer number can be combined. For example, all
bands starting with the integer value of ‘1’ can be
combined.
| Energy Band       | Freq per Day | Energy Band       | Freq per Day |
| 51.3304 - 51.3304 | 1 | 12.3664 - 12.3664 | 1 |
| 47.3313 - 47.3313 | 1 | 8.2406 - 7.89404  | 2 |
| 41.8416 - 41.8416 | 1 | 6.6794 - 6.6794   | 1 |
| 37.4993 - 37.4993 | 1 | 4.0358 - 4.0358   | 1 |
| 35.5428 - 35.5428 | 1 | 3.8319 - 3.8319   | 1 |
| 31.0813 - 28.1024 | 5 | 3.2404 - 3.2404   | 1 |
| 21.3339 - 21.3339 | 1 | 1.3989 - 1.3989   | 1 |
| 18.3959 - 18.3959 | 1 | 1.2865 - 1.2865   | 1 |
| 16.317 - 16.317   | 1 | 1.2065 - 1.2065   | 1 |
| 14.8694 - 14.8694 | 1 |                   |   |

Table 1. Example of energy bands for an appliance, with power consumed per hour: the set of energy bands that occurred over the course of a single day for a fridge.
Then with each combination of all appliances, a
power input total can be calculated by taking the
upper limit or average of each energy band in the
combination. The lookup table would be produced
for each appliance during a short training phase, when the system is being set up. After that, the
system would read the power consumption at some
time unit and pass that to the table, which would
return the appliance combination that matches
closest to the power value. Because the band values
are quite unique, the combination value can be a
key to a table, where the table value is then the set
of appliances that created it. The selected band
events could then be removed from the day’s set,
allowing the next consumption amount to be more
accurately predicted.
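A minimal Java sketch of the lookup table, assuming one band choice per appliance contributes its upper limit to the combination total, and a sorted map is used for the closest-match read. The structure and names are illustrative, not the project's actual implementation:

```java
import java.util.*;

public class LookupTableSketch {

    // Combination total -> the set of bands (one per appliance) that made it.
    final TreeMap<Double, List<double[]>> table = new TreeMap<>();

    // Enumerate every combination of one band per appliance.
    void build(List<List<double[]>> applianceBands) {
        fill(applianceBands, 0, new ArrayList<>(), 0.0);
    }

    private void fill(List<List<double[]>> bands, int appliance,
                      List<double[]> chosen, double total) {
        if (appliance == bands.size()) {
            table.put(total, new ArrayList<>(chosen));
            return;
        }
        for (double[] band : bands.get(appliance)) {
            chosen.add(band);
            fill(bands, appliance + 1, chosen, total + band[1]); // upper limit
            chosen.remove(chosen.size() - 1);
        }
    }

    // Return the band combination whose total is closest to the input power.
    // Assumes a non-empty table.
    List<double[]> disaggregate(double power) {
        Map.Entry<Double, List<double[]>> lo = table.floorEntry(power);
        Map.Entry<Double, List<double[]>> hi = table.ceilingEntry(power);
        if (lo == null) return hi.getValue();
        if (hi == null) return lo.getValue();
        return (power - lo.getKey() <= hi.getKey() - power)
                ? lo.getValue() : hi.getValue();
    }
}
```

The closest-match read mirrors the table lookup described above; a real system might also bound the acceptable distance between the input power and the matched key.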
This method should work reasonably well
because of the unique energy footprints, but there is
also quite a wide margin of error that would be
acceptable. For example, suppose the system has a set that contains energy bands of '5 units' for both appliances A and B, and the next event reads an energy band of 5 that the system mistakenly attributes to appliance B instead of A. For the following time unit, the system would then expect appliance A to produce a 5-unit band instead of B. But if B produces the 5-unit band in the next time unit instead, this does not in fact harm
operation of the system. The system only needs to
match the energy requirement; it does not need to know exactly which appliance any band came from. The authors recognise, though, that for a
more sophisticated system, individual appliance
usage may need to be monitored and may prove
more problematic.
4.4.2 Test Results
The results show potential for improvement, but
the path to that improvement is outlined in section
4.5. After the energy band clusters were produced,
an energy network and disaggregator were created
from them and a predictor was asked to simulate
the network activity. It would return its maximum
requirement for energy each time and then remove
a random set of energy bands as the actual event.
The maximum requirement could in fact be
summed and the total then passed through the
disaggregator, to return an estimate for it instead.
This produced only a small reduction in the
accuracy overall. For 3 locations of the datasets [7],
the following result of Table 2, was achieved.
Total power consumed was: 55295
Difference between predicted and disaggregated was: 6.0E-12, or 1.0E-14 %
Difference between predicted and used was: 44599, or 80.5 %
Number of predictions less than actual was: 0
Saving for location h1 was 14414, or 54 %, from 24 events.
Saving for location h2 was 13330.5, or 28.5 %, from 24 events.
Saving for location h4 was 2222.5, or 18.5 %, from 24 events.

Table 2. Prediction accuracy for locations House 1, 2 and 4.
The first figures indicate that the prediction is only about 20% accurate with respect to the actual random events, or that there is an 80% power loss when predicting how much energy should be provided. But this is for random selections that have to accommodate the spiking events. Disaggregating the single input source to the appliance lookup table, however, is very accurate, and the error may simply be floating-point rounding in the computation. This gives support to the idea of unique
energy bands for the appliances. Then, the
difference in providing the upper bound on the
energy bands each time and the predicted amount,
over the course of a day, is shown in the second set
of figures. It could be around 50% savings, but
house 4 is less at only 19% savings. A key concern
is to guess when the spiking event might occur.
This is where an analysis that includes time would
be helpful. A calculation using only the upper
bound would have to accommodate this for every
hour, while the prediction can remove it as soon as
it occurs, which on average might be half-way
through the day, for example. But the energy bands
themselves are a unique solution that make the
whole problem very tractable.
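A minimal sketch of this test loop, assuming each appliance's daily band set is a list of band values; the names and the saving measure are illustrative:

```java
import java.util.*;

public class PredictionSketch {
    // Simulate one day: predict the largest remaining band for each
    // appliance (the worst-case cover), consume one random band event per
    // appliance as the actual usage, and accumulate the over-provision.
    static double simulateDay(List<List<Double>> remainingBands, Random rng) {
        double overProvision = 0;
        boolean more = true;
        while (more) {
            more = false;
            double predicted = 0, used = 0;
            for (List<Double> bands : remainingBands) {
                if (bands.isEmpty()) continue;
                more = true;
                predicted += Collections.max(bands);             // upper cover
                used += bands.remove(rng.nextInt(bands.size())); // random event
            }
            overProvision += predicted - used;
        }
        return overProvision;   // energy provided but never consumed
    }
}
```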
4.5 Future Work
Tests suggest that the system can save maybe 50% of energy production compared with providing the maximum amount each time, but that it is still
relatively inaccurate at predicting the random
events. Until the spiking event has occurred, the
system has to accommodate it and so it has to
provide that level of cover, when the appliance is
only using a small amount of energy. This is
therefore where a set-based approach is much less
effective than one with event sequences. The IDEAS system [13] can, however, also provide appliance schedules, entered manually by the user, that describe in general terms when an appliance is likely to be used. These schedules can
therefore describe time-based events that could be
used as part of the prediction. For example, if the
schedule indicates high usage in the morning and
low usage in the afternoon, then this can be added
to the prediction. A second area of interest is the
energy sources and their power input. This is
disaggregated as part of the system operation, but
the power input is also often modelled as part of a
time series. The time series would then produce
timed events that could also be used to adjust the
prediction.
5 Case Study 2: Germany Wind
Power Regions
A second test scenario looked at clustering the
Wind Power datasets from Germany [23]. This data
was selected because it is relatively small in size
and does not require formatting. There are 4
datasets, one for each of the four wind power
companies - 50Hertz, Amprion, TennetTSO and
TransnetBW. Each row in the dataset represents 1
day, over a 12-month period, resulting in 397 rows.
Each column then represents the power generated
each 15 minutes of the day, resulting in 96
columns. The analysis differences were quite small,
but they may represent some kind of pattern. Also,
and importantly, they may help to confirm that the
hyper-grid is able to produce consistent results.
Two sets of tests were run. The first was for 200
iterations over the dataset batch subsets and the
second was for 500 iterations over the same-sized
batch subsets. For example, if the batch size was
100 rows, then there would be 4 batch tests for
every iteration. The first test resulted in mini-
clusters that covered only about 10% of the whole
dataset, while the second test covered about 50% of
the data rows. It might be expected that a smaller
number of rows would select slightly larger values,
because of the optimising feature in the algorithm.
If the data is represented by some distribution, for
example, then the smaller number of rows may be
more from the top of that distribution. Then as
more rows are selected, they will include rows from
lower down the distribution, or with smaller values.
This actually means that overtraining of the hyper-
grid would probably result in the clusters joining
with each other and losing some of the
discrimination effect.
Therefore, some configuration was required, to
select an appropriate size of batch subset, a
similarity percentage for row matches, and so on,
which was a bit of a black art. A best configuration
would also depend on the data itself, as to how
many rows would eventually be included. When
the rows had been clustered, each mini-cluster was
analysed further to match its columns or features,
and in this case, bands were not used, when the
closest match value was exact instead. The features
were not studied in detail, but only the variance
was considered, which is OK, because they
represented time units only. Row clusters that
shared a certain number of similar feature clusters
were then combined again afterwards, during the
analysis stage, described next.
5.1 Dataset Analysis
The test program therefore produced sets of mini-clusters, or row indexes that should be clustered together, and for each mini-cluster it did the same for the columns. What might be interesting is the fact that the column clustering preferred to cluster a column with other columns that were close to it, and with the more extensive test, so did the rows. There was therefore a type of
preferred sequential matching that took place, or
the energy value for one time unit was typically
closer to the immediately preceding or subsequent
time units. To study this further, each mini-cluster
was analysed across the row and column indexes,
to measure the variability size from one index to
the next. For example, if rows or columns 1, 4, 8 and 10 were clustered together, then this would produce a variability of (3 + 4 + 2) / 3 = 3. This variability value was measured for the 4 companies separately and is shown in Table 3.
| Company    | Row Var (10%) | Col Var (10%) | Av Val (10%) | Row Var (50%) | Col Var (50%) | Av Val (50%) |
| 50Hertz    | 19.1 | 5.3 | 0.697 | 3.5 | 4.9      | 0.250 |
| Amprion    | 28.8 | 3.7 | 0.740 | 4.4 | 3.5      | 0.267 |
| TennetTSO  | 24.7 | 8.8 | 0.711 | 3.1 | 4.8      | 0.243 |
| TransnetBW | 37.3 | 4.4 | 0.616 | 4.3 | 4.7 (14) | 0.237 |

Table 3. Variances for rows and features, for the Germany Wind Power stations. The first three value columns are for 10% coverage and the last three for 50% coverage; the bracketed Col Var value for TransnetBW includes the anomalous column described below.
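As a worked check of the variability measure used in Table 3, a minimal sketch (the method name is illustrative):

```java
public class VariabilitySketch {
    // Mean gap between consecutive clustered indexes, e.g. for rows or
    // columns {1, 4, 8, 10} the variability is (3 + 4 + 2) / 3 = 3.
    // Assumes at least two indexes, sorted in ascending order.
    static double variability(int[] sortedIndexes) {
        double totalGap = 0;
        for (int i = 1; i < sortedIndexes.length; i++)
            totalGap += sortedIndexes[i] - sortedIndexes[i - 1];
        return totalGap / (sortedIndexes.length - 1);
    }
}
```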
The results for 50% coverage are with an
adjustment for the TransnetBW dataset. If all
columns were included, then column 2 would be
clustered with columns much further on and this
would lead to an anomaly in the sequence
variability. If that single column index was ignored
in the calculation, then the variances were all much
closer. The differences therefore are quite small,
but it looks like TransnetBW has the most
variability across both rows (seasonal weather) and
columns (daily weather). If the anomaly is included
however, then the difference is clear, and it is still a feature of the TransnetBW dataset that is not in the other ones. So there may be a question about whether mountainous terrain affects the performance there, and about which energy range occurs the most.
The other 3 companies appear to be a lot more
similar. Amprion performs the best and also has the
best day (column) variance values.
6 Conclusions
This paper suggests an architecture that includes
both discrete and centralising elements and
therefore has some similarities with neural network
architectures. A stochastic and discrete layer
clusters randomly selected subspaces of the data
into mini-clusters, not using gradient descent as in
neural networks, but using a Euclidean distance
linear classifier, as in [4] or the random networks
[20]. Then, an aggregating layer combines the
discrete results, rather like a deep learning pooling
layer [6]. The algorithm can be trained to recognise
similarities across data rows, or data columns
(features), in a self-similar way. An optimising
feature means that the algorithm prefers to cluster
similar rows with larger values first. One might
think about an energy surface with peaks and
troughs, for example, but the surface is being
traversed in many different places at the same time.
Overtraining might then be recognised when the
localised peak distributions start to merge with each
other, which can happen when more lower-valued
rows are linked with the higher-valued ones. In this
paper, the architecture is used to cluster and make
predictions over energy data, using a stochastic
Hyper-Grid [10] and a Frequency Grid [9]. The
discrete, localised optimisations in the hyper-grid match dataset rows that are more similar, using a distance measurement, but the method is also able to discriminate and keep only the row sets that will optimise some overall total. This method is also
described in [4], in terms of bagging and random
subspaces, as being a possibility to protect from
adversarial attack, by keeping some of the data
always hidden.
The case study topic relates to the IDEAS Smart
Home Energy Project, where a client-side Artificial
Intelligence component can predict energy
consumption for appliances. The proposed data
model for that is essentially a look-up table of the
key energy bands that each appliance would use.
Each band represents a level of consumption by the
appliance. This table can replace disaggregation
from more complicated methods constructed from
probability theory, for example. Results show that
the table can accurately disaggregate a single
source to a set of appliances, because each
appliance has quite a unique energy footprint. As
part of predicting energy consumption, the model
could possibly reduce costs by 50% and more than
that if the proposed schedules are also included
[13]. The hyper-grid has been changed to consider
rows as single units, making it more tractable. A
second case study considers wind power patterns,
where the grid optimises over the dataset columns
that represent a time-series, showing some level of
sequential consistency over this feature analysis.
Acknowledgement
This work is supported by the project of "Novel
Building Integration Designs for Increased
Efficiencies in Advanced Climatically Tuneable
Renewable Energy Systems (IDEAS)" (Grant ID:
815271), which is funded by the EU Horizon 2020
programme.
References:
[1] Armel, K.C., Gupta, A., Shrimali, G. and Albert, A.
(2013). Is disaggregation the holy grail of energy
efficiency? The case of electricity. Energy policy,
52, pp.213-234.
[2] Baranski, M. and Voss, J. (2004). Genetic algorithm
for pattern detection in NIALM systems. In 2004
IEEE International Conference on Systems, Man
and Cybernetics (IEEE Cat. No. 04CH37583) (Vol.
4, pp. 3462-3468). IEEE.
[3] Beckel, C., Kleiminger, W., Cicchetti, R., Staake, T.
and Santini, S. (2014). The ECO data set and the
performance of non-intrusive load monitoring
algorithms. In Proceedings of the 1st ACM
conference on embedded systems for energy-
efficient buildings, pp. 80-89.
[4] Biggio, B., Fumera, G. and Roli, F. (2010). Multiple
classifier systems for robust classifier design in
adversarial environments, Int. J. Mach. Learn. & Cyber., Vol. 1, pp. 27-41. DOI: 10.1007/s13042-010-0007-7.
[5] Candanedo, L.M., Feldheim, V. and Deramaix, D. (2017). Data driven prediction models of energy use of appliances in a low-energy house, Energy and Buildings, Vol. 140, pp. 81-97.
http://dx.doi.org/10.1016/j.enbuild.2017.01.083.
[6] Chen, Y., Sun, X. and Jin, Y. (2019).
Communication-efficient federated deep learning
with layerwise asynchronous model update and
temporally weighted aggregation. IEEE transactions
on neural networks and learning systems, 31(10),
pp.4229-4238.
[7] ECO Dataset. (2022).
http://vs.inf.ethz.ch/res/show.html?what=eco-data.
[8] Efron, B. and Tibshirani, R. (1993). An Introduction
to the Bootstrap. Boca Raton, FL: Chapman &
Hall/CRC.
[9] Greer, K. (2019). New Ideas for Brain Modelling 3,
Cognitive Systems Research, Vol. 55, pp. 1-13,
Elsevier. DOI:
https://doi.org/10.1016/j.cogsys.2018.12.016.
[10] Greer, K. (2012). A Stochastic Hyper-Heuristic for
Matching Partial Information, Advances in
Artificial Intelligence, Vol. 2012, Article ID
790485, 8 pages. doi:10.1155/2012/790485,
Hindawi.
[11] Holland, J.H. (1984). Genetic algorithms and
adaptation. In Adaptive Control of Ill-Defined
Systems (pp. 317-333). Springer, Boston, MA.
[12] IDEAS. (2022). Novel building Integration Designs
for increased Efficiencies in Advanced climatically
tunable renewable energy Systems,
https://www.horizon2020ideas.eu/.
[13] IDEAS, Deliverable 4.4. (2021). Smart Mobile App
Development,
https://www.horizon2020ideas.eu/publications-
papers-downloads-videos-about-ideas/.
[14] Jiang, J., Kong, Q., Plumbley, M.D., Gilbert, N.,
Hoogendoorn, M. and Roijers, D.M. (2021). Deep
Learning-Based Energy Disaggregation and On/Off
Detection of Household Appliances. ACM
Transactions on Knowledge Discovery from Data
(TKDD), 15(3), pp.1-21.
[15] Jiang, X., Xiao, C. and Sun, J. (2019) Household
Energy Demand Management Strategy Based on
Operating Power by Genetic Algorithm, IEEE
Access, Vol. 7, pp. 96414 - 96423. DOI
10.1109/ACCESS.2019.2928374.
[16] Kephart, J.O. and Chess, D.M. (2003). The vision of
autonomic computing. Computer, 36(1), pp.41-50.
[17] Kim, H., Marwah, M., Arlitt, M., Lyon, G. and Han,
J. (2011). Unsupervised disaggregation of low
frequency power measurements. In Proceedings of
the 2011 SIAM international conference on data
mining (pp. 747-758). Society for Industrial and
Applied Mathematics.
[18] Kolter, J.Z. and Jaakkola, T. (2012). Approximate
inference in additive factorial hmms with
application to energy disaggregation. In Artificial
intelligence and statistics (pp. 1472-1482). PMLR.
[19] Li, K. and Malik, J. (2018). Implicit Maximum
Likelihood Estimation, arXiv:1809.09087v2
[cs.LG] 22 Oct 2018.
[20] Liang, X., Javid, A.M., Skoglund, M. and
Chatterjee, S. (2022). Decentralized learning of
randomization-based neural networks with
centralized equivalence, Applied Soft Computing,
Vol. 115, ISSN 1568-4946,
https://doi.org/10.1016/j.asoc.2021.108030.
[21] Licas. https://licas.sourceforge.io/.
[22] Noorollahi, Y., Aligholian, A. and Golshanfard, A.
(2019). Stochastic energy modeling with
consideration of electrical vehicles and renewable
energy resources - A review, Journal of Energy
Management and Technology (JEMT) Vol. 4, Issue
1, pp. 13 - 25.
[23] Wind Power Generation Data. (2022).
https://www.kaggle.com/jorgesandoval/wind-
power-generation.