Iranian (Iranica) Journal of Energy & Environment
Deep Learning Based Electricity Demand Forecasting in Different Domains

Electricity demand forecasting is an important task in power grids. Most research on electrical load forecasting has been carried out in the time domain. However, the electrical load time series is inherently non-stationary, which makes load prediction difficult. Moreover, valuable information hidden in the electrical load sequence is not evident in the time domain. To deal with these difficulties, a new electricity demand forecasting framework is proposed in this work. In the proposed framework, a new feature space of the electrical load sequence is first composed. This domain carries complementary information about the shape and variations of the electrical load sequence. Then, the obtained load features are integrated with the original load values in the time domain to provide a rich input for the predictor. Finally, a powerful deep learning technique from the family of recurrent neural networks, long short-term memory (LSTM), is used to learn electricity demand from the provided features in single and hybrid domains. The following domains are investigated in this work: frequency, cepstrum, spectral centroid, spectral roll-off, spectral flux, energy, time difference, frequency difference, Gabor and collaborative representation. The experiments show that using the time difference domain decreases the mean absolute percent error from 0.0332 to 0.0056.


INTRODUCTION
Accurate electrical load prediction is necessary to build an intelligent energy management system and to adjust and monitor energy demand and supply. It plays a crucial role in present and future energy markets [1,2]. All forecasting types, grouped by time horizon into short-term, medium-term and long-term, are important for the planning and operation of the electricity industry [3][4][5][6]. Short-term load forecasting (STLF), which is the focus of this work, refers to load prediction over horizons ranging from several minutes or hours to several days or a week. Energy providers and utilities need STLF to determine accurately the amount of electrical energy they must purchase, which allows them to buy electricity at lower prices. Advanced metering infrastructure collects much more information through smart meter data than traditional meters provide, and this information creates the potential for accurate STLF.
Load data can be analyzed through three main approaches: deterministic, statistical and artificial intelligence. Almost all methods treat the load sequence as a signal or time series. Deterministic approaches model the relation between the consumed (or demanded) load and other related factors, such as weather conditions like temperature. They forecast using curve fitting, smoothing methods and data extrapolation [7][8][9]. Statistical approaches consider the load time series as a stochastic process. They model the load curve of customers in different conditions using probabilistic approaches such as the Bayesian framework [10], regression methods such as autoregressive integrated moving average (ARIMA) [11,12], support vector regression [13,14] and Kalman filtering [15]. Artificial intelligence approaches are divided into two main groups: expert systems such as fuzzy decision makers [16,17] and artificial neural networks (ANNs) [18][19][20]. Expert systems use a knowledge base provided by experts of the electricity industry and inference engines built on fuzzy logic. ANNs are known as powerful tools for load forecasting. They are inspired by the biological structure of the human brain and have several valuable characteristics. They can extract a non-linear model of the observations without any assumption about the statistical distribution of the data, so ANNs can deal with complex patterns, as opposed to traditional methods such as ARIMA. ANNs are self-adaptive and data-driven: an appropriate model of the available samples is formed adaptively from the observed data.
*Corresponding Author Email: maryam.imani@modares.ac.ir (M. Imani)
The multilayer perceptron with a single hidden layer, a feedforward neural network (FFNN), has been widely used for load forecasting [21]. However, the main disadvantage of the FFNN is that it exploits only the current input samples without considering previous ones; in other words, it has no memory of what happened in the past. In sequential data such as load sequences, however, samples are ordered in time and related to each other. To deal with such sequences, recurrent neural networks (RNNs) have been introduced, which consider previously received input samples together with the current ones [22]. An extended version of the RNN is the long short-term memory (LSTM) network [23,24]. LSTM retains information over much longer spans than a standard RNN, so it can learn from both the current input samples and what it experienced far in the past.
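To make the memory mechanism concrete, the following is a minimal sketch of one step of a standard LSTM cell (the common formulation with a forget gate), written in plain Python with a scalar input, hidden state and cell state. The parameter names (`w_*`, `u_*`, `b_*` for the forget, input, output and candidate paths) and the helpers `make_params` and `run_lstm` are hypothetical and chosen only for illustration. The paper uses MATLAB's LSTM layer with 30 hidden units and does not list the cell equations.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_params(**overrides):
    """Hypothetical scalar parameters: w_* (input weight), u_* (recurrent weight), b_* (bias) per gate."""
    p = {f"{kind}_{gate}": 0.0 for kind in ("w", "u", "b") for gate in "fiog"}
    p.update(overrides)
    return p


def lstm_step(x, h_prev, c_prev, p):
    """One step of a standard LSTM cell with scalar input, hidden state and cell state."""
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate: share of old memory kept
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate: share of new content written
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate: share of memory exposed
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate content from the current input
    c = f * c_prev + i * g   # gated, additive cell-state update carries information across steps
    h = o * math.tanh(c)     # hidden state fed to the next step and to the output layer
    return h, c


def run_lstm(sequence, p):
    """Feed a sequence through the cell from a zero initial state; return (h, c) after every step."""
    h, c = 0.0, 0.0
    states = []
    for x in sequence:
        h, c = lstm_step(x, h, c, p)
        states.append((h, c))
    return states
```

For example, feeding a single pulse `[1.0, 0.0, 0.0, 0.0, 0.0]` with `make_params(b_f=5.0, w_g=1.0)` leaves most of the pulse in the cell state after five steps, while `b_f=-5.0` erases it almost immediately. The gated, additive cell-state update is what allows an LSTM to hold information longer than a plain RNN.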
The load data, as a temporal sequence or time series, contains valuable information about the consumption behavior of customers over successive time intervals. Most existing STLF methods use this historical data in the time domain. However, smart meter data may contain useful features that are not evident in the time domain but can be revealed in other feature spaces. To examine this, the performance of STLF in the time domain is compared with that in other domains or feature spaces. In this work, LSTM is used for load forecasting in the following domains: time, frequency, cepstrum, spectral centroid, spectral roll-off, spectral flux, energy, time difference, frequency difference, Gabor and collaborative representation (CR). In addition, the performance of the LSTM network using hybrid domains is assessed and compared with that of single domains.

LOAD SEQUENCE TRANSFORMATION
The load forecasting is done using an LSTM network, one of the best deep learning approaches for sequences and time series. Most studies use the load sequence in the time domain as the input of a predictor. However, the time-domain load sequence may not reveal all useful information related to the consumption behavior of customers and the variations of electricity demand. In addition, the electrical load sequence is inherently non-stationary, which makes prediction difficult. Therefore, it is proposed to extract informative features from the load sequence in other domains in addition to the time domain. To this end, this section introduces different domains for producing informative features from the load sequence. The main contributions of this work are as follows: 1) New features such as collaborative representation are introduced for load data analysis.

Domain 3: cepstrum.
The cepstral coefficients are obtained from the inverse Fourier transform of the logarithm of the absolute value (magnitude) of the Fourier transform of the load sequence $x(t)$ [25]:
$c(t) = \mathcal{F}^{-1}\{\log|\mathcal{F}\{x(t)\}|\}$ (2)
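The cepstrum definition above can be sketched in plain Python as follows. To keep the sketch self-contained it uses a naive O(N²) DFT instead of an FFT, and it adds a small constant `eps` inside the logarithm to avoid log(0); both choices are assumptions, not details given in the paper.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def real_cepstrum(x, eps=1e-12):
    """Inverse DFT of the log-magnitude DFT; eps (an assumption) guards against log(0)."""
    log_mag = [math.log(abs(X_k) + eps) for X_k in dft(x)]
    # For real x the log-magnitude spectrum is real and even, so the result is real up to rounding.
    return [c.real for c in idft(log_mag)]
```

In a forecasting pipeline this would typically be applied to a window of the most recent load values. How the per-window cepstral coefficients are mapped onto each point of the load sequence is not specified in this section.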

Domain 4: spectral centroid.
This domain measures the spectral shape of the load sequence and the concentration of load in the frequency domain:
$SC(t) = \dfrac{\sum_{k=t-m+1}^{t} k\,|X(k)|}{\sum_{k=t-m+1}^{t} |X(k)| + \epsilon}$ (3)
where $X(k)$ is the $k$th element of the Fourier transform of the load sequence, $\epsilon$ is a very small positive constant (e.g., $\epsilon = 10^{-6}$) that keeps the denominator from becoming zero, and $m$ is a positive integer such that the $m-1$ previous samples of $X(k)$ are considered when calculating the centroid value at each point of the load sequence.

Domain 5: spectral roll-off.
This feature determines the point below which $P\%$ (for example, $P = 80$, 90 or 95) of the sum of the absolute Fourier coefficients over the $m$ previous samples in the frequency domain is accumulated [26]. Spectral roll-off reveals the skewness of the spectral shape and indicates where most of the energy is concentrated in the frequency domain:
$SR(t) = \min\left\{R : \sum_{k=t-m+1}^{R} |X(k)| \ge \frac{P}{100}\sum_{k=t-m+1}^{t} |X(k)|\right\}$ (4)

Domain 6: spectral flux.
It represents the local changes among successive samples in the frequency domain:
$SF(t) = \sum_{k=t-m+2}^{t} \left(N(k) - N(k-1)\right)^2$
where $N(k)$ is the absolute value of the Fourier coefficient $X(k)$ normalized by its maximum value.

Domain 7: energy.
The energy of the load sequence contained in the current sample and its $m-1$ previous samples in the time domain is:
$E(t) = \sum_{i=t-m+1}^{t} x^2(i)$

Domain 8: time difference.
The first-order differential operator is applied to the load sequence in the time domain:
$d(t) = x(t) - x(t-1)$
The differential operator helps to remove the non-stationary behavior of the signal. The result is a load sequence that is more stationary with respect to the expectation of the load.

Domain 9: frequency difference.
The differential operator is applied to the load sequence in the frequency domain as follows:
$\Delta X(k) = X(k) - X(k-1)$
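The spectral centroid, spectral roll-off, spectral flux, energy and time difference features described above can be sketched in plain Python as below. This follows one reading of the text and is not the authors' code. Assumptions: the Fourier transform is taken once over the whole load sequence (naive DFT for self-containment); each window covers the current index and its `m - 1` predecessors and is clipped at the start of the sequence; the centroid weights magnitudes by their 0-based index; the roll-off returns an index; and the first time difference is set to 0.

```python
import cmath
import math


def dft_magnitude(x):
    """|X(k)| for k = 0..N-1 via a naive O(N^2) DFT (an FFT would be used in practice)."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def window(t, m):
    """Index t and its m-1 predecessors, clipped at the start of the sequence (assumption)."""
    return range(max(0, t - m + 1), t + 1)


def spectral_centroid(A, m, eps=1e-6):
    """Index-weighted mean of the magnitudes A[k] in each window; eps keeps the denominator non-zero."""
    return [sum(k * A[k] for k in window(t, m)) / (sum(A[k] for k in window(t, m)) + eps)
            for t in range(len(A))]


def spectral_rolloff(A, m, p=90):
    """First index in each window where the running sum of A reaches p% of the window total."""
    out = []
    for t in range(len(A)):
        idx = list(window(t, m))
        target = p / 100 * sum(A[k] for k in idx)
        acc = 0.0
        for k in idx:
            acc += A[k]
            if acc >= target:
                out.append(k)
                break
        else:  # guard against floating-point shortfall: fall back to the last index
            out.append(idx[-1])
    return out


def spectral_flux(A, m):
    """Sum of squared changes between successive max-normalized magnitudes, k = t-m+2..t."""
    peak = max(A) or 1.0  # avoid dividing by zero for an all-zero spectrum
    N = [a / peak for a in A]
    return [sum((N[k] - N[k - 1]) ** 2 for k in range(max(1, t - m + 2), t + 1))
            for t in range(len(N))]


def energy(x, m):
    """Sum of squared load values over the current sample and its m-1 predecessors."""
    return [sum(x[i] ** 2 for i in window(t, m)) for t in range(len(x))]


def time_difference(x):
    """First-order difference x(t) - x(t-1); the first point has no predecessor and is set to 0 (assumption)."""
    return [0.0] + [x[t] - x[t - 1] for t in range(1, len(x))]
```

A typical call chain would be `A = dft_magnitude(load)` followed by `spectral_centroid(A, m=48)`, since m = 48 is the value reported as best in the experiments.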

Domain 10: Gabor features.
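A Gabor function is a Gaussian function modulated by a complex sinusoid:
$g_{\sigma,\omega_0}(t) = \exp\left(-\frac{t^2}{2\sigma^2}\right)\exp(j\omega_0 t)$
where $\sigma$ is the standard deviation of the Gaussian function, representing the scale of the function, and $\omega_0$ is the spatial frequency of the complex exponential. The Gabor filter has complex values, and its magnitude is used to determine the contextual features of the load sequence:
$G(t) = \sqrt{\left(x * g^{R}_{\sigma,\omega_0}\right)^2(t) + \left(x * g^{I}_{\sigma,\omega_0}\right)^2(t)}$
where $g^{R}_{\sigma,\omega_0}(t) = \operatorname{Re}(g_{\sigma,\omega_0}(t))$, $g^{I}_{\sigma,\omega_0}(t) = \operatorname{Im}(g_{\sigma,\omega_0}(t))$ and $*$ denotes convolution. Filtering the load sequence with a 1D Gabor filter yields characteristics of the load at different frequencies and scales in both the spectral and spatial domains.

Domain 11: collaborative representation.
A target vector $\mathbf{y}$ can be approximated by $n$ previous samples of the load sequence, arranged as the columns $\mathbf{x}_1,\dots,\mathbf{x}_n$ of a matrix $\mathbf{X}$, through $\mathbf{y} \approx \mathbf{X}\boldsymbol{\alpha}$, where $\boldsymbol{\alpha}$ is the coefficient vector of the approximation obtained by:
$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\alpha}\|_2^2$ (14)
Since samples closer to $\mathbf{y}$ are more likely to be similar to it, closer samples should be assigned larger weights. To meet this requirement, the objective function in Equation (14) is regularized as follows:
$\hat{\boldsymbol{\alpha}} = \arg\min_{\boldsymbol{\alpha}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\alpha}\|_2^2 + \lambda\|\boldsymbol{\Gamma}\boldsymbol{\alpha}\|_2^2$
where $\lambda$ is the regularization parameter that controls the trade-off between the two terms. Taking the derivative of the objective function and setting it to zero gives:
$\hat{\boldsymbol{\alpha}} = (\mathbf{X}^T\mathbf{X} + \lambda\boldsymbol{\Gamma}^T\boldsymbol{\Gamma})^{-1}\mathbf{X}^T\mathbf{y}$
where $\boldsymbol{\Gamma} = \operatorname{diag}(\|\mathbf{y} - \mathbf{x}_1\|_2, \dots, \|\mathbf{y} - \mathbf{x}_n\|_2)$ and $\|\mathbf{y} - \mathbf{x}_i\|_2$ is the $\ell_2$ norm of $\mathbf{y} - \mathbf{x}_i$. However, for samples to be forecasted, $\mathbf{y}$ is unknown and has to be predicted, so its estimate $\hat{\mathbf{y}}$ is used instead. What, then, is an appropriate approximation of the load sequence at the $t$th instance, i.e., $\mathbf{y} = \mathbf{l}(t)$? The load value at each time instance is close to the load value one step earlier, i.e., $\mathbf{l}(t-1)$. In addition, the load consumed at each time instance can be close to the load consumed at the same instance of the previous year, denoted by $\mathbf{l}_p(t)$. So $\mathbf{y} = \mathbf{l}(t)$ is approximated by a combination of $\mathbf{l}(t-1)$ and $\mathbf{l}_p(t)$ weighted by $\rho_1$ and $\rho_2$, where $0 \le \rho_1 \le 1$ is the correlation coefficient between $\mathbf{l}(t)$ and $\mathbf{l}(t-1)$, $0 \le \rho_2 \le 1$ is the correlation coefficient between $\mathbf{l}(t)$ and $\mathbf{l}_p(t)$, and $\mathbf{l}(t)$ is the lagged loads vector containing the lagged load values for the $t$th instance.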
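The Gabor magnitude feature and the regularized collaborative representation coefficients described above can be sketched in plain Python as follows. Assumptions, not confirmed by the paper: the Gaussian envelope is unnormalized, the kernel is truncated at about 3σ, and the sequence is zero-padded at its edges. For CR, the columns passed in stand for the n previous vectors, and the closed form with Γ = diag(‖y − x_i‖₂) is solved by a small hypothetical Gaussian-elimination helper `solve` rather than a library routine.

```python
import cmath
import math


def gabor_kernel(sigma, omega0, half_width=None):
    """Complex 1D Gabor taps g(tau) = exp(-tau^2 / (2 sigma^2)) * exp(j omega0 tau)."""
    if half_width is None:
        half_width = int(math.ceil(3 * sigma))  # assumption: truncate at about 3 standard deviations
    return [math.exp(-tau ** 2 / (2 * sigma ** 2)) * cmath.exp(1j * omega0 * tau)
            for tau in range(-half_width, half_width + 1)]


def gabor_magnitude(x, sigma, omega0):
    """|(x * g)(t)| = sqrt of squared real- and imaginary-part responses; zero padding at the edges."""
    g = gabor_kernel(sigma, omega0)
    h = len(g) // 2
    out = []
    for t in range(len(x)):
        acc = 0j
        for j, g_tau in enumerate(g):
            s = t - (j - h)  # convolution term x(t - tau) with tau = j - h
            if 0 <= s < len(x):
                acc += x[s] * g_tau
        out.append(abs(acc))
    return out


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A z = b."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def cr_coefficients(y, columns, lam=1e-4):
    """alpha = (X^T X + lam * Gamma^T Gamma)^(-1) X^T y with Gamma = diag(||y - x_i||_2).

    `columns` holds the n previous vectors x_1..x_n (the columns of X).
    """
    n = len(columns)
    A = [[dot(columns[i], columns[j]) for j in range(n)] for i in range(n)]
    for i, x_i in enumerate(columns):
        dist_sq = sum((a - b) ** 2 for a, b in zip(y, x_i))
        A[i][i] += lam * dist_sq  # Gamma^T Gamma is diagonal with squared distances
    b = [dot(x_i, y) for x_i in columns]
    return solve(A, b)
```

The distance-weighted penalty shrinks the coefficients of columns far from y more strongly, which is how the regularizer favors closer samples. With n = 6 the linear system is only 6×6, so a direct solve is inexpensive.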
For each point of the load sequence, 20 lagged load values are considered: 6 variables from the same day over the last 3 hours, 7 variables from the same hour of the previous day over the last 3 hours, and 7 variables from the same hour of the previous week over the last 3 hours. $\rho_1$ and $\rho_2$ are calculated from the inner products $\mathbf{l}(t) \cdot \mathbf{l}(t-1)$ and $\mathbf{l}(t) \cdot \mathbf{l}_p(t)$, respectively, where $\cdot$ denotes the inner product between vectors. If $\mathbf{l}(t-1)$ has a higher correlation with $\mathbf{l}(t)$, it receives a larger weight in the approximation of $\mathbf{y} = \mathbf{l}(t)$; if $\mathbf{l}_p(t)$ has a higher correlation with $\mathbf{l}(t)$, it contributes more to the approximation. An illustration of the load sequence in different domains is shown in Figure 1.
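The 20-value lagged loads vector can be built as in the plain-Python sketch below. It assumes half-hourly readings (48 per day, consistent with the 48×7 sequence length used later). It also interprets "the same hour of the previous day (week) over the last 3 hours" as that instant plus the six half-hours before it; this interpretation, and the names `lag_offsets` and `lagged_vector`, are hypothetical.

```python
SAMPLES_PER_DAY = 48                    # assumption: half-hourly smart meter readings
SAMPLES_PER_WEEK = 7 * SAMPLES_PER_DAY


def lag_offsets():
    """Offsets k such that load[t - k] enters the 20-value lagged vector (one reading of the text)."""
    same_day = list(range(1, 7))                            # t-1 ... t-6: last 3 hours of the same day
    prev_day = [SAMPLES_PER_DAY + k for k in range(7)]      # t-48 ... t-54: same time yesterday and 3 hours before
    prev_week = [SAMPLES_PER_WEEK + k for k in range(7)]    # t-336 ... t-342: same time last week and 3 hours before
    return same_day + prev_day + prev_week


def lagged_vector(load, t):
    """Lagged loads vector for instance t; requires one week plus 3 hours of history."""
    offsets = lag_offsets()
    if t - max(offsets) < 0:
        raise ValueError("not enough history for the weekly lags")
    return [load[t - k] for k in offsets]
```

With `load = list(range(1000))`, `lagged_vector(load, 500)` starts with `499, ..., 494` (same day), continues with `452, ..., 446` (previous day) and ends with `164, ..., 158` (previous week).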

EXPERIMENTAL RESULTS
The performance of the LSTM network in different domains of smart meter data is assessed in this section. The dataset used, the assessment measures and the settings of the network structure and free parameters are introduced first. Then, the experimental results are reported.

Data, measures and settings
To assess the performance of the proposed forecasting methods, a load consumption dataset from Ireland is used. The data were acquired from the Irish Social Science Data Archive (ISSDA) and are related to the Commission for Energy Regulation (CER) project [27]. The dataset covers the consumed electrical load of 5000 … to December 2010. Only the residential load is considered in the experiments of this work. Three metrics are used to evaluate the forecasting results: mean absolute percent error (MAPE), mean absolute error (MAE) and mean square error (MSE). MAPE shows the relative forecasting accuracy, MAE indicates the average absolute difference between forecasted and actual values, and MSE reflects the overall squared deviation between forecasted and actual values. Smaller values of these metrics correspond to better predictions. The metrics are defined as follows:
$\mathrm{MAPE} = \frac{1}{N}\sum_{t=1}^{N}\left|\frac{y(t) - \hat{y}(t)}{y(t)}\right|$
$\mathrm{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|y(t) - \hat{y}(t)\right|$
$\mathrm{MSE} = \frac{1}{N}\sum_{t=1}^{N}\left(y(t) - \hat{y}(t)\right)^2$
where $\hat{y}(t)$ and $y(t)$ are the forecasted and actual values, respectively, and $N$ is the number of load values to be forecasted. The experiments are conducted in MATLAB R2018b. 70% of the available data is used for training and the remaining data is used for testing. The following structure is used for the LSTM network: 30 hidden units in the LSTM layer, sequence length and mini-batch size both equal to 48×7, a learning rate of 0.001, a maximum of 100 epochs and the 'adam' optimizer. The number of lags, $m$, in the cepstrum, spectral centroid, spectral roll-off, spectral flux and energy domains is varied over $m \in \{2, 7, 24, 48, 48 \times 7\}$, and the best forecasting results are obtained with $m = 48$. In the CR domain, $n$ is set to $n = 6$ and $\lambda$ to $\lambda = 10^{-4}$.
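The three evaluation metrics and the 70/30 split can be sketched in plain Python as below. MAPE is computed as a fraction rather than a percentage; this is an inference from reported values such as 0.0056, not a stated definition. The split is assumed to be chronological (no shuffling), which the text does not specify.

```python
def mape(actual, forecast):
    """Mean absolute percent error expressed as a fraction (not multiplied by 100)."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mse(actual, forecast):
    """Mean square error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def chronological_split(series, train_fraction=0.7):
    """First 70% for training, remainder for testing, without shuffling (assumption)."""
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]
```

For actual loads `[100, 200]` and forecasts `[110, 180]`, MAPE is 0.1, MAE is 15.0 and MSE is 250.0. MAPE divides by the actual load, so it is undefined for zero readings; those would need to be excluded or handled separately.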

Results
The results of load forecasting in different domains are reported in Table 1. As the MAPE values show, the best forecasting results are achieved in the time difference domain. The load sequence is often a non-stationary time series whose statistical behavior is not stable over time. Time differencing reduces the non-stationary nature of the load sequence: the differential operator removes the unstable variations that may hinder the learning of the forecasting network.
Using the original load values alongside the first-order time difference helps the LSTM better learn the variations of the load time series. After the time difference domain, the spectral … In the hybrid1 domain, all 11 domains are combined. As seen, the worst result is obtained in the hybrid1 domain. This is due to the high redundancy among the features of different domains. In addition, it is found that some features may be inconsistent with others when used together, which decreases the learning performance. In contrast, the combination of time, frequency, cepstrum, spectral centroid, time difference and frequency difference in the hybrid2 domain provides superior results, ranking second after the time difference domain. The comparison between the hybrid1 and hybrid2 domains shows that an appropriate combination of domains (feature spaces) can improve the forecasting results, while an inappropriate combination degrades the forecasting performance.

CONCLUSION
Electricity demand forecasting in the time, frequency, cepstrum, spectral centroid, spectral roll-off, spectral flux, energy, time difference, frequency difference, Gabor and collaborative representation domains is investigated in this paper. In general, selecting an appropriate single or hybrid domain, i.e., an appropriate feature space of the load sequence, can improve load forecasting. The time difference domain not only provides a rich source of information about electrical load variations but also reduces the non-stationary behavior of the load sequence, so integrating it with the original time series significantly improves electrical load forecasting. Other domains such as frequency, cepstrum, spectral centroid and frequency difference are not efficient on their own, but an appropriate combination of them can improve the forecasting result.