Attribution-NonCommercial-NoDerivs 2.0 Korea. You are free to copy, distribute, transmit, display, perform, and broadcast this work, provided you follow these conditions:
- When reusing or distributing this work, you must clearly state the license terms applied to it.
- These conditions may be waived with separate permission from the copyright holder.
Your rights under copyright law are not affected by the above. This is a human-readable summary of the Legal Code.
Disclaimer
Attribution. You must credit the original author.
NonCommercial. You may not use this work for commercial purposes.
NoDerivs. You may not alter, transform, or build upon this work.
Ph.D. Dissertation of Engineering

Improving Individual-level Short-Term Electric Load Forecasting for Residential Demand Response

By
Eunjung Lee

February 2021

Program in Digital Contents and Information
Graduate School of Convergence Science and Technology
Seoul National University
Abstract
Forecasting electric load is one of the critical challenges in the smart grid, and it is particularly difficult for residential demand response, where each household's daily electricity load can vary randomly and significantly. To improve the accuracy of the forecasts used to calculate reduction amounts in a demand response program, a general and widely accepted enhancement is to set up an independent control group, but this requires a careful selection process and excludes the selected customers from the program. The next best plan is to improve the forecasting method itself. To predict each customer's load accurately, the general belief is that the best approach is through individualized models. Existing studies, however, have focused on one-for-all models because individual models are difficult to train and require a significantly larger data size per individual. In recent years, applying deep learning to electric load forecasting has become an important research topic, but one-for-all models have remained the main approach. In this dissertation, we propose three methods. First, a virtual control group adjustment method provides the benefits of a control group without its main burdens, addressing the first problem. Using a real-world dataset collected from a pilot residential demand response program, we evaluate the robustness of the virtual control group adjustment method against selection bias and assess its mean error and mean absolute error performance against traditional models. Second, we adopt transfer learning and meta learning, which can be smoothly integrated into deep neural networks, and show how a high-performance individualized model can be formed using an individual's data collected over just several days.
This is made possible by extracting the common patterns of many individuals using a sufficiently large dataset and then customizing each individual model using the specific individual's small dataset. The proposed methods are evaluated over residential and non-residential datasets. Finally, we suggest unified methods that combine the suggested adjustment method and the suggested individualization methods.
The unified methods performed best in our experiments. We also show that our proposed methods using meta learning can serve as effective tools when there is a cold-start problem.
Keywords: short-term load forecasting, virtual control group, deep learning, individualized models, residential demand response
Student number: 2015-26036
Table of Contents

Chapter 1. Introduction
1.1 Background and Motivation
1.2 Contributions

Chapter 2. Short-Term Load Forecasting (STLF) Approaches in Residential Sector
2.1 One-for-all Learning
2.1.1 Base deep learning models for one-for-all learning
2.2 Individual Learning
2.2.1 Statistical Methods
2.2.2 Adjusted Statistical Methods
2.2.3 Control Group Approach
2.2.4 Traditional Learning

Chapter 3. Methodology
3.1 Dataset
3.2 Metric
3.2.1 Mean Error (ME)
3.2.2 Mean Absolute Error (MAE)
3.2.3 Root Mean Squared Error (RMSE)
3.2.4 Symmetric Mean Absolute Percentage Error (SMAPE)
3.3 Experimental Settings for Residential STLF Approaches
3.3.1 One-for-all Learning
3.3.2 Individual Learning

Chapter 4. Virtual Control Group to Improve Adjusted Statistical Methods for Individual STLF
4.1 Virtual Control Group for Individual STLF
4.1.1 Difference-In-Differences Technique
4.1.2 Virtual Control Group and Virtual Control Group Adjustment for Individual STLF
4.2 Experiment
4.2.1 Experimental Settings for Virtual Control Group Adjustment
4.2.2 Result and Analysis of Performance Comparison
4.2.3 Selection Bias Analysis of Virtual Control Group

Chapter 5. Transfer Learning and Meta Learning for Individualized STLF
5.1 Transfer Learning for Individualized STLF
5.1.1 Transfer Learning
5.1.2 Transfer Learning Framework for Individualized STLF
5.2 Meta Learning for Individualized STLF
5.2.1 Meta Learning
5.2.2 Meta Learning Framework for Individualized STLF
5.3 Experiment
5.3.1 Experimental Settings for Transfer and Meta Learning
5.3.2 Result and Analysis of Performance Comparison on Residential Dataset
5.3.3 Result and Analysis of Performance Comparison on Non-residential Dataset
5.3.4 Sensitivity to Test Period on Non-residential Dataset

Chapter 6. Unified Methods for Group and Individualized STLF
6.1 Unified Methods
6.2 Experiment
6.2.1 Experimental Settings for Unified Methods
6.2.2 Result and Analysis of Performance Comparison on Residential Dataset
6.2.3 Result and Analysis of Performance Comparison on Non-residential Dataset

Chapter 7. Discussion
7.1 Limitations
7.1.1 Virtual Control Group
7.1.2 Transfer Learning and Meta Learning
7.1.3 Unified Methods
7.2 Cold-start Problem in STLF
7.3 Benefits for Demand Response Programs
7.4 Other Application Domains
7.5 Implementation Feasibility

Chapter 8. Conclusion

Bibliography

Appendix
A Number of Participants and Number of Non-participants for Each Mission
B MAPE, SMAPE, and RMSE of Unadjusted/adjusted Statistical Methods for Non-event Days
C Analysis of the Event Days
D Non-residential Dataset
E RMSE Performance from June to December in Non-residential Dataset

Acknowledgement
List of Tables

2.1 Data used for training customer-i's model
2.2 Definition of statistical methods
2.3 Definition of two types of pre-hour and weather adjustment methods
4.1 Definition of two types of virtual control group adjustment
4.2 ME of unadjusted/adjusted statistical methods and virtual control group adjusted statistical method evaluated for non-event days
4.3 MAE of unadjusted/adjusted statistical methods and virtual control group adjusted statistical method evaluated for non-event days
4.4 MAE, MAPE, SMAPE, RMSE of unadjusted/adjusted statistical methods and virtual control group adjusted statistical method evaluated for non-event days
5.1 Data used for training customer-i's model
5.2 Range setting and optimized parameters
5.3 Performance of statistical methods and ARIMA on the residential dataset
5.4 Performance of one-for-all and individual learning methods on the residential dataset
5.5 Performance of one-for-all and individual learning methods on the residential dataset with HPO for MLP
5.6 Performance of statistical methods and ARIMA on the non-residential (UCI) dataset
5.7 Performance of one-for-all and individual learning methods on the non-residential (UCI) dataset
5.8 Performance of one-for-all and individual learning methods on the non-residential (UCI) dataset with HPO for MLP
6.2 ME performances of unified methods on residential dataset
6.3 RMSE performances of unified methods on residential dataset
6.4 SMAPE performances of unified methods on residential dataset
6.5 ME performances of unified methods on non-residential dataset
6.6 RMSE performances of unified methods on non-residential dataset
6.7 SMAPE performances of unified methods on non-residential dataset
7.1 Performance of transfer learning and meta learning on residential and non-residential (UCI) datasets based on the period of data for fine-tuning
7.2 Performance of transfer learning and meta learning with virtual control group adjustment on residential and non-residential (UCI) datasets based on the period of data for fine-tuning
A.1 Date, time, and participants/non-participants counts of 29 DR events
B.2 MAPE of unadjusted/adjusted statistical methods and virtual control group adjusted statistical method evaluated for non-event days
B.3 SMAPE of unadjusted/adjusted statistical methods and virtual control group adjusted statistical method evaluated for non-event days
B.4 RMSE of unadjusted/adjusted statistical methods and virtual control group adjusted statistical method evaluated for non-event days
List of Figures

2.1 An example of input and output
2.2 Data splits for modeling short-term load forecasting
2.3 One-for-all learning models and the data used for training and testing the models
2.4 Statistical methods and the data used for training and testing the models
2.5 Traditional learning models and the data used for training and testing the models
3.1 Illustration of mobile application screens for the pilot DR mission (translated from Korean)
4.1 Difference in Differences
4.2 Plots of ME and MAE for the year 2017
4.3 Effects of virtual control group's selection bias
5.1 Transfer learning model and the data used for training and testing the models
5.2 Meta learning setup as 5-way 1-shot
5.3 Meta learning models and the data used for training and testing the models
5.4 RMSE and SMAPE of the most representative algorithms on residential dataset
5.5 RMSE and SMAPE of the most representative algorithms on non-residential (UCI) dataset
5.6 SMAPE performance for choosing one month of test period between June and December
6.1 RMSE and SMAPE of the most representative algorithms with ratio type of virtual control group adjustment on residential dataset
6.2 RMSE and SMAPE of the most representative algorithms with ratio type of virtual control group adjustment on non-residential (UCI) dataset
7.1 Comparison between transfer learning and meta learning on residential and non-residential (UCI) datasets based on the period of data for fine-tuning
7.2 Comparison between transfer learning and meta learning with virtual control group adjustment on residential and non-residential (UCI) datasets based on the period of data for fine-tuning
C.1 Mean usage of event day and the day before event day for the participant group and the non-participant group
C.2 Reduction rate of event participants of the day before event day and event day
E.3 RMSE performance for choosing one month of test period between June and December
Chapter 1. Introduction
1.1. Background and Motivation
Reliable energy load forecasting is an important aspect of effective energy management.
Based on the forecast period, energy load forecasting can be classified into long-term, medium-term, and short-term forecasting (Mocanu et al. 2016a). Long-term forecasting supports decision making for financial and operational planning. In contrast, medium- to short-term forecasting is useful for improving power system operations such as short-term capacity scheduling or real-time control. Short-Term Load Forecasting (STLF) is especially useful for reliable and efficient operation of electric grids (Cao et al. 2015), and studies on STLF have typically focused on one-for-all models. One-for-all models learn from the data of all customers and infer loads for target dates using a single common model. In contrast, individualized models are trained on the data specific to each customer or customer group and provide forecasts corresponding directly to individual customers. Although individualized forecasting has the potential to fit each customer's specific load patterns, it has traditionally not been prioritized due to the lack of data and the risk of overfitting. A lack of data is known to be one of the major reasons for failure in load forecasting (Kim and Shcherbakova 2011; Javed et al. 2012; Ribeiro et al. 2018). However, individualized load forecasting is still needed because residential customers have dynamic and volatile loads (Kong et al. 2017). The need for individualization also exists for non-residential customers, whose load characteristics differ widely according to their facilities and types of business (Ribeiro et al. 2018). Furthermore, smart grid infrastructures are increasingly complicating individual load characteristics (Leng et al. 2018). In addition, when sufficient load data has not been collected, for example for buildings that have been connected to smart grid systems for only a short time or have recently been constructed or renovated, modeling for STLF is challenging (Moon et al. 2020).
Demand Side Management (DSM) is a core part of a smart grid, and one of the most important features of DSM is the ability to inexpensively solve the problem of occasionally high electric loads (Spees and Lave 2007; Nolan and O’Malley 2015).
Among the four types of DSM categories (energy efficiency, time of use, spinning reserve, and demand response (Palensky and Dietrich 2011)), Demand Response (DR) makes end customers change their normal electricity load patterns by providing financial incentives (Kim and Shcherbakova 2011; Qdr 2006). DR can lower consumption by shifting end customers' electricity load when the wholesale market price is temporarily high due to high demand; thus, it can provide economic benefits to both the demand response providers and the customers. Furthermore, it can reduce overall plant and capital cost investments in the long term by lowering the peak demand and improving the reliability of the power system (Mohagheghi et al. 2010;
Siano 2014).
DR customers can be divided into three main sectors: commercial, industrial, and residential (Aghaei and Alizadeh 2013). The commercial sector is usually considered to need an autonomous system for participating in DR programs, but such a system requires expert installation and a high budget (Vardakas, Zorba, and Verikoukis 2014). Heating, Ventilation, and Air-Conditioning (HVAC) and lighting systems are especially popular targets due to their heavy loads: HVAC systems depend on temperature and air distribution, and lighting systems depend on the season of the year and the time of day (Vardakas, Zorba, and Verikoukis 2014). The industrial sector contains the biggest energy consumers, with peak loads of hundreds of MW (Shoreh et al. 2016). However, implementing DR for industrial plants is not easy, because the service limitations of DR can lead to operational and production constraints. Although reliability management is more complicated than for customers in other sectors, industrial customers still deliver high peak load reductions per customer (Aghaei and Alizadeh 2013). The residential sector has a considerably smaller reducible load per customer compared to the large commercial and industrial customers. The high management cost per customer has resulted in a relatively negligible level of investment in the residential sector. Thanks to new standards, technologies such as Smart Meters and Advanced Metering Infrastructure (AMI) based on cheap Internet of Things (IoT) devices, and mobile technologies, however, the management cost per residential customer has become significantly lower in the past decade. DR programs are now becoming increasingly accessible to residential customers through the use of low-cost systems (Nan, Zhou, and Li 2018). Additionally, residential customers have much potential to reduce or shift their loads because they are more sensitive to the electricity price (Deng et al. 2015). Hence, better solutions are needed for assessing the electricity load savings of residential DR programs (Siano 2014; Lusis et al. 2017).
The reduction amount of DR is defined as the difference between the baseline load and the actual load, where the baseline load is defined as the expected usual electricity demand of the customer in the absence of DR (Wijaya, Vasirani, and Aberer 2014; Mohajeryami et al. 2016). Depending on the existence of an adjustment step, individual load forecasting methods for the baseline load can be divided into unadjusted and adjusted methods. An unadjusted method is a prediction function whose inputs are the historical load data or other external data prior to the event day, whereas an adjusted method additionally uses data from the target day, i.e., the event day in a DR program, such as the pre-event-hour load or the target day's weather data, to apply an adjustment to the unadjusted method. Previous individual learning studies have mainly focused on statistical methods with or without adjustments because of their high level of explainability to users. In (Wijaya, Vasirani, and Aberer 2014), a performance analysis of statistical methods for residential customers was provided, and it was shown that bias is an important element. The authors noted that their work focused intensely on residential customers, while previous works had focused on commercial and industrial customers. They also summarized popular statistical methods that are widely used in industry, and the new method they introduced is simple but performed best in their experiments, so we reference their methods extensively, as described in 2.2.1.
Furthermore, they tried to explain the relationship between a baseline and a customer's decision to participate in a DR event. Interestingly, positively biased baselines made customers more engaged and provided maximum total profits when profit sharing was low. In (Mohajeryami et al. 2016), an error analysis of forecasting methods for residential customers was carried out theoretically and empirically. Goldberg and Agnew (2013) provided guidance on methods for measurement and verification of demand response in wholesale and retail DR programs. Enernoc (2011, accessed August 2, 2020) applied a pre-hour adjustment method using the load measured a few hours before the event time. The adjustment was applied in additive and ratio manners, with a maximum limit on the adjustment. Coughlin et al. (2008; 2009) investigated the dependency between the error performance and the number of hours used for the pre-hour adjustment calculation. Wi et al. (2009) proposed a forecasting method that uses an exponential smoothing model with a weather adjustment. While many of these studies presented methods for improving forecasting accuracy, other studies found limited or even worse performance compared to the statistical or adjusted statistical models (Wijaya, Vasirani, and Aberer 2014; Mohajeryami et al. 2016; Stephen George 2013, accessed August 2, 2020). Recently, machine learning and artificial intelligence algorithms have been applied to forecasting methods to go beyond the statistical methods. Suganthi and Samuel (2012) introduced and reviewed energy demand forecasting models ranging from statistical methods to recent artificial neural networks. Weng and Rajagopal (2015) proposed a probabilistic estimation based on a Gaussian process to reduce reward uncertainties using aggregation of customers. Park et al. (2015) estimated customer baseline load with a self-organizing map and K-means clustering to find the potential load patterns of the DR event day. Zhou et al. (2016) developed a cubic regression spline model and multiple linear regression models to forecast the baseline. Hatton, Charpentier, and Matzner-Løber (2015) suggested constrained regression methods such as ridge and lasso with a control group based on each customer's load curves. For short-term load forecasting, a neural network algorithm with three layers was considered in Javed et al. (2012). The hybrid forecasting model of Ghasemi et al. (2016) used a flexible wavelet packet transform, feature selection, a nonlinear least squares support vector machine, and a modified artificial bee colony. Kialashaki and Reisel (2013) compared artificial neural networks with multiple linear regression models.
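To make the unadjusted versus adjusted distinction concrete, the sketch below computes a simple averaging baseline and applies the additive and ratio styles of pre-hour adjustment with a cap, in the spirit of the methods cited above. The window sizes, cap value, and function names are illustrative assumptions, not the exact formulas of any cited program.

```python
import numpy as np

def averaging_baseline(history, n_days=5):
    """Unadjusted baseline for one day: the mean hourly profile of the
    last n_days non-event days (a simple 'average of Y days' method)."""
    return np.mean(history[-n_days:], axis=0)

def prehour_adjustment(baseline, actual, event_hour, n_hours=2,
                       mode="ratio", cap=0.4):
    """Adjust the baseline using the load measured in the n_hours just
    before the event hour. `mode` selects the additive or ratio variant;
    `cap` bounds the adjustment magnitude (here +/-40%, an assumption)."""
    pre = slice(event_hour - n_hours, event_hour)
    if mode == "additive":
        delta = np.mean(actual[pre] - baseline[pre])
        limit = cap * np.mean(baseline)
        return baseline + np.clip(delta, -limit, limit)
    ratio = np.mean(actual[pre]) / np.mean(baseline[pre])
    return baseline * np.clip(ratio, 1 - cap, 1 + cap)

# Toy data: 7 past days of 24-hour loads around 1 kW with mild noise.
rng = np.random.default_rng(0)
history = 1.0 + 0.1 * rng.standard_normal((7, 24))
baseline = averaging_baseline(history)
actual = baseline * 1.2        # the target day runs 20% above the baseline
adj = prehour_adjustment(baseline, actual, event_hour=14)
```

Because the target day's pre-event hours run 20% high, the ratio adjustment scales the whole baseline by 1.2, which stays within the cap.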
In the regime of STLF, traditional machine learning models have achieved great success, but they are usually based on a one-for-all learning approach, with the exception of the AutoRegressive Integrated Moving Average (ARIMA). ARIMA has been an essential technique for decades, where the signals captured from past values are extrapolated into future values. More sophisticated machine learning models have been popular as well. In a case study involving office buildings (Li et al. 2009), a Support Vector Machine (SVM) model performed better than a conventional neural network model with three layers. Fan and Chen (2006) suggested a hybrid network with a self-organizing map (SOM) and SVM. Jain et al. (2014) identified the impact of temporal and spatial granularity using a sensor-based SVM model. Hong et al. (2011) proposed a Support Vector Regression (SVR) model that predicts monthly loads using a seasonal adjustment mechanism and a chaotic immune algorithm. Clustering-based models have been successfully applied, too. Al-Qahtani and Crone (2013) suggested a multivariate k-Nearest Neighbor (KNN) regression model. Zhang et al. (2015) used decision-tree-based clustering, where the load of customers in each cluster is predicted by a piecewise linear regression model. Following the success of traditional machine learning, deep learning models have become popular because they allow modeling with more complex functions and offer the potential to improve performance. Ding et al. (2015) suggested a Multi-Layer Perceptron (MLP) model for STLF of medium-voltage and low-voltage substations. Fan, Xiao, and Zhao (2017) showed that features extracted from a deep autoencoder can help improve the accuracy of building cooling load prediction, and an MLP model using the extracted features performed well in their experiment. Similarly, Ribeiro et al. (2018) also used an MLP model, and they proposed transfer learning with temporal adjustments. Kong et al. (2017) developed two Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM) cells to predict short-term load. A Convolutional Neural Network (CNN) based model was suggested in (Cai, Pipattanasomporn, and Rahman 2019), and it exceeded the performance of LSTM models and ARIMA with exogenous inputs. Choi, Ryu, and Kim (2018) combined a Residual Network (ResNet), which is an advanced type of CNN, with LSTM units. The combined model showed the best performance among MLP, LSTM, and ResNet models in their experiment. A modified ResNet without LSTM outperformed wavelet neural network models in (Chen et al. 2018).
Even though many machine learning and deep learning models have been studied, a clear limitation has been that almost all have considered one-for-all models instead of individualized models. A customer's electric load pattern, however, might have unique and multifaceted characteristics where many different features contribute toward the final pattern. If we consider all possible combinations of the features, the number of possible electric load patterns can be extremely large. The number of unique features, however, might not be very large, and a person skilled in the art might be able to tell much about a customer's electric load pattern simply by looking at the customer's recent electric load data. For instance, residential customers might be well characterized by looking at the last several days of electric load patterns. A customer might always have very low usage on weekday afternoons, while another customer might have varying usage on weekday afternoons, depending on how many family members are present at home each day. In such a residential scenario, it can be crucial to identify the common and important features by investigating the data of many residential customers, to recognize which features are present and important for the customer whose electric load needs to be predicted, and to use this information in an individualized way such that the future load prediction can be improved. Obviously, one-for-all models are limited because their algorithms are not specialized for individualization. Clustering models are also limited in that they do not focus on learning the features; instead, they simply try to group the individual patterns that are formed as combinations of the underlying features.
There have been a few approaches for making individualized models work. Tascikaraoglu and Sanandaji (2016) utilized external information. They employed a spatiotemporal algorithm using a multivariate autoregressive model with data from a variety of electrical appliances in the target house and surrounding houses. The spatial information from the surrounding houses significantly improved the prediction accuracy at the individual level. The performance, however, was verified only for the case of two residential customers. Because gathering sufficiently useful external data can be practically impossible for large-scale deployments, algorithmic approaches are strongly preferred. Mocanu et al. (2016b) proposed reinforcement transfer learning with deep belief networks, where the model was trained with six years of data and evaluated over five buildings. Ribeiro et al. (2018) developed a transfer learning model that has two adaptation phases for transferring temporal knowledge and non-temporal features. They aimed to develop a forecasting model that can predict accurately despite a lack of data for new buildings. The developed model with MLP performed slightly better than the SVM model when evaluated over four buildings. The Hephaestus method they suggested has four phases: a time-series adaptation phase for time effects such as seasonality and trends, a non-temporal adaptation phase for non-temporal features and domain adaptation, a standard machine learning phase for training the model and predicting loads, and an adjustment phase for readjusting the predicted loads using the factors from the transformation function of the time-series and non-temporal adaptation phases. Their rolling-base forecasting method inspired the forecasting method for our transfer learning approach. Both transfer learning studies were for building load forecasting, and the small number of buildings might have been a critical limitation. For transfer learning to work well, it is crucial to construct a base learner that can serve as a good initialization point for the subsequent individualization. But when only a few buildings are used for training the base learner, there is a high chance that the characteristics of the target building are quite different from those of the buildings used for constructing the base learner.
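The base-learner idea can be illustrated with a toy sketch of transfer learning: pre-train a shared model on pooled data from many customers, then fine-tune a copy on one customer's small dataset. The linear model, synthetic data, learning rates, and step counts below are assumptions for illustration only, not the dissertation's actual deep learning setup.

```python
import numpy as np

rng = np.random.default_rng(42)

def sgd_fit(w, X, y, lr=0.01, steps=200):
    """Plain gradient descent on mean squared error for a linear model."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

d = 8                                    # input window length (assumption)
w_common = rng.standard_normal(d)        # pattern shared across customers
X_pool = rng.standard_normal((1000, d))  # pooled one-for-all training data
y_pool = X_pool @ w_common + 0.05 * rng.standard_normal(1000)

# Step 1: train the base learner on the large pooled dataset.
w_base = sgd_fit(np.zeros(d), X_pool, y_pool)

# One customer deviates from the common pattern and has only a few
# days' worth of samples.
w_cust = w_common + 0.3 * rng.standard_normal(d)
X_i = rng.standard_normal((20, d))
y_i = X_i @ w_cust + 0.05 * rng.standard_normal(20)

# Step 2: fine-tune a copy of the base learner on the customer's data.
w_ind = sgd_fit(w_base.copy(), X_i, y_i, lr=0.05, steps=50)

err_base = np.mean((X_i @ w_base - y_i) ** 2)
err_ind = np.mean((X_i @ w_ind - y_i) ** 2)
```

Starting the individualization from the base learner, rather than from scratch, is what lets the few days of customer data suffice.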
It is essential to have an accurate method for estimating the electric load reduction amount. The estimation is used for deciding the incentives or settlements to end customers and also for calculating payments to the program operators or market aggregators (Philip Bartholomew 2009). Therefore, high accuracy is desired and required to keep the program fair and all relevant parties motivated. In addition, the estimation results are used for DR program assessments and the resource optimizations needed for grid reliability (Chen, Wu, and Fu 2012). Overall, an accurate reduction estimation is an essential element of a successful DR program (Zhang et al. 2015); however, guaranteeing high accuracy is challenging for the residential sector. Residential DR is different from industrial or commercial DR because the load pattern is highly dynamic, and additional information such as a prescheduled load is almost never available (Mohajeryami et al. 2016). A residential customer's demand is affected not only by general conditions like weather but also by behaviors, such as personal or family activities, that are difficult to predict. Consequently, forecasting models developed for industrial DR end up with relatively lower performance for residential customers (Mohajeryami et al. 2016). Even though a variety of efforts have been made to improve forecasting models in residential DR, as listed above, continued improvement is desired.
In addition to forecasting models, the control group approach has been used as a popular solution for reduction amount estimation. Philip Bartholomew (2009) addressed individual measurement and mass market measurement and listed the basic methods including control group. Board and others; George et al. (2007; 2018) used randomly
selected customers to form a control group and estimate the impact. In (Angrist and Pischke 2008), the characteristics of treatment group and control group were evaluated to assess the effectiveness of random selection. Hatton, Charpentier, and Matzner Løber (2015) suggested using the residential load curves to select a control group based on each individual’s load curves. Zhang et al. (2015) used a clustering approach for forming a control group. While predicted load is based on each residential customer’s historical load data and thus is appropriate for a residential customer’s reduction amount calculation, the control group approach does not use historical load data. Instead, it is based on a control group and a treatment group, where the two groups are predetermined before the DR event. Both groups are supposed to have the same (or at least very similar) statistical characteristics. Then, DR is applied to the treatment group only, such that the difference between the control group’s average load and the treatment group’s average load can be compared. The control group approach is a high accuracy method for evaluating the average load reduction, because any uncontrollable external factor (e.g., high temperature) at the time of the DR event would have the same effect on both groups and thus cancel out when the difference is calculated. However, it has two critical downsides. The first is the need to carefully set aside a group of users as a control group. The size of the control group should be sufficiently large for statistical averaging, and the selection process should be carefully managed to force the expected average loads of the two groups to be close enough. For the selection process, fairness can also become an issue, because the customers in the control group cannot participate in the DR program. The second downside is that it is not suitable for assessing a residential customer’s reduction amount. 
For residential DR, customer load characteristics vary widely over the DR customers, and the control group approach is accurate only when a large treatment group is compared with a large control group. To assess a residential customer’s reduction amount as needed for calculating each residential customer’s incentive, the control group’s average load is not appropriate as the baseline because it cannot reflect each customer’s particular usage pattern. One possibility for addressing
this limitation is to adopt a clustering approach as in (Zhang et al. 2015), where clusters based on historical usage are formed for the treatment group and for the control group, and where the appropriate cluster in the control group is used as the baseline for each residential customer. In that work, however, there is still a need to set aside a control group, in addition to achieving a high quality of clustering and periodically updating the clusters.
1.2. Contributions
The purpose of this dissertation is to improve individual-level short-term electric load forecasting for residential demand response. To this end, this dissertation addresses the following research questions:
RQ1. How can a virtual control group that acts as a control group improve baseline accuracy?
RQ2. Are individual learning methods helpful for residential STLF? How can transfer learning and meta learning be applied as individual learning?
RQ3. How accurately can a baseline model using the virtual control group and the individual learning methods predict load?
To answer the first research question, we propose the concept of a virtual control group that can indeed combine the advantages of the control group approach without bringing its downsides. One of the disadvantages of the control group approach is the need to set aside an independent control group. A possible solution for avoiding this overhead is to utilize the customers who do not participate in a DR event. If we can identify which residential customers are not participating in each DR event, they can be grouped and used as a control group. Because such a control group must be newly formed for each DR event and because there is no process for carefully assigning customers to the control group, we name the group a virtual control group. Using the virtual control group, we suggest virtual control group adjustment. It showed robust performance improvements despite selection bias and also showed the possibility of handling external factors in our experiments. The details of the virtual control group are provided in Section 4.
To answer the second research question, we also propose to use transfer learning (Pan and Yang 2009) and meta learning (Finn, Abbeel, and Levine 2017) for individualized forecasting, especially for tasks with a large number of individuals as in residential STLF problems. While the traditional models train an individualized model using the data of the target individual only, the two methods aim to use the knowledge from many other individuals as well. Despite the existing works on applying transfer learning in energy areas, to the best of our knowledge, we are the first to apply transfer learning with an explicit focus on STLF with a large number of individuals. In our experiments, the number of customers for training the base learner is 2,633 for the residential dataset and 265 for the non-residential dataset. For meta learning, we propose a meta learning framework with a few-shot mechanism, named few-shot short-term load forecasting. In the deep learning community, meta learning has received great attention in the last few years (Finn, Abbeel, and Levine 2017; Mishra et al. 2017;
Snell, Swersky, and Zemel 2017; Oreshkin, López, and Lacoste 2018; Rusu et al. 2018;
Sung et al. 2018; Gui et al. 2018). To the best of our knowledge, this is the first work to adopt the latest meta learning techniques for few-shot learning to STLF tasks. Meta learning uses only a few days of data, so it can be regarded as a non-conventional STLF method.
In the energy areas, the suggested framework can be applied to a variety of tasks that are based on time-series datasets. Transfer learning and meta learning for STLF are presented in Section 5. Finally, to answer the third research question, we suggest a unified approach, which combines the virtual control group and the individualization methods, in Section 6. The approach using only forecasting models and the control group approach have their own advantages and disadvantages, and ideally a method to combine the advantages of the two approaches should exist. Individualization methods with virtual control group adjustment showed the best performance in our experiments. The unified approach has the potential to handle the daily pattern of each customer as well as external factors.
The rest of the paper is organized as follows. Section 2 presents the framework of STLF methods, including one-for-all learning and individual learning. Section 3 describes the dataset, metrics, and implementation details for our experiments. The virtual control group adjustment method and its experimental results are provided in Section 4. Frameworks for transfer learning and meta learning are introduced and their performance is compared in Section 5. The unified approach is proposed and evaluated using the residential dataset in Section 6. Discussions are provided in Section 7. Section 8 summarizes the key results and concludes the paper.
Chapter 2. Short-Term Load Forecasting (STLF) Approaches in the Residential Sector
In this section, we present the framework of STLF individualization, including one-for-all and individual learning for comparison. We include statistical methods, adjusted statistical methods, and traditional learning as individual learning.
We consider a parametric model f_θ as a predictor that maps a past time series x to a subsequent future time series ŷ as below:

f_θ : x → ŷ

In our study, x is the observed load vector of the customer i for the past 5 days and 12 hours, and ŷ is the future load vector of the customer during the subsequent 12 hours. An example is shown in Figure 2.1. The ground truth of the future time series is denoted as y. To accommodate the meta learning framework, we consider each customer as a class and split all the customers in our dataset into the meta-training group of p customers, {c_1, c_2, ..., c_p}, and the meta-test group of q customers, {c_{p+1}, c_{p+2}, ..., c_{p+q}}. The first group is named group P and the second group is named group Q. Data of each group can be divided according to the time period, as shown in Figure 2.2. We refer to the load data of group P during the training period and test period as D_{train}^{P} and D_{test}^{P}, respectively. Also, we call the load data of group Q during the training period and test period D_{train}^{Q} and D_{test}^{Q}, respectively.
Assuming the data split shown in Figure 2.2, the training data for learning the models can be summarized as in Table 2.1. All of the models are evaluated over D_{test}^{Q}, i.e., over D_{test}^{c_i} for all i ∈ [p+1, ..., p+q].
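As an illustration of the x → ŷ mapping described above, the sliding-window construction of input/target pairs could be sketched as follows. This is a minimal sketch: the helper name `make_windows` is hypothetical, and the daily (24-hour) stride is an assumption; the 132-hour input span corresponds to 5 days plus 12 hours, and the 12-hour output span follows the text.

```python
import numpy as np

def make_windows(load, in_hours=132, out_hours=12, stride=24):
    """Split one customer's hourly load series into (input, target) pairs.

    in_hours = 5 days + 12 hours = 132 and out_hours = 12, matching the
    x -> y_hat mapping described in the text; the stride is illustrative.
    """
    xs, ys = [], []
    for start in range(0, len(load) - in_hours - out_hours + 1, stride):
        xs.append(load[start:start + in_hours])
        ys.append(load[start + in_hours:start + in_hours + out_hours])
    return np.array(xs), np.array(ys)

# one year of hourly data for a single (synthetic) customer
hourly_load = np.random.rand(365 * 24)
x, y = make_windows(hourly_load)
print(x.shape, y.shape)  # windows of 132 input hours and 12 target hours
```

Each row of x then serves as one observation for f_θ, and the corresponding row of y is its ground truth.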
Figure 2.1. An example of input and output. The span of the input vector is 5 days and 12 hours. The output is the vector of the immediately following 12 hours.
Table 2.1. Data used for training customer i's model. Test performance is evaluated over D_{test}^{c_i}.

Model                      Data used for training
Statistical methods        D_{test}^{c_i}
ARIMA                      D_{train}^{c_i} + D_{test}^{c_i}
One-for-all learning       D_{train}^{P}
Individual learning
  Traditional              D_{train}^{c_i}
  Transfer                 D_{train}^{P} + D_{train}^{c_i}
  Meta                     D_{train}^{P}
Figure 2.2. Data splits for modeling short-term load forecasting. Each row is a time series, and it represents the electric load data of a customer. For the p customers in group P, D_{train}^{P} is the load data for the training period and D_{test}^{P} is the load data for the test period. D_{train}^{Q} and D_{test}^{Q} are defined in the same way, but using the q customers in group Q instead.
2.1. One-for-all Learning
One-for-all learning is the most conventional way of applying machine learning algorithms to STLF. As shown in Figure 2.3, the models are trained using the data of all p customers in D_{train}^{P} and evaluated over the q customers in D_{test}^{Q}. Note that D_{test}^{P} and D_{train}^{Q} are not used for training.
Figure 2.3. One-for-all learning model evaluated in this work and the data used for training and testing the models. In this simplified illustration, i = 1, 2 are the customers in group P (shown in yellow and blue colors) and i = 3 is the target customer in group Q (shown in green color) whose electric load needs to be predicted. In other words, p = 2 and q = 1.
2.1.1. Base deep learning models for one-for-all learning
In this section, we describe the LSTM, sequence-to-sequence, and ResNet models that serve as the base deep learning models for implementing one-for-all learning. These models can also be directly used in the individual learning models. Thanks to the flexibility of deep learning models, any of the base models can be used as the backbone structure.
LSTM
The Recurrent Neural Networks (RNNs) are intended to generalize feedforward neural networks to sequences. Given a sequence input, general RNNs estimate a sequence output by capturing temporal correlations between previous and current information.
However, it is challenging for traditional RNNs to learn long-range temporal dependencies owing to the problems of vanishing or exploding gradients (Bengio, Simard, and Frasconi 1994). LSTM (Hochreiter and Schmidhuber 1997) is one of the representative RNNs that is capable of solving these problems. LSTM defines and maintains an internal memory cell state to keep the long-range dependencies. The memory cell state is controlled by the forget gate, input gate, and output gate, which are updated according to the previous output and subsequent input. The signal of each gate controls what to preserve or forget. LSTM is widely used not only for load forecasting (Kong et al. 2017) but also for general time-series forecasting (Hua et al. 2019).
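The gate mechanism described above can be illustrated with a single from-scratch LSTM step in numpy. This is only a conceptual sketch, not the Keras implementation used in our experiments; the weight packing order (input, forget, output, candidate) and shapes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step: the input (i), forget (f), and output (o) gates decide
    what to write to, keep in, and emit from the memory cell state c."""
    z = W @ x + U @ h_prev + b                 # all four pre-activations at once
    H = h_prev.size
    i, f, o = (sigmoid(z[k * H:(k + 1) * H]) for k in range(3))
    g = np.tanh(z[3 * H:])                     # candidate cell update
    c = f * c_prev + i * g                     # gated memory: keep old + write new
    h = o * np.tanh(c)                         # gated output
    return h, c

H, X = 4, 3                                    # hidden size, input size (toy values)
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, X))
U = rng.standard_normal((4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
h, c = lstm_step(rng.standard_normal(X), h, c, W, U, b)
print(h.shape)  # (4,)
```

In an actual forecasting model, this step is unrolled over the 132 input hours and the final hidden state feeds a fully connected output layer.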
Sequence to sequence
The Sequence to Sequence (Seq2Seq) learning model (Sutskever, Vinyals, and Le 2014) uses multilayered LSTMs for encoding and decoding. It encodes an input sequence into a fixed-dimensional vector with a deep LSTM, and then the vector is decoded into a target sequence with another deep LSTM. The encoder-decoder approach can map all the necessary information of an input sequence into a fixed-length vector. The Seq2Seq model was designed for machine translation, but the encoder-decoder framework is also well suited for energy load forecasting (Marino, Amarasinghe, and Manic 2016).
Marino, Amarasinghe, and Manic demonstrated that Seq2Seq can perform better than the standard LSTM model for building energy load forecasting. Specifically, they demonstrated that Seq2Seq can flexibly estimate the load for arbitrary time steps given the historical load of arbitrary time steps. Seq2Seq, an extension of LSTM, is also becoming more popular in load forecasting as well as time series forecasting (Sehovac
ResNet
Recently, convolutional neural networks (CNNs) have been the key technique in computer vision and other pattern recognition areas. The number of layers in a CNN model is known to be important for performance, and typically at least several and up to hundreds of layers are used in practice. Not surprisingly, stacking layers causes the problem of vanishing and exploding gradients, as in LSTM. ResNet (He et al. 2016) has been developed to solve the gradient problem for CNNs. ResNet is intended to learn the residual by referencing the layer input. Specifically, general CNNs map a few stacked layers to a desired underlying function H(x), but ResNet formulates the layers as a different function F(x) := H(x) − x. The original mapping of general CNNs becomes F(x) + x in ResNet, which is known as a "shortcut (skip) connection", after adding the feedforward connection. A residual function with a shortcut connection makes the optimization easier and improves the accuracy by enabling stacking of a substantially increased number of layers. Depending on the number of layers L in the convolution and pooling operations, the ResNet model is referred to as ResNet-L. ResNet was implemented in combination with LSTM and showed great performance in short-term load forecasting (Choi, Ryu, and Kim 2018). We tried to implement a model that is as identical as possible to that of Choi, Ryu, and Kim (2018) for our experiments.
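The residual mapping F(x) + x can be illustrated with a toy numpy sketch. This is purely conceptual (dense layers instead of convolutions, illustrative sizes), not the ResNet-12/LSTM architecture used in the experiments; the point is only that the block computes a residual F(x) and adds the input back through the shortcut.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense_relu(x, w, b):
    return np.maximum(w @ x + b, 0.0)  # one linear layer with ReLU

def residual_block(x, w1, b1, w2, b2):
    """Compute F(x) + x: the stacked layers learn the residual
    F(x) := H(x) - x, and the identity shortcut adds x back."""
    f = w2 @ dense_relu(x, w1, b1) + b2   # F(x): two stacked layers, no ReLU on last
    return f + x                          # shortcut (skip) connection

d = 8
x = rng.standard_normal(d)
w1, w2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
b1, b2 = np.zeros(d), np.zeros(d)
y = residual_block(x, w1, b1, w2, b2)
# with zero weights, F(x) = 0 and the block reduces to the identity mapping,
# which is what makes very deep stacks easy to optimize
assert np.allclose(residual_block(x, np.zeros((d, d)), b1, np.zeros((d, d)), b2), x)
```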
2.2. Individual Learning
In our study, some of the statistical methods, such as HighXofY, MidXofY, LowXofY, Exponential Moving Average (ExpMovAvg), and Regression, as well as adjusted statistical methods and traditional learning, are also evaluated for comparison. The overall information on how the data are used for training and testing is illustrated in the following. For statistical methods and adjusted statistical methods, we introduce five well-established statistical methods that are commonly used in industry (Lee et al. 2019) and two well-known adjustment methods that are often used for adjusted statistical methods. For traditional learning, we briefly describe LSTM, Sequence to Sequence (Seq2Seq), and ResNet, but we skip the introduction of well-known algorithms such as ARIMA, Extreme Gradient Boosting (XGB), and MLP.
2.2.1. Statistical Methods
A day can be divided into hourly timeslots m_0, ..., m_23. In this work, we assume that each DR event can contain one or more of the hourly timeslots, and the notation M is used to denote the set of timeslots included in the DR event. For instance, M = {m_15, m_16} if a DR event is run between 3 pm and 5 pm. While each DR event's duration can be longer than 1 hour, the baseline function is defined for a timeslot m, because its extension to multiple hours is straightforward by summing over the multiple hours. We will use the set notation M when needed to handle the multiple hours. The five well-established baseline functions are HighXofY, MidXofY, LowXofY, exponential moving average, and regression. HighXofY estimates the target load as the average electricity load of the days in the set D, where the set D contains the chosen top X days out of the last Y days. Holidays, weekend days, and DR event days are excluded when defining the Y days. LowXofY and MidXofY are calculated in the same way, except that the bottom and middle X days are chosen and included in D for averaging. For notation, D = High(X, Y, i, d) is used to represent the set of X days that are chosen by HighXofY for customer i for the given day d by inspecting the last Y days before the day d. Similarly, Low(X, Y, i, d) and Mid(X, Y, i, d) represent the sets of X days that are chosen by LowXofY and MidXofY. If we define y_{c_i,d,m} as the customer i's load on day d for timeslot m, then the estimated CBL value ŷ_{c_i,d,m} of HighXofY, MidXofY, and LowXofY can be expressed as shown in the first three rows of Table 2.2. Note that all three functions have the following form, where only the choice of D is different.
ŷ_{c_i,d,m} = (1/|D|) Σ_{d′∈D} y_{c_i,d′,m}    (2.1)
The exponential moving average is also shown in Table 2.2. For this function, the customer's average load for the k-th day since the starting day of the moving average calculation is defined as s_{c_i,k,m}. The starting day can be chosen arbitrarily as long as the moving averaging is performed for a sufficiently long time, and the first day of data collection was chosen as day 1 in our work. The formula for the moving average is shown in Table 2.2, and λ = 0.9 was used as the weighting as in the work (Wijaya, Vasirani, and Aberer 2014). Once the moving average s_{c_i,k,m} is updated, the previous day's moving average is defined as today's baseline of the exponential moving average.
A regression baseline is simply a linear regression model, and it is shown in the last row of Table 2.2. In the model, x_{c_i,m} is the feature vector, θ_{c_i,m} is the regression coefficient vector, and ε_{c_i,m} is the intercept scalar. The feature vector can contain information such as historical load, temperature, temperature-humidity index, and sunset/sunrise time. In this study, the customer's load information of the timeslot m for the last five weekdays was included in x_{c_i,m} to capture the temporal trend as in (Wijaya, Vasirani, and Aberer 2014):

x_{c_i,m} = [y_{c_i,d_lastmon,m}  y_{c_i,d_lasttue,m}  y_{c_i,d_lastwed,m}  y_{c_i,d_lastthu,m}  y_{c_i,d_lastfri,m}].    (2.2)

Additionally, the regression vector for each customer i was calculated using the data collected since the first day of data collection. Event days and holidays were handled by replacing their load values with the previous weekday's load values.
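A minimal sketch of the HighXofY/MidXofY/LowXofY and ExpMovAvg baselines for a single customer and a single timeslot might look as follows. Function names are hypothetical, and the day-selection for the "middle" X days (dropping an equal number of extremes from each end) is an assumption where Y − X is odd.

```python
import numpy as np

def high_x_of_y(daily_loads, x, y, mode="high"):
    """Baseline for one customer and one timeslot m.

    daily_loads: loads of the last Y eligible (non-holiday, non-weekend,
    non-event) days for timeslot m, oldest first. The top, bottom, or
    middle X of the last Y days are averaged, mirroring HighXofY,
    LowXofY, and MidXofY respectively.
    """
    days = np.sort(np.asarray(daily_loads[-y:], dtype=float))
    if mode == "high":
        chosen = days[-x:]
    elif mode == "low":
        chosen = days[:x]
    else:  # "mid": drop (y - x) extreme days, half from each end
        k = (y - x) // 2
        chosen = days[k:k + x]
    return chosen.mean()

def exp_mov_avg(daily_loads, lam=0.9):
    """ExpMovAvg baseline: s_5 is the mean of the first 5 days, then
    s_k = lam * s_{k-1} + (1 - lam) * y_k; day d's baseline is s_{d-1}."""
    s = float(np.mean(daily_loads[:5]))
    for load in daily_loads[5:]:
        s = lam * s + (1 - lam) * load
    return s

loads = [1.0, 2.0, 3.0, 4.0, 5.0]
print(high_x_of_y(loads, x=4, y=5, mode="high"))  # mean of {2, 3, 4, 5} = 3.5
```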
As shown in Figure 2.4, except for ExpMovAvg and Regression, there is no need to train a model, and only the customer i's own input x is used for predicting the output ŷ. In our experiments, this means the past five days of data is used for the prediction. ExpMovAvg and Regression, on the other hand, utilize all of the historical data of customer i. Therefore, D_{train}^{Q} is used in addition to D_{test}^{Q} for ExpMovAvg and Regression.
Table 2.2. Definition of statistical methods.

Model        Definition
HighXofY     ŷ_{c_i,d,m} = (1/X) Σ_{d′∈High(X,Y,i,d)} y_{c_i,d′,m}
MidXofY      ŷ_{c_i,d,m} = (1/X) Σ_{d′∈Mid(X,Y,i,d)} y_{c_i,d′,m}
LowXofY      ŷ_{c_i,d,m} = (1/X) Σ_{d′∈Low(X,Y,i,d)} y_{c_i,d′,m}
ExpMovAvg    Initialization: s_{c_i,5,m} = (1/5) Σ_{j=1}^{5} y_{c_i,j,m}
             For k > 5: s_{c_i,k,m} = λ s_{c_i,k−1,m} + (1 − λ) y_{c_i,k,m}  (λ = 0.9 used)
             ŷ_{c_i,d,m} = s_{c_i,d−1,m}
Regression   ŷ_{c_i,d,m} = θ_{c_i,m}^⊺ x_{c_i,m} + ε_{c_i,m}
Figure 2.4. Statistical methods evaluated in this work and the data used for training and testing the methods. In this simplified illustration, i = 1, 2 are the customers in group P (shown in yellow and blue colors) and i = 3 is the target customer in group Q (shown in green color) whose electric load needs to be predicted. In other words, p = 2 and q = 1.
2.2.2. Adjusted Statistical Methods
While the statistical methods are practical, they inherently lack the capability to handle external variations such as a change in usage pattern or weather condition. To alleviate this problem, adjustment techniques can be combined with the statistical methods. In this study, we consider two popular adjustment methods: pre-hour adjustment and weather adjustment. Pre-hour adjustment can be effective for handling the daily variation of the usage pattern. For instance, a customer's house can have at least one person at home for all the reference days but none for the event day. Such a presence/absence-related variation is a more eminent problem for residential DR than for other types of DR. To handle this problem, the pre-hour method makes an adjustment based on the predicted load values and actual load values of the pre-event hours: among the 24 timeslots of the event day, some of the timeslots sufficiently earlier than the event time are chosen as the set M′, and they are used for the adjustment as shown in Table 2.3. Typically, M′ is chosen to contain morning hours, and thus the method is sometimes called morning adjustment. In our study, M′ is chosen as the first 2 of the 4 hours immediately before starting the DR event. According to the investigation of our dataset, the 2 hours immediately before the DR event can show energy-saving behaviors and thus cannot be used for the adjustment. Weather adjustment can be effective for handling seasonal variations and the resulting trends in usage. The weather of the event day can be different from the weather of the reference days, i.e., the days in the set D used for the statistical methods. The goal of weather adjustment is to lessen this effect. For the weather adjustment, a regression function w(c_i, d, m) that predicts customer i's load as a function of weather information must be created, using data collected over a long-term period. Information such as temperature and humidity can be used as the inputs (PJM 2018, accessed January 4, 2019), and in this study, we used the outdoor temperature information only. The weather-adjusted baseline ŷ^{adj}_{c_i,d,m} with w(c_i, d, m) is presented in Table 2.3. As shown in Table 2.3, we consider both an additive type and a ratio type for the adjusted CBL, depending on how the adjustment is applied.
Table 2.3. Definition of two types of pre-hour and weather adjustment methods.

Pre-hour
  Additive type: ŷ^{adj}_{c_i,d,m} = ŷ_{c_i,d,m} + (1/|M′|) Σ_{m′∈M′} (y_{c_i,d,m′} − ŷ_{c_i,d,m′})
  Ratio type:    ŷ^{adj}_{c_i,d,m} = ŷ_{c_i,d,m} × (Σ_{m′∈M′} y_{c_i,d,m′}) / (Σ_{m′∈M′} ŷ_{c_i,d,m′})

Weather
  Additive type: ŷ^{adj}_{c_i,d,m} = ŷ_{c_i,d,m} + { w(c_i,d,m) − (1/|D|) Σ_{d′∈D} w(c_i,d′,m) }
  Ratio type:    ŷ^{adj}_{c_i,d,m} = ŷ_{c_i,d,m} × w(c_i,d,m) / ( (1/|D|) Σ_{d′∈D} w(c_i,d′,m) )
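The pre-hour adjustment could be sketched as follows for a single event timeslot. The helper name is hypothetical and the numbers are illustrative; the additive type shifts the baseline by the mean pre-hour error, while the ratio type rescales it by the pre-hour load ratio, mirroring the two columns of Table 2.3.

```python
import numpy as np

def prehour_adjust(baseline_event, baseline_pre, actual_pre, kind="additive"):
    """Adjust the event-hour baseline using the pre-event timeslots M'.

    baseline_pre / actual_pre: predicted and actual loads over M'
    (in the text, the first 2 of the 4 hours before the DR event).
    """
    baseline_pre = np.asarray(baseline_pre, dtype=float)
    actual_pre = np.asarray(actual_pre, dtype=float)
    if kind == "additive":
        # shift the baseline by the average pre-hour prediction error
        return baseline_event + np.mean(actual_pre - baseline_pre)
    # ratio type: scale the baseline by the pre-hour load ratio
    return baseline_event * actual_pre.sum() / baseline_pre.sum()

# the baseline predicted 1.0 kWh for the event hour,
# and the pre-event hours ran 20% above their baseline
print(round(prehour_adjust(1.0, [0.5, 0.5], [0.6, 0.6], kind="additive"), 3))  # 1.1
print(round(prehour_adjust(1.0, [0.5, 0.5], [0.6, 0.6], kind="ratio"), 3))     # 1.2
```

The weather adjustment has the same additive/ratio structure, with the regression output w(c_i, d, m) and its reference-day average taking the place of the pre-hour sums.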
2.2.3. Control Group Approach
The conventional controlled experiments are based on random assignments of subjects into treatment and control groups. For residential DR, customers in the treatment group are invited to join DR, while the customers in the control group are excluded. At the end of each DR event, the average usage of the control group is used as the baseline. The control group approach causes the common external factors, such as the temperature at the time of DR, to be canceled out when the difference between the two groups is calculated.
2.2.4. Traditional Learning
As shown in Figure 2.5, only the historical data of customer i is used for training a traditional individual model, except for ARIMA. The training for customer i is performed using D_{train}^{c_i}, and the evaluation is performed using D_{test}^{c_i}. This model is simple to understand and easy to implement, but it suffers from the small size of D_{train}^{c_i} and a lack of general understanding over all the customers. ARIMA, on the other hand, utilizes all of the historical data of customer i. Therefore, D_{train}^{Q} is used in addition to D_{test}^{Q} for ARIMA.
Figure 2.5. Traditional learning models evaluated in this work and the data used for training and testing the models. In this simplified illustration, i = 1, 2 are the customers in group P (shown in yellow and blue colors) and i = 3 is the target customer in group Q (shown in green color) whose electric load needs to be predicted. In other words, p = 2 and q = 1.
Chapter 3. Methodology
In this chapter, we describe the dataset and four metrics used for our study. Additionally, implementation details for one-for-all learning and individual learning are explained.
3.1. Dataset
The residential DR program analyzed in this paper was operated in Korea through a smartphone application from June 2016 to December 2017 for a total of 31,615 residential customers. Each DR event occurred once a week through the smartphone app. The DR was a Peak Time Rebate (PTR) program, where an incentive was paid to each customer according to the amount of electricity load saved as compared to the customer's baseline load. The mobile app's screenshots for running an event are shown in Figure 3.1. For each DR event, each customer was presented with a certain saving target. The customer had to press the "Join" icon within 24 hours before the event to participate; otherwise, the customer automatically became a nonparticipant who was not eligible for incentive payments. If the customer indeed met the saving target at the event time, the customer was rewarded with a fixed amount of compensation (approximately $1), plus an additional compensation that was proportional to the saving amount (approximately $1.3/kWh). For the analysis in this work, we only used the data between January 2017 and December 2017, such that one cycle of seasonal changes can be addressed. Two datasets of electricity load data and event log data were collected from 3,543 customers. The electricity load data were originally recorded as the electricity consumed by each customer every 15 minutes, but we aggregated it to 1-hour resolution for a simpler analysis. Note that each DR event was run for 1 hour, and the start time was aligned with the 1-hour resolution data.
For the convenience of analysis and discussion in Section 4, we define two subsets of the electricity load data: event day data and nonevent day data. Event day data refer to the electricity load data of the 29 weekday DR event days. There were an additional three DR events over the weekends, but they are excluded in this study. Nonevent day data refer to the electricity load data of the 213 days remaining after excluding event days, holidays, and weekends. Event log data contain the list of event days and times together with the list of customers who participated. For each customer, the record of success or failure to meet the preset target was available as well. For the 29 DR events that occurred throughout the weekdays of 2017, the detailed information including event day, event time, number of participants, and number of nonparticipants can be found in Appendix A. A small number of customers, who had very low electricity consumption per hour (below 10 Wh) or who were equipped with solar solutions, were excluded from the analysis. For the outdoor temperature, we used the data provided by the national climate data service system in Korea (kor 2019, accessed January 4, 2019).
For the analysis and discussion in Section 5, we randomly chose 2,633 customers for group P and 910 customers for group Q. Among the 2,633 customers in group P, the data of 526 randomly chosen customers were used as the validation set. Additionally, we set the training period from January to July and the test period from August to September, because the high temperature and the use of air conditioning caused a high load variance in the summer for the residential data (Lee et al. 2019), and we wanted to evaluate the modeling effects for this challenging period.
3.2. Metric
The error metrics used in this study are Mean Error (ME), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Symmetric Mean Absolute Percentage Error (SMAPE), which are typical metrics used to evaluate STLF models. Let y_{c,d,m} be the m-th element of the ground truth load vector of day d for the customer c during the test period, and let ŷ_{c,d,m} be the same for the predicted load vector.
Figure 3.1. Illustration of mobile application screens for the pilot DR mission (translated from Korean). The customer had to press the “Join” icon within 24 hours before the event to participate. If a participating customer was successful with achieving the electricity load saving goal, an incentive was provided.
3.2.1. Mean Error (ME)
ME = (1 / (C × D × M)) Σ_{c=1}^{C} Σ_{d=1}^{D} Σ_{m=1}^{M} (y_{c,d,m} − ŷ_{c,d,m})    (3.1)

Note that ME is simply the average difference between the predicted load and the true load, averaged over all the participating customers and all the event hours. In some of the previous DR studies, ME is also referred to as bias. While ME is an error metric used for the nonevent days, it is relevant to the impact estimation for the event days. For the event days, the actual load y_{c,d,m} reflects the energy-saving behaviors, while the estimated load ŷ_{c,d,m} does not. Therefore, the difference averaged over C, D, and M is an assessment of the residential DR's effectiveness at the program level, assuming that ŷ_{c,d,m} is sufficiently accurate. By allowing positive and negative contributions from each customer to be canceled out, the electricity load saving of the entire population as a whole can be evaluated by the metric.
3.2.2. Mean Absolute Error (MAE)
MAE = (1 / (C × D × M)) Σ_{c=1}^{C} Σ_{d=1}^{D} Σ_{m=1}^{M} |y_{c,d,m} − ŷ_{c,d,m}|    (3.2)

MAE is similar to ME, except that the absolute value of the error is averaged. Thus, MAE does not allow positive and negative contributions from different customers to be canceled out. While MAE is also an error metric used for the nonevent days, this metric is relevant to the settlement calculation for the event days. Settlement is the amount of reward provided to each customer according to the customer's electricity load saving amount, and it must be calculated for each customer. Because we are interested in the accuracy of the settlement calculation, we should not allow positive and negative errors to be canceled out. Thus, the absolute values should be averaged.
3.2.3. Root Mean Squared Error (RMSE)
RMSE = sqrt( (1 / (C × D × M)) Σ_{c=1}^{C} Σ_{d=1}^{D} Σ_{m=1}^{M} (y_{c,d,m} − ŷ_{c,d,m})² )    (3.3)

RMSE is the square root of the mean squared error. This metric is commonly used in lieu of MAE, and the performance under RMSE was similar to that under MAE.
3.2.4. Symmetric Mean Absolute Percentage Error (SMAPE)
SMAPE = (100 / (C × D × M)) Σ_{c=1}^{C} Σ_{d=1}^{D} Σ_{m=1}^{M} |y_{c,d,m} − ŷ_{c,d,m}| / ((y_{c,d,m} + ŷ_{c,d,m}) / 2)    (3.4)

SMAPE scales the error by the average of the predicted load and the actual load. Thus, SMAPE represents the error rate depending on each customer's load scale.
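The four metrics can be computed together over a (customers × days × timeslots) array. A minimal numpy sketch, assuming the elementwise definitions of Eqs. (3.1)-(3.4) and illustrative toy values:

```python
import numpy as np

def metrics(y, y_hat):
    """ME, MAE, RMSE, SMAPE over arrays shaped (C, D, M):
    customers x days x timeslots, following Eqs. (3.1)-(3.4)."""
    err = y - y_hat
    me = err.mean()                                  # signed errors can cancel
    mae = np.abs(err).mean()                         # no cancellation
    rmse = np.sqrt((err ** 2).mean())
    smape = 100.0 * np.mean(np.abs(err) / ((y + y_hat) / 2.0))
    return me, mae, rmse, smape

y = np.array([[[1.0, 2.0], [3.0, 4.0]]])      # C = 1, D = 2, M = 2
y_hat = np.array([[[1.5, 1.5], [3.0, 5.0]]])
me, mae, rmse, smape = metrics(y, y_hat)
# errors are -0.5, +0.5, 0, -1: they partly cancel in ME but not in MAE
print(round(me, 3), round(mae, 3))  # -0.25 0.5
```

The contrast between ME and MAE in the example mirrors the program-level versus per-customer settlement roles described above.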
3.3. Experimental Settings for Residential STLF Approaches
In this section, we explain implementation details for one-for-all learning and individual learning. Statistical methods, adjusted statistical methods, and traditional learning are considered for individual learning.
3.3.1. One-for-all Learning
To compare the individual learning models with the existing methods, we used Extreme Gradient Boosting (XGB) and four deep learning models (MLP, LSTM, Seq2Seq, ResNet/LSTM).
XGB was included because it is one of the most popular and well performing models among the traditional machine learning algorithms. For XGB, we set the total tree number as 1,000, the depth of each individual tree as four, and the learning rate as in (Fan, Xiao, and Zhao 2017).
We evaluated deep learning models including MLP, LSTM, Seq2Seq, and ResNet-12 with LSTM (ResNet/LSTM). We use four fully connected layers for the MLP model with 256, 128, 64, and 64 units, respectively. The LSTM model comprises seven cells with 300 hidden nodes and a fully connected layer. The Seq2Seq model has three LSTM cells with 100 hidden nodes for both the encoder and decoder. We also use the same architecture of ResNet-12 with LSTM as in (Choi, Ryu, and Kim 2018), which consists of a twelve-layer ResNet followed by three LSTM cells with 300 hidden nodes. All models were trained and evaluated for each customer. Deep learning models were trained for 150 epochs. These models were optimized using the Adam optimizer with a learning rate of 0.001. The batch size was 128, and the activation function was ReLU. We did not apply an activation function on the output layer, the cell output of the last LSTM cell, or any outputs of LSTM cells in ResNet/LSTM. These hyperparameters were set the same as the hyperparameters of (Choi, Ryu, and Kim 2018), except for the number of epochs.
Their settings were found to be suitable for the proposed work.
We used Python for our experiments. Auto ARIMA in the pmdarima library and XGBRegressor in the XGBoost library were used for ARIMA and XGB. The Keras and TensorFlow libraries were used for the deep learning models.
3.3.2. Individual Learning
Here we explain the details of the statistical methods, adjusted statistical methods, and traditional learning.
Statistical Methods
We used ten statistical models (High10of10, High5of5, High4of5, High5of10, Low4of5, Low5of10, Mid4of6, Mid8of10, Regression, ExpMovAvg) for Section 4 and three statistical models (High5of5, High4of5, and Low4of5) for Section 5.
Adjusted Statistical Methods
We used pre-hour adjustment and weather adjustment for the adjusted statistical methods.
For our analysis of the weather adjustment method, there was a limitation because the data used in this study were available only from June 2016, but the effect should be minimal, as w(c_i, d, m) was chosen as a very simple linear function with temperature as its only input.
Traditional Learning
We used ARIMA, XGB, and four deep learning models (MLP, LSTM, Seq2Seq, ResNet/LSTM). For the training of ARIMA, we used all available days before each forecasting day. The order of AR and MA for ARIMA was optimized for each individual with a maximum order of three for both the autoregressive and moving average models.
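As an illustration of the autoregressive component that ARIMA fits (the experiments themselves used pmdarima's auto_arima, not this sketch), a least-squares AR(p) fit with p = 3, matching the maximum order above, might look as follows. The function names and the synthetic series are hypothetical.

```python
import numpy as np

def fit_ar(series, p=3):
    """Least-squares fit of an AR(p) model with intercept:
    y_t = a_1 y_{t-1} + ... + a_p y_{t-p} + c.
    Illustrates only the AR component of ARIMA (max order 3 as in the text)."""
    y = np.asarray(series, dtype=float)
    # design matrix: one column per lag (most recent lag first), plus intercept
    cols = [y[p - k - 1:len(y) - k - 1] for k in range(p)]
    X = np.column_stack(cols + [np.ones(len(y) - p)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [a_1, ..., a_p, c]

def predict_next(series, coef):
    p = len(coef) - 1
    lags = np.asarray(series[-1:-p - 1:-1], dtype=float)  # y_{t-1}, ..., y_{t-p}
    return float(np.dot(coef[:p], lags) + coef[-1])

# a deterministic series following y_t = 0.5 * y_{t-1} + 1 (converges to 2)
s = [0.0]
for _ in range(50):
    s.append(0.5 * s[-1] + 1.0)
coef = fit_ar(s, p=3)
print(round(predict_next(s, coef), 3))  # ≈ 2.0, the fixed point of the recursion
```

Unlike this toy sketch, auto_arima also selects differencing and moving-average terms, which is why it was preferred for the per-individual optimization described above.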
Hyperparameters for XGB and the four deep learning models are exactly the same as in Section 3.3.1. Additionally, the libraries used are also the same.