## Wait… is it *prediction* or *forecast*?

We often use the two terms interchangeably (I know I have), but they in fact refer to completely different things.

Simply put, **forecasting** is the term used when dealing with time series: it implies using historical data to make *educated guesses* about the future behavior of a particular system (the smart grid, in our case).

**Prediction**, on the other hand, refers to *a less educated guess* based on subjective considerations such as our previous experience.

*We can predict that a certain model will work better than another based on our experience with that model on similar data, but we will forecast future energy consumption from the historical data by using the model.*

## In-sample vs out-of-sample forecast

Depending on the scenario, time series forecasting can be performed in sample or out of sample. Let us look at the difference:

- **In sample**: forecasting is performed on an interval that is part of the data used for fitting the model. Say you used the interval 01.01.22-01.10.22 to fit the model and then forecast values for the interval 01.09.22-01.10.22.
- **Out of sample**: forecasting is performed on an interval outside the interval used for fitting the model. Say you used the interval 01.01.22-01.10.22 to fit the model and then forecast values for the interval 02.10.22-01.11.22.

*In this post we use out-of-sample forecasting.*
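To make the distinction concrete, here is a minimal Python sketch that checks the two example intervals against the fitting interval. It assumes the dates are written as DD.MM.YY. The helper names (`daterange`, `is_in_sample`, `is_out_of_sample`) are made up for illustration and do not come from any library.

```python
from datetime import date, timedelta

def daterange(start, end):
    """Return every calendar day from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

# Interval used to fit the model: 01.01.22-01.10.22 (assumed DD.MM.YY).
fit_days = set(daterange(date(2022, 1, 1), date(2022, 10, 1)))

# In sample: the forecast interval lies inside the fitting interval.
in_sample = daterange(date(2022, 9, 1), date(2022, 10, 1))

# Out of sample: the forecast interval starts after the fitting interval ends.
out_of_sample = daterange(date(2022, 10, 2), date(2022, 11, 1))

def is_in_sample(forecast_days, fit_days):
    """True when every forecast day was also used for fitting."""
    return all(day in fit_days for day in forecast_days)

def is_out_of_sample(forecast_days, fit_days):
    """True when no forecast day was used for fitting."""
    return not any(day in fit_days for day in forecast_days)

print(is_in_sample(in_sample, fit_days))          # True
print(is_out_of_sample(out_of_sample, fit_days))  # True
```

An out-of-sample forecast is the harder test, since the model never saw the values it is asked to reproduce.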

## Now that that is settled, let’s go back to our problem

In a research article I co-authored back in 2015, *Prediction models for dynamic demand response: requirements, challenges, and insights* (yes, the article should have used forecast instead of prediction… we all learn), it was shown that **the accuracy** of time series methods applied to kWh consumption data **depends on the consumption level**. This is not at all surprising given the properties of each dataset. The accuracy of the forecast methods also depends on **the consumption pattern** itself. *Individual residential customer data shows a larger variation in consumption than data at the building level or from larger industrial or commercial customers*.

The figure below shows the two datasets from the paper in the form of **probability density functions** (PDFs). A PDF shows *the probability of a particular value (the kWh consumption in our case) falling within a certain interval*. From the figure, we notice that most utility (individual residential) values are centered around 0.2 kWh while campus (building level) values are around 30 kWh.
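As a rough illustration of how such a density can be estimated from raw readings, the sketch below builds a histogram-based PDF: bin counts divided by the sample size times the bin width. The readings and bin edges are made up for the example and are not taken from the paper's datasets.

```python
def histogram_density(values, bin_edges):
    """Estimate a PDF as count / (sample size * bin width) for each bin.

    Bins are half-open [low, high), so a value equal to the last edge
    is not counted.
    """
    n = len(values)
    density = []
    for low, high in zip(bin_edges, bin_edges[1:]):
        count = sum(low <= v < high for v in values)
        density.append(count / (n * (high - low)))
    return density

# Hypothetical residential readings in kWh (illustrative values only).
readings = [0.1, 0.15, 0.2, 0.25, 0.6, 0.9]
edges = [0.0, 0.5, 1.0]

density = histogram_density(readings, edges)

# Probability of a reading falling in each bin = density * bin width.
probabilities = [d * (hi - lo) for d, lo, hi in zip(density, edges, edges[1:])]
print(density)
print(probabilities)
```

Multiplying the density by the bin width gives the probability of a value falling in that interval. These probabilities sum to 1 when every reading lands in some bin.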

The conclusion of the study is that each dataset has its own “best” (notice how I avoid using the term *optimal*) forecast model. In another paper, published in 2014, we discuss the accurate and efficient selection of the best consumption prediction method in smart grids and the necessity for parallel and intelligent algorithms.

This naturally leads us to a scenario where we must test several methods (even for training a neural network) before deciding on the most suitable one for each of our datasets.

These methods are unsurprisingly simple and can lead to decent results. I will discuss some next:

## ISO averaging models

These models are based on averaging historical data and were introduced by utilities (many from the US) to estimate consumption during demand response (DR) events, but they can be used for regular energy forecasts as well. ISO here refers to Independent System Operator.

- **New York ISO**: estimates baseline consumption from the previous five days with the highest average kWh value. These days are chosen from a pool of ten previous days, which are selected starting two days prior to the event day, and excluding weekends, holidays, past DR event days, or days on which there was a sharp drop in the energy consumption. In addition, a day is included in the pool only if the average consumption on that day is more than 25% of that of the last selected day. The process repeats until all ten days have been placed in the pool of days for baseline calculation. Days are then ranked based on average hourly consumption and the five days with the highest value are selected. Finally, the baseline is calculated by taking hourly averages across these days. For baseline calculation on a DR event day, **a morning adjustment factor** can also be calculated from the values in the 2 to 4 hours prior to the DR event by taking the ratio between the calculated baseline consumption and the actual measured data. The value of this adjustment factor cannot be less than 0.8 or more than 1.2.
- **Southern California Edison ISO**: estimates baseline consumption by averaging the past ten days. These days cannot include weekends, holidays, or past DR event days. Once ten days have been selected, the baseline is calculated as their hourly average. Similar to NYISO, a morning adjustment factor is applied to the calculated baseline.
- **California ISO**: estimates the baseline consumption as the hourly average of the three days with the highest average consumption value among a pool of ten selected previous days. Selected days cannot be weekends, holidays, or past DR event days. A morning adjustment factor can be used to improve the forecast.
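The averaging step shared by these models can be sketched in a few lines of Python. This is a simplified illustration, not the operators' official procedure. It assumes the pool of ten eligible days has already been filtered. It also assumes the morning adjustment factor is the ratio of measured to baseline consumption over the pre-event hours, and the direction of that ratio is my assumption. The function names and the synthetic data are hypothetical.

```python
from statistics import mean

def iso_baseline(eligible_days, top_k=3):
    """Hourly average over the top_k days with the highest mean consumption.

    eligible_days: list of days, each a list of 24 hourly kWh values, assumed
    to be already filtered (no weekends, holidays, or past DR event days).
    """
    ranked = sorted(eligible_days, key=mean, reverse=True)[:top_k]
    return [mean(day[h] for day in ranked) for h in range(24)]

def morning_adjustment(baseline, measured, hours, low=0.8, high=1.2):
    """Ratio of measured to baseline consumption over the pre-event hours,
    clipped to [low, high]. The direction of the ratio is an assumption."""
    ratio = sum(measured[h] for h in hours) / sum(baseline[h] for h in hours)
    return min(high, max(low, ratio))

# Synthetic pool of ten eligible days: day i has a flat profile of 1.0 + 0.1*i kWh.
pool = [[1.0 + 0.1 * i] * 24 for i in range(10)]

# California ISO style: average of the three highest days out of ten.
baseline = iso_baseline(pool, top_k=3)

# Hypothetical event day on which measured consumption is 50% above the baseline.
event_day = [1.5 * b for b in baseline]
factor = morning_adjustment(baseline, event_day, hours=[8, 9, 10])
adjusted = [factor * b for b in baseline]

print(round(baseline[12], 2), factor, round(adjusted[12], 2))
```

Calling `iso_baseline(pool, top_k=5)` gives the New York ISO style average over five days, and `top_k=10` corresponds to the plain ten-day average. The clipping in `morning_adjustment` keeps a single unusual morning from shifting the baseline by more than 20%.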

## Time of Day (ToD)

This is a simple model where the consumption for tomorrow, a Wednesday, at 12:00 PM is assumed to be equal to the consumption last Wednesday at 12:00 PM or on the corresponding Wednesday a year ago at 12:00 PM. It assumes that we have similar consumption patterns driven by seasonality (weekly or annual). However, it ignores other factors such as changes in our behavior or occupancy.
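A minimal sketch of the ToD idea, assuming the history is stored as a dictionary keyed by timestamp. The function name `tod_forecast` and the readings are hypothetical.

```python
from datetime import datetime, timedelta

def tod_forecast(history, target, lag=timedelta(weeks=1)):
    """Forecast the value at `target` as the value observed one `lag` earlier.

    history: dict mapping datetime -> kWh. Raises KeyError when the lagged
    observation is missing from the history.
    """
    return history[target - lag]

# Hypothetical hourly readings for Wednesday 1 June 2022 (values are made up).
history = {datetime(2022, 6, 1, h): 20.0 + h for h in range(24)}

# Forecast for the following Wednesday at 12:00, using the weekly lag.
target = datetime(2022, 6, 8, 12)
print(tod_forecast(history, target))  # 32.0
```

For the annual variant, a lag of `timedelta(weeks=52)` keeps the weekday aligned. A lag of 365 days would land on a different day of the week.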

## Linear regression

Probably the simplest and most straightforward method. It **assumes the time series is linear**, which for short intervals can be true, and uses methods such as least squares to fit the model to the time series. Two problems with it are (1) how to choose the historical window to capture linearity, and (2) the challenge of forecasting turning points (changes in the consumption trend). In fact, most forecasting methods (neural networks included) experience the second problem and tend to predict turning points with a slight delay, as they first need to observe the change “to become aware” of it. Spikes are hard to forecast.
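Both problems can be seen in a small sketch: an ordinary least squares line is fitted to the last `window` observations and then extrapolated. The function names and the data are illustrative. The window length is a free parameter that would need tuning on real data.

```python
def fit_line(values):
    """Ordinary least squares fit of values against their index 0..n-1.

    Returns (slope, intercept). Needs at least two values.
    """
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def linear_forecast(series, window, steps):
    """Fit a line to the last `window` points and extrapolate `steps` ahead."""
    recent = series[-window:]
    slope, intercept = fit_line(recent)
    last_x = len(recent) - 1
    return [intercept + slope * (last_x + k) for k in range(1, steps + 1)]

# Made-up, perfectly linear kWh values: the extrapolation is exact.
rising = [10.0, 12.0, 14.0, 16.0, 18.0]
print(linear_forecast(rising, window=4, steps=2))  # [20.0, 22.0]

# Made-up series with a turning point: the trend reverses after the value 5.0.
# The true continuation would be 2.0, but the fitted line still carries part
# of the earlier upward trend, so the forecast stays above that.
turning = [1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0]
print(linear_forecast(turning, window=4, steps=1))
```

A shorter window reacts faster to the reversal but is more sensitive to noise, which is exactly the window selection problem mentioned above.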

## Regression Trees

In some cases, data is clustered and cannot be predicted using a linear model. In this case, trees are a better approach. Regression trees are decision trees applied to regression problems (here, time series). The model is built by generating a tree containing decision branches such that the *squared error* of the model is minimized. When dealing with an energy consumption time series, a regression tree uses the time index as a feature to predict the *value* (kWh) of the energy consumption. To refine the tree, more features such as the temperature can be added, but our previous experiments have shown that this may not help much, as energy consumption is not always correlated with outside temperature. A **sample regression tree** generated from our building dataset can be seen below:

In the 2015 article, our regression tree used, in addition to the kWh consumption, the hourly weather forecast from a nearby station. The results were, however, not as expected: if you take a look at the article, you will notice that the regression tree method performed worse than the ISO models.
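To show how the squared error drives the splits, here is a toy regression tree on a single feature, written from scratch. It is a sketch for intuition only and is not the tree used in the article. Using the hour of the day as the time feature is my assumption, and the consumption values are synthetic. In practice you would reach for a library implementation rather than this code.

```python
def sse(ys):
    """Sum of squared errors of ys around their mean."""
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)

def build_tree(xs, ys, max_depth=2, min_leaf=2):
    """Greedy regression tree on one numeric feature, minimising squared error."""
    leaf = {"value": sum(ys) / len(ys)}
    if max_depth == 0 or len(ys) < 2 * min_leaf:
        return leaf
    best_cost, best_t = sse(ys), None
    for t in sorted(set(xs))[1:]:  # candidate split thresholds
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        cost = sse(left) + sse(right)
        if cost < best_cost:  # keep only strict improvements
            best_cost, best_t = cost, t
    if best_t is None:
        return leaf
    left = [(x, y) for x, y in zip(xs, ys) if x < best_t]
    right = [(x, y) for x, y in zip(xs, ys) if x >= best_t]
    return {
        "threshold": best_t,
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           max_depth - 1, min_leaf),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            max_depth - 1, min_leaf),
    }

def predict(tree, x):
    """Walk down the tree until a leaf is reached and return its mean value."""
    while "threshold" in tree:
        tree = tree["left"] if x < tree["threshold"] else tree["right"]
    return tree["value"]

# Synthetic building profile over three days: 30 kWh during 08:00-17:59,
# 5 kWh otherwise. The feature is the hour of the day.
hours = [h for _ in range(3) for h in range(24)]
kwh = [30.0 if 8 <= h < 18 else 5.0 for h in hours]

tree = build_tree(hours, kwh, max_depth=2)
print(predict(tree, 12), predict(tree, 3), predict(tree, 20))  # 30.0 5.0 5.0
```

On this clustered profile the tree recovers the two switching hours (8 and 18) as thresholds, which a single straight line cannot represent.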

## ARIMA

Auto Regressive Integrated Moving Average is a **widely used method for time series forecasting**. It can be broken down into the following 3 components:

- **Auto Regressive**: uses past data to infer future values. Basically, it assumes the future will be similar to the past (with a certain lag);
- **Integrated**: makes the time series stationary through differencing;
- **Moving Average**: smooths the time series by incorporating the dependency between an observed value and a residual error from a moving average model applied to lagged observed values.

Any of the previous 3 components can be absent, giving rise to different forecast methods: AR, ARMA, MA, etc.

An important takeaway when using **ARIMA is that it does not work on seasonal time series**. If you want to take seasonality into consideration, seasonal ARIMA must be used instead.
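The first two components can be illustrated with a deliberately reduced sketch: the series is differenced once (the "Integrated" step), an AR(1) model without intercept is fitted to the differences by least squares (the "Auto Regressive" step), and the forecasted differences are summed back onto the last observed level. There is no moving average term, so this is not a full ARIMA implementation. The function names and the data are made up for illustration.

```python
def difference(series):
    """First-order differencing: the 'Integrated' step that removes a trend."""
    return [b - a for a, b in zip(series, series[1:])]

def fit_ar1(values):
    """Least squares estimate of phi in v[t] = phi * v[t-1] + noise.

    No intercept is fitted. Fails with ZeroDivisionError when all lagged
    values are zero (for example on a constant series).
    """
    num = sum(a * b for a, b in zip(values, values[1:]))
    den = sum(a * a for a in values[:-1])
    return num / den

def forecast_ari(series, steps):
    """AR(1) forecast on the differenced series, then undo the differencing."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    level, last_diff = series[-1], diffs[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff  # autoregressive step on the differences
        level += last_diff           # integrate back to the original scale
        forecasts.append(level)
    return forecasts

# Made-up values whose increments halve at each step (8, 4, 2, 1).
series = [0.0, 8.0, 12.0, 14.0, 15.0]
print(forecast_ari(series, steps=2))  # [15.5, 15.75]
```

On a perfectly linear series the estimated `phi` is 1, so the forecast simply continues the trend. A real ARIMA fit would also estimate the moving average terms and select the model orders.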

## And the code for the above…

## Analysis of the results we got running the code above

By looking at the CDF for both datasets and for each method (see the Colab code above), some conclusions can be drawn:

- The efficiency of each method is different for a given dataset;
  - Regression trees provide an overall better result for our setup, but when considering small MAPE values the results vary;
- The efficiency of a method across different datasets varies;
  - In general, methods perform better on the building dataset. The data amplitude plays a vital role.

**Note of caution**: this analysis is based on the particular demo configuration that I used, and results can differ (and even be better) if other configurations are used.
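For readers who want to reproduce this kind of comparison, here is a minimal sketch of the two quantities involved: the mean absolute percentage error (MAPE) and an empirical CDF evaluated at a threshold. I am assuming the CDF is taken over per-forecast MAPE values. The numbers below are invented and are not results from the demo.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return 100 * sum(errors) / len(errors)

def empirical_cdf(values, threshold):
    """Fraction of values that are less than or equal to threshold."""
    return sum(v <= threshold for v in values) / len(values)

# Made-up actual and forecasted kWh values.
print(mape([100.0, 200.0], [110.0, 180.0]))

# Made-up MAPE values (in percent) from several forecast runs of one method.
run_mapes = [4.0, 8.0, 12.0, 25.0]
print(empirical_cdf(run_mapes, threshold=10.0))  # 0.5
```

Because MAPE divides by the actual value, the same absolute error weighs much more on a 0.2 kWh reading than on a 30 kWh one. This is one reason the data amplitude matters when comparing datasets.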

## Takeaways

In general, here are some key results from our paper summarizing the effectiveness of these methods:

- The **prediction accuracy** is higher for customers with high consumption;
  - High consumption customers do not exhibit the “noise” of low consumption customers;
- Few recent observations are better predictors than large sets of historical observations;
  - ISO models work very well with 2-3 weeks’ worth of data, while ARIMA usually performs well with 2 months of historical data;
- Simple ISO (Independent System Operator) averaging models are inadequate for **weekend forecasts**;
  - These methods have been designed for demand response events, which take place during weekdays. Compared to weekdays, they deteriorate by up to 20% when applied on weekends;
- **ARIMA** achieves the best prediction accuracy for short 1-hour ahead forecasts;
  - Methods that capture global patterns over long periods of time are not suitable for real-time forecasts.
  - Regression trees and ToD provide better medium to long-term forecasts.