Chapter 8: Winningest Methods in Time Series Forecasting¶

Compiled by: Sebastian C. Ibañez

In previous sections, we examined several models used in time series forecasting such as ARIMA, VAR, and Exponential Smoothing methods. While the main advantage of traditional statistical methods is their ability to perform more sophisticated inference tasks directly (e.g. hypothesis testing on parameters, causality testing), they usually lack predictive power because of their rigid assumptions. That is not to say that they are necessarily inferior when it comes to forecasting, but rather they are typically used as performance benchmarks.

In this section, we demonstrate several of the fundamental ideas and approaches used in the recently concluded M5 Competition where challengers from all over the world competed in building time series forecasting models for both accuracy and uncertainty prediction tasks. Specifically, we explore the machine learning model that majority of the competition’s winners utilized: LightGBM, a tree-based gradient boosting framework designed for speed and efficiency.

1. M5 Dataset¶

You can download the M5 dataset from the Kaggle links above.

Let’s load the dataset and examine it.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plot_x_size = 15
plot_y_size = 2

np.set_printoptions(precision = 6, suppress = True)

date_list = [d.strftime('%Y-%m-%d') for d in pd.date_range(start = '2011-01-29', end = '2016-04-24')]

df_calendar = pd.read_csv('../data/m5/calendar.csv')
df_price = pd.read_csv('../data/m5/sell_prices.csv')
df_sales = pd.read_csv('../data/m5/sales_train_validation.csv')

df_sales.rename(columns = dict(zip(df_sales.columns[6:], date_list)), inplace = True)
df_sales

	id	item_id	dept_id	cat_id	store_id	state_id	2011-01-29	2011-01-30	2011-01-31	2011-02-01	...	2016-04-15	2016-04-16	2016-04-17	2016-04-18	2016-04-19	2016-04-20	2016-04-21	2016-04-22	2016-04-23	2016-04-24
0	HOBBIES_1_001_CA_1_validation	HOBBIES_1_001	HOBBIES_1	HOBBIES	CA_1	CA	0	0	0	0	...	1	3	0	1	1	1	3	0	1	1
1	HOBBIES_1_002_CA_1_validation	HOBBIES_1_002	HOBBIES_1	HOBBIES	CA_1	CA	0	0	0	0	...	0	0	0	0	0	1	0	0	0	0
2	HOBBIES_1_003_CA_1_validation	HOBBIES_1_003	HOBBIES_1	HOBBIES	CA_1	CA	0	0	0	0	...	2	1	2	1	1	1	0	1	1	1
3	HOBBIES_1_004_CA_1_validation	HOBBIES_1_004	HOBBIES_1	HOBBIES	CA_1	CA	0	0	0	0	...	1	0	5	4	1	0	1	3	7	2
4	HOBBIES_1_005_CA_1_validation	HOBBIES_1_005	HOBBIES_1	HOBBIES	CA_1	CA	0	0	0	0	...	2	1	1	0	1	1	2	2	2	4
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
30485	FOODS_3_823_WI_3_validation	FOODS_3_823	FOODS_3	FOODS	WI_3	WI	0	0	2	2	...	2	0	0	0	0	0	1	0	0	1
30486	FOODS_3_824_WI_3_validation	FOODS_3_824	FOODS_3	FOODS	WI_3	WI	0	0	0	0	...	0	0	0	0	0	0	0	0	1	0
30487	FOODS_3_825_WI_3_validation	FOODS_3_825	FOODS_3	FOODS	WI_3	WI	0	6	0	2	...	2	1	0	2	0	1	0	0	1	0
30488	FOODS_3_826_WI_3_validation	FOODS_3_826	FOODS_3	FOODS	WI_3	WI	0	0	0	0	...	0	0	1	0	0	1	0	3	1	3
30489	FOODS_3_827_WI_3_validation	FOODS_3_827	FOODS_3	FOODS	WI_3	WI	0	0	0	0	...	0	0	0	0	0	0	0	0	0	0

30490 rows × 1919 columns

df_calendar

	date	wm_yr_wk	weekday	wday	month	year	d	event_name_1	event_type_1	event_name_2	event_type_2	snap_CA	snap_TX	snap_WI
0	2011-01-29	11101	Saturday	1	1	2011	d_1	NaN	NaN	NaN	NaN	0	0	0
1	2011-01-30	11101	Sunday	2	1	2011	d_2	NaN	NaN	NaN	NaN	0	0	0
2	2011-01-31	11101	Monday	3	1	2011	d_3	NaN	NaN	NaN	NaN	0	0	0
3	2011-02-01	11101	Tuesday	4	2	2011	d_4	NaN	NaN	NaN	NaN	1	1	0
4	2011-02-02	11101	Wednesday	5	2	2011	d_5	NaN	NaN	NaN	NaN	1	0	1
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
1964	2016-06-15	11620	Wednesday	5	6	2016	d_1965	NaN	NaN	NaN	NaN	0	1	1
1965	2016-06-16	11620	Thursday	6	6	2016	d_1966	NaN	NaN	NaN	NaN	0	0	0
1966	2016-06-17	11620	Friday	7	6	2016	d_1967	NaN	NaN	NaN	NaN	0	0	0
1967	2016-06-18	11621	Saturday	1	6	2016	d_1968	NaN	NaN	NaN	NaN	0	0	0
1968	2016-06-19	11621	Sunday	2	6	2016	d_1969	NBAFinalsEnd	Sporting	Father's day	Cultural	0	0	0

1969 rows × 14 columns

df_price

	store_id	item_id	wm_yr_wk	sell_price
0	CA_1	HOBBIES_1_001	11325	9.58
1	CA_1	HOBBIES_1_001	11326	9.58
2	CA_1	HOBBIES_1_001	11327	8.26
3	CA_1	HOBBIES_1_001	11328	8.26
4	CA_1	HOBBIES_1_001	11329	8.26
...	...	...	...	...
6841116	WI_3	FOODS_3_827	11617	1.00
6841117	WI_3	FOODS_3_827	11618	1.00
6841118	WI_3	FOODS_3_827	11619	1.00
6841119	WI_3	FOODS_3_827	11620	1.00
6841120	WI_3	FOODS_3_827	11621	1.00

6841121 rows × 4 columns

Sample Product¶

Let’s choose a random product and plot it.

df_sample = df_sales.iloc[3, :]
series_sample = df_sample.iloc[6:]

df_sample

id            HOBBIES_1_004_CA_1_validation
item_id                       HOBBIES_1_004
dept_id                           HOBBIES_1
cat_id                              HOBBIES
store_id                               CA_1
                          ...              
2016-04-20                                0
2016-04-21                                1
2016-04-22                                3
2016-04-23                                7
2016-04-24                                2
Name: 3, Length: 1919, dtype: object

plt.rcParams['figure.figsize'] = [plot_x_size, plot_y_size]

series_sample.plot()
plt.legend()
plt.show()

../_images/lightgbm_m5_forecasting_8_0.png

Pick a Time Series¶

Let’s try and find an interesting time series to forecast.

df_sales_total_by_store = df_sales.groupby(['store_id']).sum()
df_sales_total_by_store

	2011-01-29	2011-01-30	2011-01-31	2011-02-01	2011-02-02	2011-02-03	2011-02-04	2011-02-05	2011-02-06	2011-02-07	...	2016-04-15	2016-04-16	2016-04-17	2016-04-18	2016-04-19	2016-04-20	2016-04-21	2016-04-22	2016-04-23	2016-04-24
store_id
CA_1	4337	4155	2816	3051	2630	3276	3450	5437	4340	3157	...	3982	5437	5954	4345	3793	3722	3709	4387	5577	6113
CA_2	3494	3046	2121	2324	1942	2288	2629	3729	2957	2218	...	4440	5352	5760	3830	3631	3691	3303	4457	5884	6082
CA_3	4739	4827	3785	4232	3817	4369	4703	5456	5581	4912	...	5337	6936	8271	6068	5683	5235	5018	5623	7419	7721
CA_4	1625	1777	1386	1440	1536	1389	1469	1988	1818	1535	...	2496	2839	3047	2809	2677	2500	2458	2628	2954	3271
TX_1	2556	2687	1822	2258	1694	2734	1691	2820	2887	2174	...	3084	3724	4192	3410	3257	2901	2776	3022	3700	4033
TX_2	3852	3937	2731	2954	2492	3439	2588	3772	3657	2932	...	3897	4475	4998	3311	3727	3384	3446	3902	4483	4292
TX_3	3030	3006	2225	2169	1726	2833	1947	2848	2832	2213	...	3819	4261	4519	3147	3938	3315	3380	3691	4083	3957
WI_1	2704	2194	1562	1251	2	2049	2815	3248	1674	1355	...	3862	4862	4812	3236	3069	3242	3324	3991	4772	4874
WI_2	2256	1922	2018	2522	1175	2244	2232	2643	2140	1836	...	6259	5579	5566	4347	4464	4194	4393	4988	5404	5127
WI_3	4038	4198	3317	3211	2132	4590	4486	5991	4850	3240	...	4613	4897	4521	3556	3331	3159	3226	3828	4686	4325

10 rows × 1913 columns

plt.rcParams['figure.figsize'] = [plot_x_size, 4]

df_sales_total_by_store.T.plot()
plt.show()

../_images/lightgbm_m5_forecasting_11_0.png

series = df_sales_total_by_store.iloc[0]
print(series.name)
print('Min Dates:' + str(series[series == series.min()].index.to_list()))

plt.rcParams['figure.figsize'] = [plot_x_size, plot_y_size]

series.plot()
plt.legend()
plt.show()

CA_1
Min Dates:['2011-12-25', '2012-12-25', '2013-12-25', '2014-12-25', '2015-12-25']

../_images/lightgbm_m5_forecasting_12_1.png

2. Pre-processing¶

Before we build a forecasting model, let’s check some properties of our time series.

Is the series non-stationary?¶

Let’s check.

from statsmodels.tsa.stattools import adfuller

result = adfuller(series)

print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')

for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))

ADF Statistic: -2.035408
p-value: 0.271267
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568

Does differencing make the series stationary?¶

Let’s check.

def difference(dataset, interval = 1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return np.array(diff)
 
def inverse_difference(history, yhat, interval=1):
    return yhat + history[-interval]

series_d1 = difference(series)
result = adfuller(series_d1)

print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')

for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))

ADF Statistic: -20.626012
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568

Is the series seasonal?¶

Let’s check.

from statsmodels.graphics.tsaplots import plot_acf

plt.rcParams['figure.figsize'] = [plot_x_size, plot_y_size]

plot_acf(series)
plt.show()

plot_acf(series, lags = 730, use_vlines = True)
plt.show()

../_images/lightgbm_m5_forecasting_19_0.png

../_images/lightgbm_m5_forecasting_19_1.png

Can we remove the seasonality?¶

Let’s check.

series_d7 = difference(series, 7)

plt.rcParams['figure.figsize'] = [plot_x_size, plot_y_size]

plot_acf(series_d7)
plt.show()

plot_acf(series_d7, lags = 730, use_vlines = True)
plt.show()

../_images/lightgbm_m5_forecasting_21_0.png

../_images/lightgbm_m5_forecasting_21_1.png

series_d7_d30 = difference(series_d7, 30)

plt.rcParams['figure.figsize'] = [plot_x_size, plot_y_size]

plot_acf(series_d7_d30)
plt.show()

plot_acf(series_d7_d30, lags = 730, use_vlines = True)
plt.show()

../_images/lightgbm_m5_forecasting_22_0.png

../_images/lightgbm_m5_forecasting_22_1.png

result = adfuller(series_d7_d30)

print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
print('Critical Values:')

for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))

ADF Statistic: -8.405429
p-value: 0.000000
Critical Values:
	1%: -3.434
	5%: -2.863
	10%: -2.568

series_d7_d30 = pd.Series(series_d7_d30)
series_d7_d30.index = date_list[37:]

plt.rcParams['figure.figsize'] = [plot_x_size, plot_y_size]

series_d7_d30.plot(label = 'Differenced Series')
plt.legend()
plt.show()

../_images/lightgbm_m5_forecasting_24_0.png

What now?¶

At this point we have two options:

Model the seasonally differenced series, then reverse the differencing after making predictions.
Model the original series directly.

While (vanilla) ARIMA requires a non-stationary and non-seasonal time series, these properties are not necessary for most non-parametric ML models.

3. One-Step Prediction¶

Let’s build a model for making one-step forecasts.

To do this, we first need to transform the time series data into a supervised learning dataset.

In other words, we need to create a new dataset consisting of \(X\) and \(Y\) variables, where \(X\) refers to the features and \(Y\) refers to the target.

How far do we lookback?¶

To create the new \((X,Y)\) dataset, we first need to decide what the \(X\) features are.

For the moment, let’s ignore any exogenous variables. In this case, what determines the \(X\)s is how far we lookback. In general, we can treat the lookback as a hyperparameter, which we will call window_size.

Advanced note: Technically, we could build an entire methodology for feature engineering \(X\).

Test Set¶

To test our model we will use the last 28 days of the series.

### CREATE X,Y ####
def create_xy(series, window_size, prediction_horizon, shuffle = False):
    x = []
    y = []
    for i in range(0, len(series)):
        if len(series[(i + window_size):(i + window_size + prediction_horizon)]) < prediction_horizon:
            break
        x.append(series[i:(i + window_size)])
        y.append(series[(i + window_size):(i + window_size + prediction_horizon)])
    x = np.array(x)
    y = np.array(y)
    return x,y

### HYPERPARAMETERS ###
window_size = 365
prediction_horizon = 1

### TRAIN VAL SPLIT ### (include shuffling later)
test_size = 28
split_time = len(series) - test_size

train_series = series[:split_time]
test_series = series[split_time - window_size:]

train_x, train_y = create_xy(train_series, window_size, prediction_horizon)
test_x, test_y = create_xy(test_series, window_size, prediction_horizon)

train_y = train_y.flatten()
test_y = test_y.flatten()

print(train_x.shape)
print(train_y.shape)
print(test_x.shape)
print(test_y.shape)

(1520, 365)
(1520,)
(28, 365)
(28,)

plt.rcParams['figure.figsize'] = [plot_x_size, plot_y_size]

series[-test_size:].plot(label = 'CA_1 Test Series')
plt.legend()
plt.show()

LightGBM¶

Now we can build a LightGBM model to forecast our time series.

Gradient boosting is an ensemble method that combines multiple weak models to produce a single strong prediction model. The method involves constructing the model (called a gradient boosting machine) in a serial stage-wise manner by sequentially optimizing a differentiable loss function at each stage. Much like other boosting algorithms, the residual errors are passed to the next weak learner and trained.

For this work, we use LightGBM, a gradient boosting framework designed for speed and efficiency. Specifically, the framework uses tree-based learning algorithms.

To tune the model’s hyperparameters, we use a combination of grid search and repeated k-fold cross validation, with some manual tuning. For more details, see the Hyperparameter Tuning notebook.

Now we train the model on the full dataset and test it.

import lightgbm as lgb

params = {
    'n_estimators': 2000,
    'max_depth': 4,
    'num_leaves': 2**4,
    'learning_rate': 0.1,
    'boosting_type': 'dart'
}

model = lgb.LGBMRegressor(first_metric_only = True, **params)

model.fit(train_x, train_y,
          eval_metric = 'l1', 
          eval_set = [(test_x, test_y)],
          #early_stopping_rounds = 10,
          verbose = 0)

LGBMRegressor(boosting_type='dart', first_metric_only=True, max_depth=4,
              n_estimators=2000, num_leaves=16)

forecast = model.predict(test_x)
s1_naive = series[-29:-1].to_numpy()
s7_naive = series[-35:-7].to_numpy()
s30_naive = series[-56:-28].to_numpy()
s365_naive = series[-364:-336].to_numpy()

print('     Naive MAE: %.4f' % (np.mean(np.abs(s1_naive - test_y))))
print('  s7-Naive MAE: %.4f' % (np.mean(np.abs(s7_naive - test_y))))
print(' s30-Naive MAE: %.4f' % (np.mean(np.abs(s30_naive - test_y))))
print('s365-Naive MAE: %.4f' % (np.mean(np.abs(s365_naive - test_y))))
print('  LightGBM MAE: %.4f' % (np.mean(np.abs(forecast - test_y))))

plt.rcParams['figure.figsize'] = [plot_x_size, plot_y_size]

series[-test_size:].plot(label = 'True')
plt.plot(forecast, label = 'Forecast')
plt.legend()
plt.show()

     Naive MAE: 698.0000
  s7-Naive MAE: 372.2857
 s30-Naive MAE: 330.8214
s365-Naive MAE: 247.9286
  LightGBM MAE: 200.5037

../_images/lightgbm_m5_forecasting_33_1.png

Tuning Window Size¶

How does our metric change as we extend the window size?

from sklearn.model_selection import cross_val_score
from sklearn.model_selection import RepeatedKFold

params = {
    'n_estimators': 2000,
    'max_depth': 4,
    'num_leaves': 2**4,
    'learning_rate': 0.1,
    'boosting_type': 'dart'
}

windows = [7, 30, 180, 365, 545, 730]

results = []
names = []
for w in windows:

    window_size = w

    train_x, train_y = create_xy(train_series, window_size, prediction_horizon)

    train_y = train_y.flatten()

    cv = RepeatedKFold(n_splits = 10, n_repeats = 3, random_state = 123)
    scores = cross_val_score(lgb.LGBMRegressor(**params), train_x, train_y, scoring = 'neg_mean_absolute_error', cv = cv, n_jobs = -1)
    results.append(scores)
    names.append(w)
    print('%3d --- MAE: %.3f (%.3f)' % (w, np.mean(scores), np.std(scores)))

plt.rcParams['figure.figsize'] = [plot_x_size, 5]    
    
plt.boxplot(results, labels = names, showmeans = True)
plt.show()

4. Multi-Step Prediction¶

Suppose we were interested in forecasting the next \(n\)-days instead of just the next day.

There are several approaches we can take to solve this problem.

### HYPERPARAMETERS ###
window_size = 365
prediction_horizon = 1

### TRAIN VAL SPLIT ###
test_size = 28
split_time = len(series) - test_size

train_series = series[:split_time]
test_series = series[split_time - window_size:]

train_x, train_y = create_xy(train_series, window_size, prediction_horizon)
test_x, test_y = create_xy(test_series, window_size, prediction_horizon)

train_y = train_y.flatten()
test_y = test_y.flatten()

Recursive Forecasting¶

In recursive forecasting, we first train a one-step model then generate a multi-step forecast by recursively feeding our predictions back into the model.

params = {
    'n_estimators': 2000,
    'max_depth': 4,
    'num_leaves': 2**4,
    'learning_rate': 0.1,
    'boosting_type': 'dart'
}

model = lgb.LGBMRegressor(first_metric_only = True, **params)

model.fit(train_x, train_y,
          eval_metric = 'l1', 
          eval_set = [(test_x, test_y)],
          #early_stopping_rounds = 10,
          verbose = 0)

LGBMRegressor(boosting_type='dart', first_metric_only=True, max_depth=4,
              n_estimators=2000, num_leaves=16)

recursive_x = test_x[0, :]

forecast_ms = []
for i in range(test_x.shape[0]):
    pred = model.predict(recursive_x.reshape((1, recursive_x.shape[0])))
    recursive_x = np.append(recursive_x[1:], pred)
    forecast_ms.append(pred)
    
forecast_ms_rec = np.asarray(forecast_ms).flatten()
forecast_os = model.predict(test_x)

print('  One-Step MAE: %.4f' % (np.mean(np.abs(forecast_os - test_y))))
print('Multi-Step MAE: %.4f' % (np.mean(np.abs(forecast_ms_rec - test_y))))

plt.rcParams['figure.figsize'] = [plot_x_size, plot_y_size]

series[-test_size:].plot(label = 'True')
plt.plot(forecast_ms_rec, label = 'Forecast Multi-Step')
plt.plot(forecast_os, label = 'Forecast One-Step')
plt.legend()
plt.show()

  One-Step MAE: 200.5037
Multi-Step MAE: 214.8020

../_images/lightgbm_m5_forecasting_40_1.png

Direct Forecasting¶

In direct forecasting, we train \(n\) independent models and generate a multi-step forecast by concatenating the \(n\) predictions.

For this implementation, we need to create a new \((X,Y)\) dataset, where \(Y\) is now a vector of \(n\) values.

### HYPERPARAMETERS ###
window_size = 365
prediction_horizon = 28

### TRAIN VAL SPLIT ###
test_size = 28
split_time = len(series) - test_size

train_series = series[:split_time]
test_series = series[split_time - window_size:]

train_x, train_y = create_xy(train_series, window_size, prediction_horizon)
test_x, test_y = create_xy(test_series, window_size, prediction_horizon)

from sklearn.multioutput import MultiOutputRegressor

model = MultiOutputRegressor(lgb.LGBMRegressor(), n_jobs = -1)

model.fit(train_x, train_y)

MultiOutputRegressor(estimator=LGBMRegressor(), n_jobs=-1)

forecast_ms_dir = model.predict(test_x)

print('  One-Step MAE: %.4f' % (np.mean(np.abs(forecast_os - test_y))))
print('Multi-Step MAE: %.4f' % (np.mean(np.abs(forecast_ms_dir - test_y))))

plt.rcParams['figure.figsize'] = [plot_x_size, plot_y_size]

series[-test_size:].plot(label = 'True')
plt.plot(forecast_ms_dir.T, label = 'Forecast Multi-Step')
plt.plot(forecast_os, label = 'Forecast One-Step')
plt.legend()
plt.show()

  One-Step MAE: 200.5037
Multi-Step MAE: 233.6326

../_images/lightgbm_m5_forecasting_44_1.png

Single-Shot Forecasting¶

In single-shot forecasting, we create a model that attempts to predict all \(n\)-steps simultaneously.

Unfortunately, LightGBM (tree-based methods in general) does not support multi-output models.

Forecast Combination¶

An easy way to improve forecast accuracy is to use several different methods on the same time series, and to average the resulting forecasts.

forecast_ms_comb = 0.5*forecast_ms_dir.flatten() + 0.5*forecast_ms_rec

print('  Recursive MAE: %.4f' % (np.mean(np.abs(forecast_ms_rec - test_y))))
print('     Direct MAE: %.4f' % (np.mean(np.abs(forecast_ms_dir - test_y))))
print('Combination MAE: %.4f' % (np.mean(np.abs(forecast_ms_comb - test_y))))

series[-test_size:].plot(label = 'True')
plt.plot(forecast_ms_comb, label = 'Forecast Combination')
plt.show()

  Recursive MAE: 214.8020
     Direct MAE: 233.6326
Combination MAE: 217.0313

../_images/lightgbm_m5_forecasting_47_1.png

5. Feature Importance¶

One advantage of GBM models is that it can generate feature importance metrics based on the quality of the splits (or information gain).

### HYPERPARAMETERS ###
window_size = 365
prediction_horizon = 1

### TRAIN VAL SPLIT ###
test_size = 28
split_time = len(series) - test_size

train_series = series[:split_time]
test_series = series[split_time - window_size:]

train_x, train_y = create_xy(train_series, window_size, prediction_horizon)
test_x, test_y = create_xy(test_series, window_size, prediction_horizon)

train_y = train_y.flatten()
test_y = test_y.flatten()
    
params = {
    'n_estimators': 2000,
    'max_depth': 4,
    'num_leaves': 2**4,
    'learning_rate': 0.1,
    'boosting_type': 'dart'
}

model = lgb.LGBMRegressor(first_metric_only = True, **params)

feature_name_list = ['lag_' + str(i+1) for i in range(window_size)]

model.fit(train_x, train_y,
          eval_metric = 'l1', 
          eval_set = [(test_x, test_y)],
          #early_stopping_rounds = 10,
          feature_name = feature_name_list,
          verbose = 0)

LGBMRegressor(boosting_type='dart', first_metric_only=True, max_depth=4,
              n_estimators=2000, num_leaves=16)

plt.rcParams['figure.figsize'] = [5, 5]

lgb.plot_importance(model, max_num_features = 15, importance_type = 'split')
plt.show()

../_images/lightgbm_m5_forecasting_50_0.png

Summary¶

In summary, this section has shown the following:

A quick exploration of the M5 dataset and its salient features
Checking the usual assumptions for classical time series analysis
Demonstrating the ‘out-of-the-box’ performance of LightGBM models
Demonstrating how tuning the lookback window can effect forecasting performance
Demonstrating how tuning the hyperparameters of a LightGBM model can improve performance
Summarizing the different approaches to multi-step forecasting
Illustrating that gradient boosting methods can measure feature importance

Many of the methods and approaches shown above are easily extendable. Of note, because of the non-parametric nature of most machine learning models, performing classical inference tasks like hypothesis testing is more challenging. However, trends in ML research have been moving towards greater interpretability of so-called ‘black box’ models, such as SHAP for feature importance and even Bayesian approaches to neural networks for causality inference.

References¶

[1] S. Makridakis, E. Spiliotis, and V. Assimakopoulos. The M5 Accuracy competition: Results, findings and conlusions. 2020.

[2] S. Makridakis, E. Spiliotis, V. Assimakopoulos, Z. Chen, A. Gaba, I. Tsetlin, and R. Winkler. The M5 Uncertainty competition: Results, findings and conlusions. 2020.

[3] V. Jose, and R. Winkler. Evaluating quantile assessments. Operations research, 2009.

[4] A. Gaba, I. Tsetlin, and R. Winkler. Combining interval forecasts. Decision Analysis, 2017.

Time Series Analysis Handbook

Chapter 8: Winningest Methods in Time Series Forecasting¶

1. M5 Dataset¶

Sample Product¶

Pick a Time Series¶

2. Pre-processing¶

Is the series non-stationary?¶

Does differencing make the series stationary?¶

Is the series seasonal?¶

Can we remove the seasonality?¶

What now?¶

3. One-Step Prediction¶

How far do we lookback?¶

Test Set¶

LightGBM¶

Tuning Window Size¶

4. Multi-Step Prediction¶

Recursive Forecasting¶

Direct Forecasting¶

Single-Shot Forecasting¶

Forecast Combination¶

5. Feature Importance¶

Summary¶

References¶