{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "FJogEfYBOfUm" }, "source": [ "# Chapter 8: Winningest Methods in Time Series Forecasting\n", "\n", "Compiled by: Sebastian C. Ibañez\n", "\n", "In previous sections, we examined several models used in time series forecasting, such as ARIMA, VAR, and Exponential Smoothing methods. Traditional statistical methods have the advantage of supporting more sophisticated inference tasks directly (e.g. hypothesis testing on parameters, causality testing), but their rigid assumptions often limit their predictive power. That is not to say that they are necessarily inferior at forecasting; rather, they are typically used as performance benchmarks.\n", "\n", "In this section, we demonstrate several of the fundamental ideas and approaches used in the recently concluded [`M5 Competition`](https://mofc.unic.ac.cy/m5-competition/), where challengers from all over the world competed in building time series forecasting models for both [`accuracy`](https://www.kaggle.com/c/m5-forecasting-accuracy) and [`uncertainty`](https://www.kaggle.com/c/m5-forecasting-uncertainty) prediction tasks. Specifically, we explore the machine learning model that the majority of the competition's winners used: [`LightGBM`](https://lightgbm.readthedocs.io/en/latest/index.html), a tree-based gradient boosting framework designed for speed and efficiency." ] }, { "cell_type": "markdown", "metadata": { "id": "olkHffRbOfUr" }, "source": [ "## 1. M5 Dataset\n", "\n", "You can download the M5 dataset from the Kaggle links above. \n", "\n", "Let's load the dataset and examine it." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "id": "floWkFlxOfUs" }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "plot_x_size = 15\n", "plot_y_size = 2\n", "\n", "np.set_printoptions(precision = 6, suppress = True)\n", "\n", "date_list = [d.strftime('%Y-%m-%d') for d in pd.date_range(start = '2011-01-29', end = '2016-04-24')]\n", "\n", "df_calendar = pd.read_csv('../data/m5/calendar.csv')\n", "df_price = pd.read_csv('../data/m5/sell_prices.csv')\n", "df_sales = pd.read_csv('../data/m5/sales_train_validation.csv')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 456 }, "id": "mjsCtvwkOfUt", "outputId": "506c2c09-de24-4f59-956c-55fe6242e08f" }, "outputs": [ { "data": { "text/html": [ "

|  | id | item_id | dept_id | cat_id | store_id | state_id | 2011-01-29 | 2011-01-30 | 2011-01-31 | 2011-02-01 | ... | 2016-04-15 | 2016-04-16 | 2016-04-17 | 2016-04-18 | 2016-04-19 | 2016-04-20 | 2016-04-21 | 2016-04-22 | 2016-04-23 | 2016-04-24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | HOBBIES_1_001_CA_1_validation | HOBBIES_1_001 | HOBBIES_1 | HOBBIES | CA_1 | CA | 0 | 0 | 0 | 0 | ... | 1 | 3 | 0 | 1 | 1 | 1 | 3 | 0 | 1 | 1 |
| 1 | HOBBIES_1_002_CA_1_validation | HOBBIES_1_002 | HOBBIES_1 | HOBBIES | CA_1 | CA | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 2 | HOBBIES_1_003_CA_1_validation | HOBBIES_1_003 | HOBBIES_1 | HOBBIES | CA_1 | CA | 0 | 0 | 0 | 0 | ... | 2 | 1 | 2 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 3 | HOBBIES_1_004_CA_1_validation | HOBBIES_1_004 | HOBBIES_1 | HOBBIES | CA_1 | CA | 0 | 0 | 0 | 0 | ... | 1 | 0 | 5 | 4 | 1 | 0 | 1 | 3 | 7 | 2 |
| 4 | HOBBIES_1_005_CA_1_validation | HOBBIES_1_005 | HOBBIES_1 | HOBBIES | CA_1 | CA | 0 | 0 | 0 | 0 | ... | 2 | 1 | 1 | 0 | 1 | 1 | 2 | 2 | 2 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 30485 | FOODS_3_823_WI_3_validation | FOODS_3_823 | FOODS_3 | FOODS | WI_3 | WI | 0 | 0 | 2 | 2 | ... | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 30486 | FOODS_3_824_WI_3_validation | FOODS_3_824 | FOODS_3 | FOODS | WI_3 | WI | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 30487 | FOODS_3_825_WI_3_validation | FOODS_3_825 | FOODS_3 | FOODS | WI_3 | WI | 0 | 6 | 0 | 2 | ... | 2 | 1 | 0 | 2 | 0 | 1 | 0 | 0 | 1 | 0 |
| 30488 | FOODS_3_826_WI_3_validation | FOODS_3_826 | FOODS_3 | FOODS | WI_3 | WI | 0 | 0 | 0 | 0 | ... | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 3 | 1 | 3 |
| 30489 | FOODS_3_827_WI_3_validation | FOODS_3_827 | FOODS_3 | FOODS | WI_3 | WI | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

30490 rows × 1919 columns

|  | date | wm_yr_wk | weekday | wday | month | year | d | event_name_1 | event_type_1 | event_name_2 | event_type_2 | snap_CA | snap_TX | snap_WI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2011-01-29 | 11101 | Saturday | 1 | 1 | 2011 | d_1 | NaN | NaN | NaN | NaN | 0 | 0 | 0 |
| 1 | 2011-01-30 | 11101 | Sunday | 2 | 1 | 2011 | d_2 | NaN | NaN | NaN | NaN | 0 | 0 | 0 |
| 2 | 2011-01-31 | 11101 | Monday | 3 | 1 | 2011 | d_3 | NaN | NaN | NaN | NaN | 0 | 0 | 0 |
| 3 | 2011-02-01 | 11101 | Tuesday | 4 | 2 | 2011 | d_4 | NaN | NaN | NaN | NaN | 1 | 1 | 0 |
| 4 | 2011-02-02 | 11101 | Wednesday | 5 | 2 | 2011 | d_5 | NaN | NaN | NaN | NaN | 1 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1964 | 2016-06-15 | 11620 | Wednesday | 5 | 6 | 2016 | d_1965 | NaN | NaN | NaN | NaN | 0 | 1 | 1 |
| 1965 | 2016-06-16 | 11620 | Thursday | 6 | 6 | 2016 | d_1966 | NaN | NaN | NaN | NaN | 0 | 0 | 0 |
| 1966 | 2016-06-17 | 11620 | Friday | 7 | 6 | 2016 | d_1967 | NaN | NaN | NaN | NaN | 0 | 0 | 0 |
| 1967 | 2016-06-18 | 11621 | Saturday | 1 | 6 | 2016 | d_1968 | NaN | NaN | NaN | NaN | 0 | 0 | 0 |
| 1968 | 2016-06-19 | 11621 | Sunday | 2 | 6 | 2016 | d_1969 | NBAFinalsEnd | Sporting | Father's day | Cultural | 0 | 0 | 0 |

1969 rows × 14 columns

|  | store_id | item_id | wm_yr_wk | sell_price |
|---|---|---|---|---|
| 0 | CA_1 | HOBBIES_1_001 | 11325 | 9.58 |
| 1 | CA_1 | HOBBIES_1_001 | 11326 | 9.58 |
| 2 | CA_1 | HOBBIES_1_001 | 11327 | 8.26 |
| 3 | CA_1 | HOBBIES_1_001 | 11328 | 8.26 |
| 4 | CA_1 | HOBBIES_1_001 | 11329 | 8.26 |
| ... | ... | ... | ... | ... |
| 6841116 | WI_3 | FOODS_3_827 | 11617 | 1.00 |
| 6841117 | WI_3 | FOODS_3_827 | 11618 | 1.00 |
| 6841118 | WI_3 | FOODS_3_827 | 11619 | 1.00 |
| 6841119 | WI_3 | FOODS_3_827 | 11620 | 1.00 |
| 6841120 | WI_3 | FOODS_3_827 | 11621 | 1.00 |

6841121 rows × 4 columns

| store_id | 2011-01-29 | 2011-01-30 | 2011-01-31 | 2011-02-01 | 2011-02-02 | 2011-02-03 | 2011-02-04 | 2011-02-05 | 2011-02-06 | 2011-02-07 | ... | 2016-04-15 | 2016-04-16 | 2016-04-17 | 2016-04-18 | 2016-04-19 | 2016-04-20 | 2016-04-21 | 2016-04-22 | 2016-04-23 | 2016-04-24 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CA_1 | 4337 | 4155 | 2816 | 3051 | 2630 | 3276 | 3450 | 5437 | 4340 | 3157 | ... | 3982 | 5437 | 5954 | 4345 | 3793 | 3722 | 3709 | 4387 | 5577 | 6113 |
| CA_2 | 3494 | 3046 | 2121 | 2324 | 1942 | 2288 | 2629 | 3729 | 2957 | 2218 | ... | 4440 | 5352 | 5760 | 3830 | 3631 | 3691 | 3303 | 4457 | 5884 | 6082 |
| CA_3 | 4739 | 4827 | 3785 | 4232 | 3817 | 4369 | 4703 | 5456 | 5581 | 4912 | ... | 5337 | 6936 | 8271 | 6068 | 5683 | 5235 | 5018 | 5623 | 7419 | 7721 |
| CA_4 | 1625 | 1777 | 1386 | 1440 | 1536 | 1389 | 1469 | 1988 | 1818 | 1535 | ... | 2496 | 2839 | 3047 | 2809 | 2677 | 2500 | 2458 | 2628 | 2954 | 3271 |
| TX_1 | 2556 | 2687 | 1822 | 2258 | 1694 | 2734 | 1691 | 2820 | 2887 | 2174 | ... | 3084 | 3724 | 4192 | 3410 | 3257 | 2901 | 2776 | 3022 | 3700 | 4033 |
| TX_2 | 3852 | 3937 | 2731 | 2954 | 2492 | 3439 | 2588 | 3772 | 3657 | 2932 | ... | 3897 | 4475 | 4998 | 3311 | 3727 | 3384 | 3446 | 3902 | 4483 | 4292 |
| TX_3 | 3030 | 3006 | 2225 | 2169 | 1726 | 2833 | 1947 | 2848 | 2832 | 2213 | ... | 3819 | 4261 | 4519 | 3147 | 3938 | 3315 | 3380 | 3691 | 4083 | 3957 |
| WI_1 | 2704 | 2194 | 1562 | 1251 | 2 | 2049 | 2815 | 3248 | 1674 | 1355 | ... | 3862 | 4862 | 4812 | 3236 | 3069 | 3242 | 3324 | 3991 | 4772 | 4874 |
| WI_2 | 2256 | 1922 | 2018 | 2522 | 1175 | 2244 | 2232 | 2643 | 2140 | 1836 | ... | 6259 | 5579 | 5566 | 4347 | 4464 | 4194 | 4393 | 4988 | 5404 | 5127 |
| WI_3 | 4038 | 4198 | 3317 | 3211 | 2132 | 4590 | 4486 | 5991 | 4850 | 3240 | ... | 4613 | 4897 | 4521 | 3556 | 3331 | 3159 | 3226 | 3828 | 4686 | 4325 |

10 rows × 1913 columns
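
A store-level table like the one above can be obtained by summing unit sales over all items within each store. The following is a minimal sketch under stated assumptions, not the notebook's original cell: it assumes the raw layout of `sales_train_validation.csv` (identifier columns followed by day columns named `d_1`, `d_2`, and so on), and the toy frame `df_toy`, its values, and the name `df_store` are purely illustrative.

```python
import pandas as pd

# Toy stand-in for the sales table: identifier columns followed by one
# column per day (d_1, d_2, ...). The values are made up for illustration.
df_toy = pd.DataFrame({
    'item_id':  ['HOBBIES_1_001', 'HOBBIES_1_002', 'HOBBIES_1_001'],
    'store_id': ['CA_1', 'CA_1', 'TX_1'],
    'd_1': [0, 2, 1],
    'd_2': [3, 0, 4],
    'd_3': [1, 1, 0],
})

day_cols = [c for c in df_toy.columns if c.startswith('d_')]

# Relabel d_1, d_2, ... with calendar dates, starting from the first day
# of the dataset (2011-01-29).
dates = pd.date_range(start='2011-01-29', periods=len(day_cols)).strftime('%Y-%m-%d')
df_toy = df_toy.rename(columns=dict(zip(day_cols, dates)))

# Sum unit sales over all items within each store, one column per date.
df_store = df_toy.groupby('store_id')[list(dates)].sum()
print(df_store)
```

Selecting only the date columns before calling `sum()` keeps string identifier columns such as `item_id` out of the aggregation.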
\n", "