Machine Learning for Retail Sales Forecasting — Features Engineering

Understand the impact of additional features related to stock-outs, store closing dates or cannibalization on a Machine Learning model for sales forecasting

Features Engineering for Machine Learning for Retail Sales Forecasting — (Image by Author)
SUMMARY
I. Introduction
1. Data set
2. Initial Solution using LGBM
3. Features Engineering
II. Experiment
1. Additional features
2. Results
III. Conclusion and next steps

I. Introduction

1. Data set

Retail Sales Forecasting by Product Family
M5 Forecasting Competition Dataset — (Image by Author)

2. Initial Solution using LGBM

The goal is to see how much the model's accuracy can be improved solely by adding features, without tuning the hyperparameters or changing the algorithm.
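Keeping the algorithm and hyperparameters fixed while only the feature list grows can be sketched as a small harness. This is a minimal sketch, not the article's code: to avoid a LightGBM dependency, ordinary least squares from NumPy stands in for the model, and `fit_predict`, `incremental_experiment` and the toy columns `x1`/`x2` are hypothetical. A LightGBM model with a frozen parameter set can be plugged in by replacing `fit_predict`:

```python
import numpy as np
import pandas as pd


def fit_predict(X_train, y_train, X_valid):
    """Stand-in for the fixed model: ordinary least squares with an intercept.

    In the article the model is LightGBM with unchanged hyperparameters;
    any function with this signature can be plugged in instead.
    """
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    B = np.column_stack([np.ones(len(X_valid)), X_valid])
    return B @ coef


def rmse(y_true, y_pred):
    """Root mean squared error."""
    err = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)))


def incremental_experiment(train, valid, target, feature_groups, model=fit_predict):
    """Add feature groups one at a time and record the validation RMSE.

    The model and its settings never change; only the feature list grows.
    """
    features, results = [], []
    for name, cols in feature_groups:
        features = features + list(cols)
        pred = model(
            train[features].to_numpy(dtype=float),
            train[target].to_numpy(dtype=float),
            valid[features].to_numpy(dtype=float),
        )
        results.append((name, rmse(valid[target], pred)))
    return results


# Synthetic data: the target depends on two columns
rng = np.random.default_rng(0)
data = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
data["sold"] = 3 * data["x1"] + 2 * data["x2"]
train, valid = data.iloc[:80], data.iloc[80:]

steps = [("base", ["x1"]), ("+ x2", ["x2"])]
results = incremental_experiment(train, valid, "sold", steps)
for name, score in results:
    print(f"{name}: RMSE = {score:.3f}")
```

Because nothing but the feature list changes between runs, any difference in validation RMSE can be attributed to the features that were added.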

3. Features Engineering

# Item id
'id', 'item_id',
# Department, Category, Store and State
'dept_id', 'cat_id', 'store_id', 'state_id',
# Transaction time
'd', 'wm_yr_wk', 'weekday', 'wday', 'month', 'year',
# Sales quantity, price and promotional events
'sold', 'event_name_1', 'event_type_1', 'event_name_2', 'event_type_2', 'sell_price'
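The base features above come from three M5 tables joined together. Here is a minimal pandas sketch of that join, assuming the standard M5 layout: the sales file is wide (one column per day, `d_1`, `d_2`, ...), the calendar maps each `d` to its week `wm_yr_wk` and its events, and prices are given per store, item and week. The three tiny frames are synthetic stand-ins for `sales_train_validation.csv`, `calendar.csv` and `sell_prices.csv`, not real data:

```python
import pandas as pd

# Synthetic stand-ins for the three M5 files; only three days are kept.
sales = pd.DataFrame({
    "id": ["HOBBIES_1_001_CA_1_validation"],
    "item_id": ["HOBBIES_1_001"],
    "dept_id": ["HOBBIES_1"],
    "cat_id": ["HOBBIES"],
    "store_id": ["CA_1"],
    "state_id": ["CA"],
    "d_1": [0], "d_2": [3], "d_3": [1],  # one column per day (wide format)
})
calendar = pd.DataFrame({
    "d": ["d_1", "d_2", "d_3"],
    "wm_yr_wk": [11101, 11101, 11101],
    "weekday": ["Saturday", "Sunday", "Monday"],
    "wday": [1, 2, 3],
    "month": [1, 1, 1],
    "year": [2011, 2011, 2011],
    "event_name_1": [None, None, None],
    "event_type_1": [None, None, None],
    "event_name_2": [None, None, None],
    "event_type_2": [None, None, None],
})
prices = pd.DataFrame({
    "store_id": ["CA_1"],
    "item_id": ["HOBBIES_1_001"],
    "wm_yr_wk": [11101],
    "sell_price": [9.99],  # synthetic price
})

id_cols = ["id", "item_id", "dept_id", "cat_id", "store_id", "state_id"]

# Wide -> long: one row per series and day, with the quantity in 'sold'
df = sales.melt(id_vars=id_cols, var_name="d", value_name="sold")
# Attach the time and event attributes of each day
df = df.merge(calendar, on="d", how="left")
# Attach the weekly price of the item in that store
df = df.merge(prices, on=["store_id", "item_id", "wm_yr_wk"], how="left")
print(df[["id", "d", "weekday", "sold", "sell_price"]])
```

On the full data set, 30,490 series over 1,913 days give about 58 million rows after the melt, so downcasting (for example `int16` for `sold` and `category` for the id columns) is often needed to fit in memory.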

What would be the impact of a special event offering a 20% price reduction on baby formula sales during the second week of the month?

# Sales lag n = sales quantity n days earlier
'sold_lag_1', 'sold_lag_2', 'sold_lag_3', 'sold_lag_7', 'sold_lag_14', 'sold_lag_28',
# Average sales by item, state, store, category and department
'item_sold_avg', 'state_sold_avg',
'store_sold_avg', 'cat_sold_avg', 'dept_sold_avg',
# Average sales by combinations of two or three levels (e.g., category and department)
'cat_dept_sold_avg', 'store_item_sold_avg', 'cat_item_sold_avg',
'dept_item_sold_avg', 'state_store_sold_avg',
'state_store_cat_sold_avg', 'store_cat_dept_sold_avg'
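Both groups of features above are per-row transformations of the long table. The sketch below, on synthetic data, assumes a numeric day index `d_num` (a hypothetical column; in M5 it can be parsed from `d`) and shows only a subset of the `*_sold_avg` columns; the other combinations follow the same `groupby(keys)` pattern:

```python
import pandas as pd

# Synthetic long table: two series (id) over five days
df = pd.DataFrame({
    "id": ["A"] * 5 + ["B"] * 5,
    "item_id": ["I1"] * 5 + ["I2"] * 5,
    "store_id": ["S1"] * 10,
    "cat_id": ["C1"] * 10,
    "d_num": list(range(1, 6)) * 2,
    "sold": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
})

# Sort so that shift() moves values forward in time within each series
df = df.sort_values(["id", "d_num"]).reset_index(drop=True)

# sold_lag_n: quantity sold n days earlier by the same series
for n in [1, 2, 3, 7, 14, 28]:
    df[f"sold_lag_{n}"] = df.groupby("id")["sold"].shift(n)

# *_sold_avg: mean sales of the group, repeated on every row of the group
avg_keys = {
    "item_sold_avg": ["item_id"],
    "store_sold_avg": ["store_id"],
    "cat_sold_avg": ["cat_id"],
    "store_item_sold_avg": ["store_id", "item_id"],
}
for name, keys in avg_keys.items():
    df[name] = df.groupby(keys)["sold"].transform("mean")

print(df[["id", "d_num", "sold", "sold_lag_1", "item_sold_avg", "store_sold_avg"]])
```

Two details matter in practice. Lags must be computed within each series (`groupby("id")`); otherwise the first days of one series would receive the last days of the previous one. And the group averages here use every row; in a real pipeline they should be computed on the training period only, or validation sales leak into the features.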

Do you have relatives going to the hypermarket every Saturday to shop for the whole week?

II. Experiment

1. Additional features

Retail Sales Forecasting — Features Engineering Strategy
Experiment to understand the impact of features on the error with the validation set — (Image by Author)
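The error in this experiment is measured on a validation set. In forecasting, that set is usually cut by time rather than at random, so the model never trains on days that come after the ones it is evaluated on. Here is a minimal sketch, assuming the validation set is the last 28 days (the M5 forecast horizon) and that the `d` column keeps M5's `d_<n>` labels. `time_split` is a hypothetical helper, not the notebook's code:

```python
import pandas as pd


def time_split(df: pd.DataFrame, horizon: int = 28):
    """Hold out the last `horizon` days as the validation set.

    Assumes the 'd' column holds M5-style day labels such as 'd_1', 'd_2', ...
    """
    day_num = df["d"].str.slice(2).astype(int)  # 'd_1913' -> 1913
    cutoff = day_num.max() - horizon
    return df[day_num <= cutoff], df[day_num > cutoff]


# Synthetic example: one series over 40 days
toy = pd.DataFrame({
    "id": ["A"] * 40,
    "d": [f"d_{i}" for i in range(1, 41)],
    "sold": list(range(40)),
})
train, valid = time_split(toy)
print(len(train), len(valid))  # 12 28
```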
# Rolling mean on actual sales
'rolling_sold_mean', 'rolling_sold_mean_3', 'rolling_sold_mean_7',
'rolling_sold_mean_14', 'rolling_sold_mean_21', 'rolling_sold_mean_28',
# Rolling mean on lag sales
'rolling_lag_7_win_7', 'rolling_lag_7_win_28', 'rolling_lag_28_win_7', 'rolling_lag_28_win_28'

Code (to be added to the Kaggle Notebook)
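Here is a minimal pandas sketch of the rolling-mean features above, on one synthetic series. It assumes a numeric day index `d_num` (hypothetical). It also assumes one interpretation the text does not spell out: the windows on actual sales are shifted by one day, so the value for day t only uses sales up to day t-1. The unsuffixed `rolling_sold_mean` is not defined in the text and is left out.

```python
import pandas as pd

# Synthetic series: 40 days of sales for one item, sold = 2 * day
df = pd.DataFrame({"id": ["A"] * 40, "d_num": list(range(1, 41))})
df["sold"] = 2 * df["d_num"]

df = df.sort_values(["id", "d_num"]).reset_index(drop=True)
by_series = df.groupby("id")["sold"]

# Rolling mean on actual sales over the previous w days.
# Assumption: shifted by one day so day t only sees sales up to day t-1.
for w in [3, 7, 14, 21, 28]:
    df[f"rolling_sold_mean_{w}"] = by_series.transform(
        lambda s: s.shift(1).rolling(w).mean()
    )

# Rolling mean on lag sales: shift by `lag` days, then average over `w` days
for lag in [7, 28]:
    for w in [7, 28]:
        df[f"rolling_lag_{lag}_win_{w}"] = by_series.transform(
            lambda s: s.shift(lag).rolling(w).mean()
        )

print(df[["d_num", "sold", "rolling_sold_mean_3", "rolling_lag_7_win_7"]].head(15))
```

One likely reason for the `rolling_lag_28_*` variants is that, with a 28-day horizon, features built on sales at least 28 days old are known for the whole forecast window. They therefore do not require feeding predictions back into the model.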

# Selling Trend
'selling_trend', 'item_selling_trend',
# Rolling max
'rolling_sold_max', 'rolling_sold_max_1', 'rolling_sold_max_2', 'rolling_sold_max_7', 'rolling_sold_max_14', 'rolling_sold_max_21',
'rolling_sold_max_28'

Code (to be added to the Kaggle Notebook)
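The rolling maxima follow the same pattern as the rolling means. The text does not define `selling_trend`, so the version below is a hypothetical definition: the 7-day mean of past sales minus the expanding mean of all past sales. It is positive when a series is selling above its own history. As before, `d_num` is an assumed day index and the one-day shift is an assumption; under it, `rolling_sold_max_1` equals `sold_lag_1`. `item_selling_trend` could be built the same way after aggregating sales by `item_id`.

```python
import pandas as pd

# Synthetic series: ten days of sales for one item
df = pd.DataFrame({
    "id": ["A"] * 10,
    "d_num": list(range(1, 11)),
    "sold": [0, 5, 1, 0, 7, 2, 3, 0, 4, 6],
})
df = df.sort_values(["id", "d_num"]).reset_index(drop=True)
by_series = df.groupby("id")["sold"]

# Rolling max over the previous w days (one-day shift is an assumption)
for w in [1, 2, 7, 14, 21, 28]:
    df[f"rolling_sold_max_{w}"] = by_series.transform(
        lambda s: s.shift(1).rolling(w).max()
    )

# Hypothetical selling_trend: recent 7-day mean of past sales
# minus the expanding (long-run) mean of all past sales
recent = by_series.transform(lambda s: s.shift(1).rolling(7).mean())
long_run = by_series.transform(lambda s: s.shift(1).expanding().mean())
df["selling_trend"] = recent - long_run

print(df[["d_num", "sold", "rolling_sold_max_2", "rolling_sold_max_7", "selling_trend"]])
```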

# Stock-out id
'stock_out_id',
# Store closed
'store_closed'

Code (to be added to the Kaggle Notebook)
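Neither feature is defined precisely in the text, so here is one hypothetical interpretation, sketched in pandas on synthetic data. `store_closed` flags days on which the whole store sold nothing (M5 stores have such days, for example at Christmas). `stock_out_id` numbers each run of at least `min_days` consecutive zero-sales days for a series, treating a long run of zeros as a probable stock-out. The 7-day threshold and the `d_num` column are assumptions:

```python
import pandas as pd


def add_store_closed(df: pd.DataFrame) -> pd.DataFrame:
    """store_closed = 1 when the whole store sold nothing that day."""
    store_total = df.groupby(["store_id", "d_num"])["sold"].transform("sum")
    df["store_closed"] = (store_total == 0).astype(int)
    return df


def add_stock_out_id(df: pd.DataFrame, min_days: int = 7) -> pd.DataFrame:
    """Number each run of >= min_days consecutive zero-sales days per series.

    Rows outside such runs get 0. min_days is a hypothetical threshold: a long
    run of zeros is treated as a probable stock-out rather than low demand.
    """
    df = df.sort_values(["id", "d_num"]).reset_index(drop=True)
    zero = df["sold"] == 0
    # A new run starts when the zero/non-zero state or the series changes
    new_run = (zero != zero.shift(fill_value=False)) | (df["id"] != df["id"].shift())
    run_key = new_run.cumsum()
    run_len = run_key.map(run_key.value_counts())
    is_stock_out = zero & (run_len >= min_days)
    # Dense-rank the qualifying runs as 1, 2, 3, ...; everything else becomes 0
    df["stock_out_id"] = (
        run_key.where(is_stock_out).rank(method="dense").fillna(0).astype(int)
    )
    return df


# Synthetic store with two items over ten days; on day 2 nothing sells at all
df = pd.DataFrame({
    "id": ["A"] * 10 + ["B"] * 10,
    "store_id": ["S1"] * 20,
    "d_num": list(range(1, 11)) * 2,
    "sold": [1, 0, 0, 0, 0, 0, 0, 0, 2, 0,
             3, 0, 4, 5, 6, 7, 8, 9, 1, 2],
})
df = add_store_closed(df)
df = add_stock_out_id(df)
print(df)
```

The idea behind both flags is that a zero caused by an empty shelf or a closed store is a different signal from a zero caused by low demand. Exposing that distinction is meant to keep such days from dragging the model's demand estimates down.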

# Relative delta price with the same item in other stores
'delta_price_all_rel',
# Relative delta price with the previous week
'delta_price_weekn-1',
# Relative delta price with the other items of the sub-category
'delta_price_cat_rel'

Code (to be added to modify prices.csv)
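Here is a minimal pandas sketch of the three price features, working directly on the weekly price table (`store_id`, `item_id`, `wm_yr_wk`, `sell_price`). It makes two assumptions. First, the sub-category is the M5 department (`HOBBIES_1` for `HOBBIES_1_001`). Second, "other stores" and "other items" are read literally, so each reference price is a leave-one-out mean that excludes the row itself (NaN when there is no other row). `add_price_features` and `_others_mean` are hypothetical helpers:

```python
import pandas as pd


def _others_mean(p: pd.DataFrame, keys) -> pd.Series:
    """Mean price of the *other* rows in each group (leave-one-out)."""
    grp = p.groupby(keys)["sell_price"]
    # (group sum - own price) / (group size - 1); 0/0 gives NaN for lone rows
    return (grp.transform("sum") - p["sell_price"]) / (grp.transform("count") - 1)


def add_price_features(prices: pd.DataFrame) -> pd.DataFrame:
    p = prices.copy()
    # Assumed sub-category: the department, e.g. 'HOBBIES_1_001' -> 'HOBBIES_1'
    p["dept_id"] = p["item_id"].str.rsplit("_", n=1).str[0]

    # Same item in the other stores, same week
    ref = _others_mean(p, ["item_id", "wm_yr_wk"])
    p["delta_price_all_rel"] = (p["sell_price"] - ref) / ref

    # Same item in the same store, previous week
    p = p.sort_values(["store_id", "item_id", "wm_yr_wk"])
    prev = p.groupby(["store_id", "item_id"])["sell_price"].shift(1)
    p["delta_price_weekn-1"] = (p["sell_price"] - prev) / prev

    # Other items of the same sub-category, same store and week
    ref = _others_mean(p, ["store_id", "dept_id", "wm_yr_wk"])
    p["delta_price_cat_rel"] = (p["sell_price"] - ref) / ref
    return p.sort_index()


# Synthetic weekly prices
prices = pd.DataFrame({
    "store_id": ["CA_1", "CA_2", "CA_1", "CA_1"],
    "item_id": ["HOBBIES_1_001", "HOBBIES_1_001", "HOBBIES_1_002", "HOBBIES_1_001"],
    "wm_yr_wk": [11101, 11101, 11101, 11102],
    "sell_price": [10.0, 8.0, 5.0, 9.0],
})
out = add_price_features(prices)
print(out)
```

A negative `delta_price_cat_rel` means the item is cheaper than the other items of its sub-category in the same store and week. That is the situation in which cannibalization (sales moving to a discounted substitute) would be expected to show up.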

2. Results

RMSE on the validation set for each step of the experiment (Image by Author)
Features Importance — (Image by Author)

III. Conclusion and next steps

Understand the results

Implement Inventory Management Rules

Data Science for Supply Chain Blog — Samir Saci

Senior Supply Chain Engineer — http://samirsaci.com | Data Science for Warehousing📦, Transportation 🚚 and Demand Forecasting 📈