Machine Learning for Retail Sales Forecasting — Features Engineering

Understand the impacts of additional features related to stock-out, store closing date, or cannibalization on a Machine Learning model for sales forecasting.

Samir Saci
9 min read · Oct 21, 2021
This infographic illustrates the key features for improving retail unit sales forecasting using machine learning. In the center, “Retail Sales” is highlighted, connected by arrows to several features. To the left, a stock-out sign asks if the store faced stock-out recently. A sales quantity chart asks for the maximum sales in the last “n” days. A price tag icon prompts for recent pricing changes, and a closed sign queries if the store was closed. Lastly, a sales trend chart asks for the sales.
Features Engineering for Machine Learning for Retail Sales Forecasting — (Image by Author)

Discover the power of Machine Learning for retail sales forecasting with features engineering.

As a data scientist, how can you improve your company's forecasts?

According to the results of the latest Makridakis Forecasting Competitions, machine learning models can reduce forecasting errors by 20% to 60% compared to benchmark statistical models.

Their major advantage is the capacity to include external features that heavily impact the variability of your sales.

For example, e-commerce cosmetics sales are driven by special events (promotions) and how you advertise a reference on the website (first page, second page, etc.).

How can you use this additional information to improve the accuracy?

Feature engineering is based on analytical concepts and business insights to understand what could drive your sales.

In this article, we will try to understand the impact of several features on the accuracy of a model using the M5 Forecasting competition dataset.

SUMMARY
I. Introduction
1. Data set
2. Initial Solution using LGBM
3. Features Analysis
II. Experiment
1. Additional features
2. Results
III. Conclusion
1. Generative AI
2. Next Steps

M5 Forecasting Dataset

Dataset of Retail Sales Transactions

This analysis will be based on the M5 Forecasting dataset of Walmart store sales records.

  • 1,913 days for the training set and 28 days for the evaluation set
  • 10 stores in 3 states (USA)
  • 3,049 unique items sold in 10 stores
  • 3 main categories and 7 departments (sub-category)

The objective is to predict sales for all products in each store in the following 28 days, right after the available dataset. We have to perform 30,490 forecasts for each day in the prediction horizon.

This image shows the structure of the retail dataset used for forecasting. On the left, there are three U.S. states: Texas, Wisconsin, and California, each containing multiple stores. Each state has a column of stores: Texas (TX1, TX2, TX3), Wisconsin (WI1, WI2, WI3), and California (CA1, CA2, CA3, CA4). On the right, there are three categories of products: Hobbies, Household, and Foods, with multiple SKUs listed under each category (e.g., Hobbies 1, Hobbies 2) and the corresponding SKU counts.
M5 Forecasting Competition Dataset — (Image by Author)

We’ll use the validation set to measure the performance of our model.

Initial Solution using Machine Learning Algorithm LGBM

As a base model, we will use a clear and concise notebook shared by Anshul Sharma on Kaggle. (Link)

The idea is to understand how we can improve the accuracy of the model only by adding additional features (without touching the hyperparameters or changing the algorithm).

In this notebook, you will find all the different steps to build a quite good model with a reasonable computing time:

  1. Import and processing of raw data
  2. Exploratory Data Analysis
  3. Features Engineering
     • i) Seasonality: week number, day, month, day of the week
     • ii) Pricing: the weekly price of an item in each store, special events
     • iii) Trends: sales lags (n-p days), average volume per {item, (item + store)}, …
     • iv) Categorical variables encoding: item, store, department, category, state
  4. Model Training: one LightGBM model per store

Features Engineering to improve the model

To emphasize the impact of features engineering, we will keep the model unchanged and vary only the features we use.

Let us split the features used in this notebook into different buckets.


Bucket 1: Transactional Data

# Item id
'id', 'item_id',
# Store, Category, Department
'dept_id', 'cat_id', 'store_id', 'state_id'
# Transaction time
'd', 'wm_yr_wk', 'weekday', 'wday', 'month', 'year'
# Sales Qty, price and promotional events
'sold', 'event_name_1', 'event_type_1', 'event_name_2', 'event_type_2', 'sell_price'

events and sell_price
Capture the impact on sales of a special event on an item of selling price XXX.

What could be the impact of a special event with a 20% price reduction on sales of baby formula in the second week of the month?

Open Question
What would be the impact on the accuracy if we do one-hot encoding for the categorical features?
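To start exploring this open question, here is a minimal sketch that contrasts integer codes with one-hot encoding using pandas. The toy DataFrame and its values are hypothetical, and the article does not state which encoding the base notebook uses. Keep in mind that one-hot encoding a high-cardinality column such as `item_id` (3,049 values) would add as many columns.

```python
import pandas as pd

# Toy frame (hypothetical values) with two categorical columns from Bucket 1
df = pd.DataFrame({
    "store_id": ["CA_1", "TX_1", "CA_1", "WI_1"],
    "cat_id": ["FOODS", "HOBBIES", "FOODS", "HOUSEHOLD"],
})

# Option 1: integer codes, one column per feature
label_encoded = df.apply(lambda col: col.astype("category").cat.codes)

# Option 2: one-hot encoding, one column per distinct value
one_hot = pd.get_dummies(df, columns=["store_id", "cat_id"])

print(label_encoded.shape, one_hot.shape)
```

Training the same model on both versions and comparing the RMSE on the validation set would answer the question for this dataset.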


Bucket 2: Sales Lags and Average

# Sales lag n = sales quantity of day - n
'sold_lag_1', 'sold_lag_2', 'sold_lag_3', 'sold_lag_7', 'sold_lag_14', 'sold_lag_28'
# Sales average by
'item_sold_avg', 'state_sold_avg',
'store_sold_avg', 'cat_sold_avg', 'dept_sold_avg',
# Sales average by XXX and YYY
'cat_dept_sold_avg', 'store_item_sold_avg', 'cat_item_sold_avg',
'dept_item_sold_avg', 'state_store_sold_avg',
'state_store_cat_sold_avg', 'store_cat_dept_sold_avg'

lags
Measure the week-on-week or month-on-month (7 days, 28 days) similarities to capture the periodicity of sales due to people shopping at these frequencies.

Do you have relatives going to the hypermarket every Saturday to shop for the whole week?
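As an illustration, lag features can be sketched with a grouped `shift` in pandas. This is a minimal sketch on hypothetical data, not the notebook's exact code; it assumes one row per `id` and day, sorted by day.

```python
import pandas as pd

# Toy daily sales for one item in one store (hypothetical values)
df = pd.DataFrame({
    "id": ["ITEM_A_CA_1"] * 10,
    "d": range(1, 11),
    "sold": [3, 0, 2, 5, 1, 4, 6, 2, 0, 7],
})
df = df.sort_values(["id", "d"])

# sold_lag_n = quantity sold n days earlier for the same id
for n in (1, 2, 3, 7):
    df[f"sold_lag_{n}"] = df.groupby("id")["sold"].shift(n)

print(df.tail(3))
```

The first n rows of each `id` have no history, so their lag values are missing.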

Find the full code in my GitHub repository: Link

Features Engineering Strategies

Additional features

Based on business insights or common sense, we will add features derived from the existing ones to help our model capture the key factors that drive customer demand.

This image visualizes six “buckets” of features in retail sales forecasting: transactional data, sales lags and averages, rolling means, sales trends, stock-out and store closures, and relative price differences. An experimental workflow outlines six steps, showing how each bucket’s features impact forecast accuracy (measured by RMSE) and feature importance. The goal is to identify which step provides the best accuracy using business insights.
Experiment to understand the impact of features on the error with the validation set — (Image by Author)


Bucket 3: Rolling Mean and Rolling Mean applied on lag

# Rolling mean on actual sales
'rolling_sold_mean', 'rolling_sold_mean_3', 'rolling_sold_mean_7',
'rolling_sold_mean_14', 'rolling_sold_mean_21', 'rolling_sold_mean_28'
# Rolling mean on lag sales
'rolling_lag_7_win_7', 'rolling_lag_7_win_28', 'rolling_lag_28_win_7', 'rolling_lag_28_win_28'

rolling_sold_mean_n
Measure the average sales of the last n days.

Rolling mean is sometimes used alone as a benchmark model for statistical forecasting.
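A minimal sketch of a rolling mean feature with pandas, on hypothetical data. One assumption to flag: the window here covers the n days before the current day (a one-day shift), so a day's own sales never leak into its feature. The original notebook may define the window differently.

```python
import pandas as pd

# Toy daily sales for two ids (hypothetical values)
df = pd.DataFrame({
    "id": ["A"] * 6 + ["B"] * 6,
    "d": list(range(1, 7)) * 2,
    "sold": [1, 2, 3, 4, 5, 6, 10, 0, 10, 0, 10, 0],
})
df = df.sort_values(["id", "d"])

def add_rolling_mean(frame, window):
    # Shift by one day so the current day's sales are not part of its own feature
    past = frame.groupby("id")["sold"].shift(1)
    frame[f"rolling_sold_mean_{window}"] = past.groupby(frame["id"]).transform(
        lambda s: s.rolling(window).mean()
    )
    return frame

df = add_rolling_mean(df, window=3)
print(df)
```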


rolling_lag_n_win_p
Measure the average sales over a p-day window ending n days ago.
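This can be sketched by lagging first and averaging second. The helper name `add_rolling_lag` and the data are hypothetical; the sketch only illustrates the definition above.

```python
import pandas as pd

# Toy daily sales for one id (hypothetical values)
df = pd.DataFrame({
    "id": ["A"] * 8,
    "d": range(1, 9),
    "sold": [2, 4, 6, 8, 10, 12, 14, 16],
})
df = df.sort_values(["id", "d"])

def add_rolling_lag(frame, lag, window):
    # Lag first, then average: the window covers `window` days ending `lag` days ago
    lagged = frame.groupby("id")["sold"].shift(lag)
    frame[f"rolling_lag_{lag}_win_{window}"] = lagged.groupby(frame["id"]).transform(
        lambda s: s.rolling(window).mean()
    )
    return frame

df = add_rolling_lag(df, lag=2, window=3)
print(df)
```

With `lag=7, window=28`, the same helper would produce a column named like `rolling_lag_7_win_28` from the list above.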


BUSINESS INSIGHTS
Sunglasses seasonality

If the rolling mean of the last 7 days is 35% higher than the average sales of the week before, it probably means the summer season has started.


Bucket 4: Sales Trend and Rolling Maximum

# Selling Trend
'selling_trend', 'item_selling_trend',
# Rolling max
'rolling_sold_max', 'rolling_sold_max_1', 'rolling_sold_max_2', 'rolling_sold_max_7', 'rolling_sold_max_14', 'rolling_sold_max_21',
'rolling_sold_max_28'

Selling trend
Measure the gap between the daily sales and the average.
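One possible reading of this definition, sketched with pandas on hypothetical data: `selling_trend` compares each day's sales with the long-run average of the same item in the same store, and `item_selling_trend` does the same at item level across stores. The exact grouping keys are an assumption.

```python
import pandas as pd

# Toy sales of one item in two stores (hypothetical values)
df = pd.DataFrame({
    "item_id": ["X"] * 6,
    "store_id": ["CA_1"] * 3 + ["TX_1"] * 3,
    "d": [1, 2, 3] * 2,
    "sold": [2, 4, 6, 0, 2, 4],
})

# Gap between daily sales and the average of the same item in the same store
store_item_avg = df.groupby(["item_id", "store_id"])["sold"].transform("mean")
df["selling_trend"] = df["sold"] - store_item_avg

# Item level: daily average across stores minus the item's overall average
item_daily_avg = df.groupby(["item_id", "d"])["sold"].transform("mean")
item_avg = df.groupby("item_id")["sold"].transform("mean")
df["item_selling_trend"] = item_daily_avg - item_avg

print(df)
```

Note that these averages are computed over the whole history in the frame, so they should be built from the training period only.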


Rolling max
What were the maximum daily sales in the last n days?
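A minimal pandas sketch of this feature on hypothetical data. As an assumption, the window covers the n days before the current day, and `min_periods=1` lets the feature exist as soon as one past day is available.

```python
import pandas as pd

# Toy daily sales for one id (hypothetical values)
df = pd.DataFrame({
    "id": ["A"] * 7,
    "d": range(1, 8),
    "sold": [1, 9, 2, 3, 0, 4, 5],
})
df = df.sort_values(["id", "d"])

# Highest daily quantity over the n days before the current day
past = df.groupby("id")["sold"].shift(1)
for n in (2, 7):
    df[f"rolling_sold_max_{n}"] = past.groupby(df["id"]).transform(
        lambda s, n=n: s.rolling(n, min_periods=1).max()
    )

print(df)
```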


Spoiler: this feature will have an important impact on your accuracy.


Bucket 5: Stock-Out and Store Closed

# Stock-out id
'stock_out_id'
# Store closed
'store_closed'

stock-out

Flag the days when zero sales are explained by stock availability issues rather than by a lack of demand.
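The dataset has no inventory records, so a stock-out can only be inferred. The sketch below uses one hypothetical rule: a run of at least `MIN_RUN` consecutive zero-sales days for an item is treated as a stock-out. Both the rule and the threshold are assumptions for illustration, not the definition used to build `stock_out_id` in this experiment.

```python
import pandas as pd

# Toy daily sales for one id (hypothetical values)
df = pd.DataFrame({
    "id": ["A"] * 8,
    "d": range(1, 9),
    "sold": [5, 0, 4, 0, 0, 0, 6, 3],
})
df = df.sort_values(["id", "d"])

MIN_RUN = 3  # hypothetical threshold: shorter zero runs count as ordinary no-sale days

is_zero = df["sold"].eq(0)
# A new run starts whenever the id changes or the zero/non-zero state flips
state_changed = is_zero.astype(int).diff().ne(0)
id_changed = df["id"].ne(df["id"].shift())
run_id = (state_changed | id_changed).cumsum()

# Number of zero-sales days in each run, broadcast back to every row of the run
run_length = is_zero.groupby(run_id).transform("sum")
df["stock_out_id"] = (is_zero & (run_length >= MIN_RUN)).astype(int)

print(df)
```

A `store_closed` flag could be built in a similar spirit, for example by checking whether the total sales of a store on a given day are zero; that definition is also an assumption.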


Bucket 6: Price Relative to the same item in other stores or other items in the sub-category

# Relative delta price with the same item in other stores
'delta_price_all_rel'
# Relative delta price with the previous week
'delta_price_weekn-1'
# Relative delta price with the other items of the sub-category
'delta_price_cat_rel'

delta_price_weekn-1
Capture the price evolution week by week.

BUSINESS INSIGHTS
Promotions for Slow Movers

In order to reduce their inventory and purge slow movers, stores may apply aggressive pricing to boost sales.

delta_price_all_rel: Sales Cannibalization at store level
Several stores competing for sales of the same item because of price difference.

delta_price_cat_rel: Sales Cannibalization at sub-category level
Several items of the same sub-category competing for sales.
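The three price features can be sketched with grouped averages in pandas. The data is hypothetical, and the choice of reference price (an average that includes the row itself) is an assumption; other baselines, such as the maximum price, would also fit the descriptions above.

```python
import pandas as pd

# Toy weekly prices for two items of one department (hypothetical values)
df = pd.DataFrame({
    "item_id": ["X", "X", "X", "X", "Y", "Y"],
    "dept_id": ["FOODS_1"] * 6,
    "store_id": ["CA_1", "CA_1", "TX_1", "TX_1", "CA_1", "CA_1"],
    "wm_yr_wk": [1, 2, 1, 2, 1, 2],
    "sell_price": [10.0, 8.0, 10.0, 12.0, 20.0, 20.0],
})
df = df.sort_values(["item_id", "store_id", "wm_yr_wk"])

# Week-over-week relative price change of an item in a store
prev_price = df.groupby(["item_id", "store_id"])["sell_price"].shift(1)
df["delta_price_weekn-1"] = (df["sell_price"] - prev_price) / prev_price

# Price relative to the same item's average price across stores that week
item_week_avg = df.groupby(["item_id", "wm_yr_wk"])["sell_price"].transform("mean")
df["delta_price_all_rel"] = (df["sell_price"] - item_week_avg) / item_week_avg

# Price relative to the sub-category average in the same store and week
dept_week_avg = df.groupby(["dept_id", "store_id", "wm_yr_wk"])["sell_price"].transform("mean")
df["delta_price_cat_rel"] = (df["sell_price"] - dept_week_avg) / dept_week_avg

print(df)
```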


Results

After running a training loop over the six buckets (using the same hyperparameters as the Kaggle notebook), we have the following results:

A vertical bar chart compares the Root Mean Squared Error (RMSE) of the model at each step of the experiment. Two steps stand out with much higher error rates than the others, while the remaining steps show lower errors. The image emphasizes how much the error varies depending on the features used.
RMSE on the validation set for each of the steps of the experiment — (Image by Author)
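The step-to-step percentages reported below are relative changes in RMSE. The article does not state which denominator is used, so the sketch below shows two possible conventions with hypothetical values (not the article's results). A decrease measured against the previous step cannot go below -100%, so a figure such as the -118% reported for step 2 to step 3 would have to be measured against the new step's RMSE or another baseline.

```python
import math

def rmse(actual, predicted):
    # Root mean squared error between two equal-length sequences
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def change_vs_previous(rmse_prev, rmse_new):
    # Relative change measured against the previous step; never below -100%
    return (rmse_new - rmse_prev) / rmse_prev

def change_vs_new(rmse_prev, rmse_new):
    # Relative change measured against the new step; can go below -100%
    return (rmse_new - rmse_prev) / rmse_new

# Hypothetical RMSE values for two consecutive steps
rmse_prev, rmse_new = 2.0, 1.0
print(f"{change_vs_previous(rmse_prev, rmse_new):+.0%}")  # -50%
print(f"{change_vs_new(rmse_prev, rmse_new):+.0%}")       # -100%
```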

STEP 1 to STEP 2: -29% RMSE

A horizontal bar chart compares feature importance for a retail sales forecasting model. The top features, represented in various colors, include sales trends, sales lags, and stock-out indicators. The features are ranked by their importance to the model’s accuracy, with some showing significantly more impact than others.
Features Importance — (Image by Author)

Sales lags are positively impacting the accuracy of your model:

BUSINESS INSIGHTS
Your sales of today are highly impacted by previous days' sales.

STEP 2 to STEP 3: -118% RMSE

Similar to the previous image, this chart displays a ranked list of feature importance for the retail sales model. The top-ranking features are related to recent sales trends, sales lags, and price differences. The visualization reinforces the key factors that drive model performance in predicting future retail sales.
Features Importance — (Image by Author)

BUSINESS INSIGHTS
The top 3 features are all related to the sales of the last three days.

Question
Based on this insight, what could be the performance of a model like Exponential Smoothing, which computes its forecasts as a weighted sum of previous sales?
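To make the question concrete, here is a minimal sketch of simple exponential smoothing in plain Python. It assumes the simplest variant (no trend, no seasonality); the function name and the sales values are hypothetical.

```python
def simple_exponential_smoothing(sales, alpha):
    # One-step-ahead forecast after seeing every value in `sales`.
    # Each update mixes the latest observation with the previous level,
    # so older sales receive exponentially decreasing weights.
    level = sales[0]
    for value in sales[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

history = [10, 20, 30]  # hypothetical daily sales
print(simple_exponential_smoothing(history, alpha=0.5))  # 22.5
```

With `alpha` close to 1, the forecast relies almost entirely on the most recent days, which matches the feature importance observed here.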

STEP 3 to STEP 4: -12% RMSE

This chart shows a side-by-side comparison of feature importance rankings across two steps of the forecasting process. Each color bar represents a different feature, with the top features being related to recent sales and price variations. The comparison highlights how feature importance changes as the model progresses through different stages of training.
Features Importance — (Image by Author)

Rolling max features now lead the feature importance ranking.

STEP 4 to STEP 5: -0.1% RMSE

Features Importance — (Image by Author)

BAM!
I am devastated to see that what could have been the main added value of this article, the stock-out and store-closing features, has only a limited impact on the accuracy of the model.

STEP 5 to STEP 6: -1.75% RMSE

Features Importance — (Image by Author)

The model accuracy is slightly better, but none of the added features appears in the top 20.

Conclusion

This analysis shows the positive impact of sales lags, rolling max, and other features on the model’s accuracy.

Understand the results
The results in terms of model accuracy are quite satisfying.

However, it is less satisfying to see that some of the newly added features show no clear link with the model's performance.

Therefore, the next step will be to work on these features and the model (let’s remember that we did not touch the initial model here) to see if there is any possibility of using these features to forecast your sales better.

Implement Inventory Management Rules
Now that you have your forecasting model, you need to implement an inventory management rule to manage store replenishment.

You can take inspiration from these three articles, where we try to implement rules assuming a deterministic or stochastic demand.

Generative AI: Machine Learning x GPT

After the recent adoption of Large Language Models (LLMs) like GPT, we can enhance the user experience of analytics products with intelligent agents.

In this article, I shared my first experiment, the design of a LangChain Agent connected to a TMS.

A diagram showing an automated supply chain control tower workflow with GPT and Langchain starting with ambiguous input (represented by question marks), proceeding through SQL queries, machine learning analysis, and generating insights that are communicated to users in an understandable form.
Supply Chain Control Tower Agent with LangChain SQL Agent [Article Link] — (Image by Author)

The outputs are impressive, as we have an agent that can answer operational questions by querying a database autonomously.

What if we create a super agent for Inventory Management?

This image showcases a conversational workflow between the user and the Supply Chain GPTs. It begins with the user asking questions, such as how to start or what to minimize. The agent responds with prompts and outputs based on core Python scripts for supply chain analysis. It handles different scenarios and “what-if” questions by adapting the core module and providing specific insights like minimizing stockouts or ordering costs. The flow helps guide decision-making through interaction.
Inventory Management Super Agent — (Image by Author)

My objective is to equip a GPT agent with

  • Python Scripts of Inventory Rules and Light Forecasting Models
  • Context, articles, and knowledge about Forecasting, Demand Planning, and Inventory Management

That way, we have an agent that can find the proper inventory rules, set the safety stock, and test it with a light demand forecasting model.

About Me

Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer using data analytics to improve logistics operations and reduce costs.

For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.

If you are interested in Data Analytics and Supply Chain, have a look at my website.

💌 New articles straight in your inbox for free: Newsletter
📘 Your complete guide for Supply Chain Analytics: Analytics Cheat Sheet
