Machine Learning for Retail Sales Forecasting — Features Engineering

Understand the impacts of additional features related to stock-out, store closing date, or cannibalization on a Machine Learning model for sales forecasting.

9 min read · Oct 21, 2021

This infographic illustrates the key features for improving retail unit sales forecasting using machine learning. In the center, “Retail Sales” is highlighted, connected by arrows to several features. To the left, a stock-out sign asks if the store faced stock-out recently. A sales quantity chart asks for the maximum sales in the last “n” days. A price tag icon prompts for recent pricing changes, and a closed sign queries if the store was closed. Lastly, a sales trend chart asks for the sales. — Features Engineering for Machine Learning for Retail Sales Forecasting — (Image by Author)

Discover the power of Machine Learning for retail sales forecasting with features engineering.

As a data scientist, how can you improve your company's forecasts?

Based on feedback from the latest Makridakis forecasting competitions, machine learning models can reduce forecasting errors by 20% to 60% compared to benchmark statistical models.

Their major advantage is the capacity to include external features that heavily impact the variability of your sales.

For example, e-commerce cosmetics sales are driven by special events (promotions) and how you advertise a reference on the website (first page, second page, etc.).

How can you use this additional information to improve the accuracy?

Feature engineering is based on analytical concepts and business insights to understand what could drive your sales.

In this article, we will try to understand the impact of several features on the accuracy of a model using the M5 Forecasting competition dataset.

SUMMARY
I. Introduction
1. Data set
2. Initial Solution using LGBM
3. Features Analysis
II. Experiment
1. Additional features
2. Results
III. Conclusion
1. Generative AI
2. Next Steps

M5 Forecasting Dataset

Dataset of Retail Sales Transactions

This analysis will be based on the M5 Forecasting dataset of Walmart store sales records.

1,913 days for the training set and 28 days for the evaluation set
10 stores in 3 states (USA)
3,049 unique items in 10 stores
3 main categories and 7 departments (sub-category)

The objective is to predict sales for all products in each store in the following 28 days, right after the available dataset. We have to perform 30,490 forecasts for each day in the prediction horizon.

This image shows the structure of the retail dataset used for forecasting. On the left, there are three U.S. states: Texas, Wisconsin, and California, each containing multiple stores. Each state has a column of stores: Texas (TX1, TX2, TX3), Wisconsin (WI1, WI2, WI3), and California (CA1, CA2, CA3, CA4). On the right, there are three categories of products: Hobbies, Household, and Foods, with multiple SKUs listed under each category (e.g., Hobbies 1, Hobbies 2) and the corresponding SKU counts. — M5 Forecasting Competition Dataset — (Image by Author)

We’ll use the validation set to measure the performance of our model.

Initial Solution using Machine Learning Algorithm LGBM

As a base model, we will use a clear and concise notebook shared by Anshul Sharma on Kaggle.

The idea is to understand how we can improve the accuracy of the model only by adding additional features (without touching the hyperparameters or changing the algorithm).

In this notebook, you will find all the different steps to build a quite good model with a reasonable computing time:

1. Import and processing of raw data
2. Exploratory Data Analysis
3. Features Engineering

i) Seasonality: week number, day, month, day of the week
ii) Pricing: the weekly price of an item in each store, special events
iii) Trends: sales lags (n-p days), average volume per {item, (item + store)}, …
iv) Categorical variables encoding: item, store, department, category, state

4. Model Training: one LightGBM model per store

Features Engineering to improve the model

To emphasize the impact of features engineering, we will not change the model and only look at which features we use.

Let us split the features used in this notebook into different buckets.

—
Bucket 1: Transactional Data

# Item id
'id', 'item_id', 
# Store, Category, Department
'dept_id', 'cat_id', 'store_id', 'state_id'
# Transaction time
'd', 'wm_yr_wk', 'weekday', 'wday', 'month', 'year'
# Sales Qty, price and promotional events
'sold', 'event_name_1', 'event_type_1', 'event_name_2', 'event_type_2', 'sell_price'

events and sell_price
Capture the impact of a special event on the sales of an item at a given selling price.

What would be the impact of a special event with a 20% price reduction on sales of baby formula in the second week of the month?

Open Question
What would be the impact on the accuracy if we do one-hot encoding for the categorical features?
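To make the open question concrete, here is a minimal sketch of the two encodings on a toy frame. The column names come from Bucket 1, but the values are made up for illustration and this is not the code of the Kaggle notebook.

```python
import pandas as pd

# Toy frame with two of the categorical columns from Bucket 1 (values are illustrative).
df = pd.DataFrame({
    "store_id": ["CA_1", "CA_1", "TX_1", "WI_1"],
    "cat_id": ["FOODS", "HOBBIES", "FOODS", "HOUSEHOLD"],
    "sold": [3, 0, 5, 1],
})

# Option A: integer codes, one column per categorical feature.
label_encoded = df.copy()
for col in ["store_id", "cat_id"]:
    label_encoded[col] = label_encoded[col].astype("category").cat.codes

# Option B: one-hot encoding, one column per category level.
one_hot = pd.get_dummies(df, columns=["store_id", "cat_id"])

print(label_encoded.shape, one_hot.shape)
```

With 3,049 distinct items, one-hot encoding 'item_id' would add thousands of sparse columns, which is one reason integer codes are a common choice with tree-based models.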

—
Bucket 2: Sales Lags and Average

# Sales lag n = sales quantity of day - n
'sold_lag_1', 'sold_lag_2', 'sold_lag_3', 'sold_lag_7', 'sold_lag_14', 'sold_lag_28'
# Sales average by item, state, store, category, department
'item_sold_avg', 'state_sold_avg',
'store_sold_avg', 'cat_sold_avg', 'dept_sold_avg',
# Sales average by combinations of two or three levels (e.g. category and department)
'cat_dept_sold_avg', 'store_item_sold_avg', 'cat_item_sold_avg',
'dept_item_sold_avg', 'state_store_sold_avg',
'state_store_cat_sold_avg', 'store_cat_dept_sold_avg'

lags
Measure the week-on-week or month-on-month (7 days, 28 days) similarities to capture the periodicity of sales due to people shopping at these frequencies.

Do you have relatives going to the hypermarket every Saturday to shop for the whole week?
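As an illustration, lags and group averages can be sketched with pandas on a long-format table (one row per series and day). The toy data and the short lags are assumptions to keep the example readable; the feature list above uses lags of 1, 2, 3, 7, 14 and 28 days.

```python
import pandas as pd

# Toy long-format sales: one row per (id, day).
df = pd.DataFrame({
    "id": ["A"] * 5 + ["B"] * 5,
    "d": list(range(1, 6)) * 2,
    "sold": [1, 2, 3, 4, 5, 10, 0, 10, 0, 10],
})
df = df.sort_values(["id", "d"]).reset_index(drop=True)

# sold_lag_n = quantity sold n days earlier in the same series.
# Grouping by id prevents values from leaking from one series into the next.
for lag in (1, 2):
    df[f"sold_lag_{lag}"] = df.groupby("id")["sold"].shift(lag)

# Average sales per series, broadcast back to every row.
df["item_sold_avg"] = df.groupby("id")["sold"].transform("mean")

print(df)
```

In a real pipeline, you would probably compute the averages on the training period only, so that the validation days do not influence the feature.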

Find the full code in my GitHub repository:

GitHub - samirsaci/ml-forecast-features-eng: Machine Learning for Retail Sales Forecasting …

Features Engineering Strategies

Additional features

Based on business insights or common sense, we will add additional features built with existing ones to help our model to capture all the key factors impacting your customer demand.

This image visualizes six “buckets” of features in retail sales forecasting: transactional data, sales lags and averages, rolling means, sales trends, stock-out and store closures, and relative price differences. An experimental workflow outlines six steps, showing how each bucket’s features impact forecast accuracy (measured by RMSE) and feature importance. The goal is to identify which step provides the best accuracy using business insights. — Experiment to understand the impact of features on the error with the validation set — (Image by Author)

—
Bucket 3: Rolling Mean and Rolling Mean applied on lag

# Rolling mean on actual sales
'rolling_sold_mean', 'rolling_sold_mean_3', 'rolling_sold_mean_7',
'rolling_sold_mean_14', 'rolling_sold_mean_21', 'rolling_sold_mean_28'
# Rolling mean on lag sales
'rolling_lag_7_win_7', 'rolling_lag_7_win_28', 'rolling_lag_28_win_7', 'rolling_lag_28_win_28'

rolling_sold_mean_n
Measure the average sales of the last n days.

Rolling mean is sometimes used alone as a benchmark model for statistical forecasting.
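A minimal sketch of a per-series rolling mean with pandas is shown below. The helper name add_rolling_mean and the toy data are hypothetical; the window includes the current day, which may differ from the original notebook.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A"] * 6,
    "d": list(range(1, 7)),
    "sold": [2, 4, 6, 8, 10, 12],
})

def add_rolling_mean(frame, window):
    """Add the mean of the last `window` days of sales per series (current day included)."""
    col = f"rolling_sold_mean_{window}"
    frame[col] = frame.groupby("id")["sold"].transform(
        lambda s: s.rolling(window).mean()
    )
    return frame

df = add_rolling_mean(df, 3)
print(df)
```

The first window - 1 rows of each series are NaN because the window is not full yet.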

rolling_lag_n_win_p
Measure the average sales over a p-day window ending n days ago.
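One way to build this feature is to shift the series by n days and then apply a p-day rolling mean. The sketch below assumes a long-format table sorted by day; the helper name add_rolling_lag is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A"] * 8,
    "d": list(range(1, 9)),
    "sold": [1, 2, 3, 4, 5, 6, 7, 8],
})

def add_rolling_lag(frame, lag, window):
    """Add the mean of a `window`-day window that ends `lag` days before each row."""
    col = f"rolling_lag_{lag}_win_{window}"
    frame[col] = frame.groupby("id")["sold"].transform(
        lambda s: s.shift(lag).rolling(window).mean()
    )
    return frame

df = add_rolling_lag(df, lag=2, window=3)
print(df)
```

For day 8, the feature is the mean of days 4, 5 and 6, that is, a 3-day window ending 2 days earlier.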

BUSINESS INSIGHTS
Sunglasses seasonality

If the rolling mean of the last 7 days is 35% higher than the average sales of the week before, that means you have started the summer season.
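This rule of thumb can be checked with two rolling means. The sketch below uses a made-up series of 14 days and the 35% threshold from the example; the variable names are illustrative.

```python
import pandas as pd

# 14 days of toy sales for one item: a flat week followed by a stronger week.
sold = pd.Series([10] * 7 + [14] * 7, dtype=float)

last_week = sold.rolling(7).mean()             # mean of days t-6 .. t
week_before = sold.shift(7).rolling(7).mean()  # mean of days t-13 .. t-7

# Flag days where the last week is more than 35% above the week before.
season_start = (last_week / week_before) > 1.35

print(season_start.tolist())
```

Only the last day is flagged here, because it is the first day with two complete weeks of history.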

—
Bucket 4: Sales Trend and Rolling Maximum

# Selling Trend
'selling_trend', 'item_selling_trend', 
# Rolling max
'rolling_sold_max', 'rolling_sold_max_1', 'rolling_sold_max_2', 'rolling_sold_max_7', 'rolling_sold_max_14', 'rolling_sold_max_21',
'rolling_sold_max_28'

Selling trend
Measure the gap between the daily sales and the average.
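A possible reading of this definition is sketched below: the gap is measured against the average of the same store and item for 'selling_trend', and against the item average across all stores for 'item_selling_trend'. These grouping levels are an assumption, not a confirmed detail of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    "store_id": ["CA_1"] * 3 + ["TX_1"] * 3,
    "item_id": ["X"] * 6,
    "d": [1, 2, 3] * 2,
    "sold": [1, 2, 3, 4, 6, 8],
})

# Gap between daily sales and the average of the same (store, item) series.
store_item_avg = df.groupby(["store_id", "item_id"])["sold"].transform("mean")
df["selling_trend"] = df["sold"] - store_item_avg

# Same gap, measured against the item average across all stores.
item_avg = df.groupby("item_id")["sold"].transform("mean")
df["item_selling_trend"] = df["sold"] - item_avg

print(df)
```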

Rolling max
What were the maximum sales in the last n days?
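The rolling maximum follows the same pattern as the rolling mean. In this sketch, min_periods=1 is a choice made here so that the first days of a series are not NaN; the helper name add_rolling_max is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["A"] * 6,
    "d": list(range(1, 7)),
    "sold": [0, 5, 1, 0, 2, 0],
})

def add_rolling_max(frame, window):
    """Add the maximum daily sales over the last `window` days, per series."""
    col = f"rolling_sold_max_{window}"
    frame[col] = frame.groupby("id")["sold"].transform(
        lambda s: s.rolling(window, min_periods=1).max()
    )
    return frame

df = add_rolling_max(df, 3)
print(df)
```

The peak of 5 units stays visible for three days, even though sales dropped back to 0 or 1 in between.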

Spoiler: this feature will have an important impact on your accuracy.

—
Bucket 5: Stock-Out and Store Closed

# Stock-out id
'stock_out_id'
# Store closed
'store_closed'

stock-out
Explain that you have zero sales because of stock availability issues.
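The dataset has no inventory information, so both flags have to be inferred from sales. The sketch below uses a simple heuristic that is an assumption of this example, not necessarily the rule used in the original code: an item is flagged as out of stock after MIN_RUN consecutive days without sales, and a store is flagged as closed when no item sold anything that day.

```python
import pandas as pd

df = pd.DataFrame({
    "store_id": ["CA_1"] * 8,
    "item_id": ["X"] * 4 + ["Y"] * 4,
    "d": [1, 2, 3, 4] * 2,
    "sold": [3, 0, 0, 0, 2, 0, 1, 3],
})

MIN_RUN = 3  # hypothetical threshold: zero-sales days in a row before flagging a stock-out

def zero_run_length(s):
    """Length of the current run of consecutive zero-sales days."""
    is_zero = s.eq(0).astype(int)
    run_id = s.ne(0).cumsum()  # the run id changes on each day with sales
    return is_zero.groupby(run_id).cumsum()

run = df.groupby(["store_id", "item_id"])["sold"].transform(zero_run_length)
df["stock_out_id"] = (run >= MIN_RUN).astype(int)

# Store closed: no item sold anything in that store on that day.
store_day_total = df.groupby(["store_id", "d"])["sold"].transform("sum")
df["store_closed"] = (store_day_total == 0).astype(int)

print(df)
```

The threshold would need tuning per item: three days without sales is normal for a slow mover and suspicious for a best seller.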

—

Bucket 6: Price Relative to the same item in other stores or other items in the sub-category

# Relative delta price with the same item in other stores
'delta_price_all_rel'
# Relative delta price with the previous week
'delta_price_weekn-1'
# Relative delta price with the other items of the sub-category
'delta_price_cat_rel'

delta_price_weekn-1
Capture the price evolution week by week.
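A minimal sketch of the week-over-week price change, assuming a weekly price table with the columns 'store_id', 'item_id', 'wm_yr_wk' and 'sell_price' (the prices are made up):

```python
import pandas as pd

prices = pd.DataFrame({
    "store_id": ["CA_1"] * 3,
    "item_id": ["X"] * 3,
    "wm_yr_wk": [11101, 11102, 11103],
    "sell_price": [10.0, 8.0, 8.0],
})
prices = prices.sort_values(["store_id", "item_id", "wm_yr_wk"]).reset_index(drop=True)

# Price of the same item in the same store one week earlier.
previous_price = prices.groupby(["store_id", "item_id"])["sell_price"].shift(1)

# Relative change versus the previous week (NaN for the first week of a series).
prices["delta_price_weekn-1"] = (prices["sell_price"] - previous_price) / previous_price

print(prices)
```

Here the second week shows a 20% price cut and the third week no change.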

BUSINESS INSIGHTS
Promotions for Slow Movers

In order to reduce their inventory and purge slow movers, stores may apply aggressive pricing to boost sales.

delta_price_all_rel: Sales Cannibalization at store level
Several stores competing for sales of the same item because of price difference.
delta_price_cat_rel: Sales Cannibalization at sub-category level
Several items of the same sub-category competing for sales.
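Both relative price features can be sketched as the gap to a group mean. As a simplification assumed here, the reference price is the mean over all stores (or all items of the department) including the row itself, and 'dept_id' stands for the sub-category.

```python
import pandas as pd

prices = pd.DataFrame({
    "store_id": ["CA_1", "TX_1", "CA_1", "TX_1"],
    "item_id": ["X", "X", "Y", "Y"],
    "dept_id": ["FOODS_1"] * 4,
    "wm_yr_wk": [11101] * 4,
    "sell_price": [9.0, 11.0, 20.0, 20.0],
})

# Mean price of the same item across all stores for the same week.
item_week_mean = prices.groupby(["item_id", "wm_yr_wk"])["sell_price"].transform("mean")
prices["delta_price_all_rel"] = (prices["sell_price"] - item_week_mean) / item_week_mean

# Mean price of the sub-category (department) in the same store and week.
dept_week_mean = prices.groupby(
    ["store_id", "dept_id", "wm_yr_wk"]
)["sell_price"].transform("mean")
prices["delta_price_cat_rel"] = (prices["sell_price"] - dept_week_mean) / dept_week_mean

print(prices)
```

Item X is 10% cheaper than average in CA_1 and 10% more expensive in TX_1, which is the kind of gap that could shift sales between stores.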

Results

After running a training loop over the six buckets (using the same hyperparameters as the Kaggle notebook), we have the following results:
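The structure of such an experiment can be sketched as a loop that adds one bucket at a time and records the RMSE on the validation set. To keep the sketch runnable without LightGBM, the model is a pluggable fit_predict callable and a least-squares placeholder is used; the function names and the synthetic data are assumptions of this example.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def run_experiment(X_train, y_train, X_valid, y_valid, buckets, fit_predict):
    """Train once per step, adding one feature bucket at a time.

    buckets: ordered dict {bucket name: list of column indices}
    fit_predict: callable (X_train, y_train, X_valid) -> predictions
    Returns a list of (bucket name, RMSE, % change versus the previous step).
    """
    used, results, previous = [], [], None
    for name, cols in buckets.items():
        used = used + list(cols)
        preds = fit_predict(X_train[:, used], y_train, X_valid[:, used])
        score = rmse(y_valid, preds)
        change = None if previous is None else 100 * (score - previous) / previous
        results.append((name, score, change))
        previous = score
    return results

def least_squares(X_tr, y_tr, X_va):
    """Placeholder model (ordinary least squares) standing in for LightGBM."""
    def with_bias(X):
        return np.hstack([X, np.ones((X.shape[0], 1))])
    coef, *_ = np.linalg.lstsq(with_bias(X_tr), y_tr, rcond=None)
    return with_bias(X_va) @ coef

# Synthetic data: the target depends on columns 0 and 1 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=0.1, size=200)
buckets = {"bucket_1": [0], "bucket_2": [1], "bucket_3": [2]}

results = run_experiment(X[:150], y[:150], X[150:], y[150:], buckets, least_squares)
for name, score, change in results:
    print(name, round(score, 3), change)
```

The percentage reported for each step is the relative change of the RMSE versus the previous step, so a negative value means the added bucket reduced the error.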

A vertical bar chart compares the Root Mean Squared Error (RMSE) across the steps of the experiment. Two steps stand out with much higher errors than the others, while the remaining steps show lower errors. — RMSE on the validation set for each of the steps of the experiment — (Image by Author)

—

STEP 1 to STEP 2: -29% RMSE Error

A horizontal bar chart compares feature importance for a retail sales forecasting model. The top features, represented in various colors, include sales trends, sales lags, and stock-out indicators. The features are ranked by their importance to the model’s accuracy, with some showing significantly more impact than others. — Features Importance — (Image by Author)

Sales lags are positively impacting the accuracy of your model:

BUSINESS INSIGHTS
Your sales of today are highly impacted by previous days' sales.

—

STEP 2 to STEP 3: -118% RMSE Error

Similar to the previous image, this chart displays a ranked list of feature importance for the retail sales model. The top-ranking features are related to recent sales trends, sales lags, and price differences. The visualization reinforces the key factors that drive model performance in predicting future retail sales. — Features Importance — (Image by Author)

BUSINESS INSIGHTS
The top 3 features are all related to the sales of the last three days.

Question
Based on this insight, how well would a model like Exponential Smoothing perform, given that it computes forecasts as a weighted sum of previous sales?
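For reference, simple exponential smoothing can be sketched in a few lines. This is the textbook single-parameter version, not a model tested in this experiment, and the function name and sample values are illustrative.

```python
def exponential_smoothing_forecast(sales, alpha=0.5):
    """Return the forecast for the day after the last observation.

    The level is updated as: level = alpha * actual + (1 - alpha) * level,
    so recent days weigh more than older ones.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = float(sales[0])  # initialise with the first observation
    for actual in sales[1:]:
        level = alpha * actual + (1 - alpha) * level
    return level

print(exponential_smoothing_forecast([10, 12, 8, 11], alpha=0.5))  # 10.25
```

With alpha close to 1, the forecast relies almost entirely on the last day of sales, which is close to what the top features of this step capture.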

—

STEP 3 to STEP 4: -12% RMSE Error

This chart shows a side-by-side comparison of feature importance rankings across two steps of the forecasting process. Each color bar represents a different feature, with the top features being related to recent sales and price variations. The comparison highlights how feature importance changes as the model progresses through different stages of training. — Features Importance — (Image by Author)

Rolling max features now take the lead in the feature importance ranking.

—

STEP 4 to STEP 5: -0.1% RMSE Error

BAM!
I am devastated to see that the potential main added value of this article, showing the impact of stock-out or store closing, has a limited impact on the accuracy of the model.

—

STEP 5 to STEP 6: -1.75% RMSE Error

The model accuracy is slightly better, but none of the added features appears in the top 20.

Conclusion

This analysis shows the positive impact of sales lags, rolling max, and other features on the model’s accuracy.

Understand the results
The results in terms of model accuracy are quite satisfying.

However, it is still unsatisfying that some of the newly added features (stock-out, store closing, relative prices) show almost no effect on the model's performance.

Therefore, the next step will be to work on these features and the model (let’s remember that we did not touch the initial model here) to see if there is any possibility of using these features to forecast your sales better.

Implement Inventory Management Rules
Now that you have your forecasting model, you need to implement an inventory management rule to manage store replenishment.

You can take inspiration from these three articles, where we try to implement rules assuming a deterministic or stochastic demand.

Inventory Management for Retail — Deterministic Demand

Build a simple model to simulate the impact of several replenishment rules on inventory and ordering costs.

Inventory Management for Retail — Periodic Review Policy

Implement inventory management rules based on a periodic review policy to reduce the number of store replenishments.

Inventory Management for Retail — Stochastic Demand

Simulate the impact of safety stock level on inventory performance metrics with normal demand distribution.

Generative AI: Machine Learning x GPT

After the recent adoption of Large Language Models (LLMs) like GPT, we can enhance the user experience of analytics products with intelligent agents.

In a previous article, I shared my first experiment: the design of a LangChain agent connected to a Transportation Management System (TMS).

A diagram showing an automated supply chain control tower workflow with GPT and Langchain starting with ambiguous input (represented by question marks), proceeding through SQL queries, machine learning analysis, and generating insights that are communicated to users in an understandable form. — Supply Chain Control Tower Agent with LangChain SQL Agent [Article Link] — (Image by Author)

The outputs are impressive, as we have an agent that can answer operational questions by querying a database autonomously.

What if we create a super agent for Inventory Management?

This image showcases a conversational workflow between the user and the Supply Chain GPTs. It begins with the user asking questions, such as how to start or what to minimize. The agent responds with prompts and outputs based on core Python scripts for supply chain analysis. It handles different scenarios and “what-if” questions by adapting the core module and providing specific insights like minimizing stockouts or ordering costs. The flow helps guide decision-making through interaction. — Inventory Management Super Agent — (Image by Author)

My objective is to equip a GPT agent with:

Python scripts of inventory rules and light forecasting models
Context, articles, and knowledge about forecasting, demand planning, and inventory management

The result would be an agent that can find the proper inventory rules, set the safety stock, and test them with a light demand forecasting model.

For more information, see these articles:

Create GPTs to Automate Supply Chain Analytics

“The Supply Chain Analyst” is a Custom ChatGPT’s “GPT” that performs Pareto & ABC Analysis using sales data.

Leveraging LLMs with LangChain for Supply Chain Analytics — A Control Tower Powered by GPT

Build an automated supply chain control tower with a LangChain SQL agent connecting an LLM with a database using…

About Me

Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer using data analytics to improve logistics operations and reduce costs.

For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.

If you are interested in Data Analytics and Supply Chain, have a look at my website.

Samir Saci | Data Science & Productivity

A technical blog focusing on Data Science, Personal Productivity, Automation, Operations Research and Sustainable…

samirsaci.com

Written by Samir Saci
