Machine Learning for Retail Sales Forecasting — Features Engineering

Understand the impact of additional features related to stock-outs, store closing dates, or cannibalization on a Machine Learning model for sales forecasting

Samir Saci
8 min read · Oct 21, 2021


Features Engineering for Machine Learning for Retail Sales Forecasting — (Image by Author)

Discover the power of Machine Learning for retail sales forecasting with feature engineering.

In this article, we explore how additional features like stock-out, store closing dates, and cannibalization impact the accuracy of a Machine Learning model using the M5 Forecasting competition dataset.

Learn how feature engineering can improve your sales forecasts: according to the Makridakis Forecasting Competitions, Machine Learning models can reduce the forecasting error by 20% to 60% compared to benchmark statistical models.

Follow our experiment and analysis of six different feature buckets to optimize your sales predictions and implement effective inventory management rules.


According to the findings of the latest Makridakis Forecasting Competitions, Machine Learning models can reduce the forecasting error by 20% to 60% compared to benchmark statistical models. (M5 Competition)
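
To make the idea of an "error reduction" concrete, here is a minimal sketch of how such a percentage can be computed. It assumes RMSE as the error metric, which is a choice made for this illustration only, and the sales figures and forecasts are made-up numbers, not results from the M5 competition.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared_errors) / len(actual))


def error_reduction(benchmark_error, model_error):
    """Relative error reduction versus the benchmark (0.25 means 25%)."""
    return (benchmark_error - model_error) / benchmark_error


# Hypothetical daily unit sales and two forecasts (illustrative values only)
actual = [10, 12, 8, 15]
benchmark_forecast = [14, 8, 12, 11]  # stands in for a statistical baseline
model_forecast = [12, 10, 10, 13]     # stands in for a Machine Learning model

benchmark_rmse = rmse(actual, benchmark_forecast)
model_rmse = rmse(actual, model_forecast)
reduction = error_reduction(benchmark_rmse, model_rmse)

print(f"Benchmark RMSE: {benchmark_rmse:.2f}")
print(f"Model RMSE: {model_rmse:.2f}")
print(f"Error reduction: {reduction:.0%}")
```

With these toy numbers, the model halves the benchmark error. Any other error metric can be plugged into `error_reduction` in the same way.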

Their major advantage is the ability to include external features that strongly affect the variability of your sales.

For example, e-commerce cosmetics sales are driven by special events (promotions) and by how prominently you display an item on the website (first page, second page, …).

This process, called feature engineering, relies on analytical concepts and business insights to understand what could drive your sales.
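
As a rough illustration of what this looks like in practice, the sketch below enriches raw daily sales records with a few calendar-based features. Everything in it is an assumption made for the example: the `add_features` helper, the column names (`is_promo`, `is_store_closed`, `day_of_week`), and the sample dates are hypothetical and do not come from the M5 dataset.

```python
from datetime import date


def add_features(rows, promo_dates, closed_dates):
    """Return copies of the sales rows with calendar-based features added."""
    enriched = []
    for row in rows:
        day = row["date"]
        enriched.append({
            **row,
            "is_promo": int(day in promo_dates),          # 1 if a promotion ran that day
            "is_store_closed": int(day in closed_dates),  # 1 if the store was closed
            "day_of_week": day.weekday(),                 # Monday = 0, Sunday = 6
        })
    return enriched


# Hypothetical raw sales records (illustrative values only)
rows = [
    {"date": date(2021, 12, 24), "item_id": "ITEM_1", "units_sold": 30},
    {"date": date(2021, 12, 25), "item_id": "ITEM_1", "units_sold": 0},
]
promo_dates = {date(2021, 12, 24)}   # assumed promotion calendar
closed_dates = {date(2021, 12, 25)}  # assumed store closing calendar

features = add_features(rows, promo_dates, closed_dates)
for row in features:
    print(row)
```

The closing-day flag lets a model tell a day with zero sales because the store was closed apart from a day with genuinely low demand, which is the kind of business insight these features are meant to encode.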

In this article, we will try to understand the impact of several features on the accuracy of a model using the M5 Forecasting competition dataset.

I. Introduction
1. Data set
2. Initial Solution using LGBM
3. Features Analysis
II. Experiment
1. Additional features
2. Results
III. Conclusion and next steps

I. The M5 Forecasting Dataset: Overview and Objectives

1. Data set of Retail Sales Transactions


