Lean Six Sigma with Python — Chi-Squared Test

Perform a Chi-Squared Test to explain a shortage of drivers impacting your transportation network

Samir Saci
5 min read · Oct 30, 2021
Solve a Driver Allocation Problem with Chi-Squared Test — (Image by Author)

As a logistics professional, it’s essential to improve your operations constantly.

That’s where Lean Six Sigma with Python comes in. Follow our step-by-step guide to optimize your supply chain management.


Lean Six Sigma is a stepwise approach to process improvement.

In a previous article, we used the Kruskal-Wallis Test to verify the hypothesis that specific training positively impacts operators' Inbound Value-Added Services (VAS) productivity.

In this article, we will implement the Chi-Squared Test with Python to check whether transportation delays are linked to a poor allocation of drivers.


I. Problem Statement
Are transportation delays due to driver allocation issues?
II. Data Analysis
1. Exploratory Data Analysis
Analyze sample data from historical records with Python
2. Perform Cross Tabulation
Summarise the relationship between several categorical variables
3. Pearson’s Chi-Squared Test
Validate that your results are significant and not due to random fluctuation
III. Conclusion
1. Generative AI: Lean Six Sigma GPT Agent
2. Next Steps


I. Problem Statement

Addressing Transportation Delays with Chi-Squared Test

You are the Inbound Transportation Manager of a small factory in the United States.

Your transportation network is simple; you have two routes:

  • Route 1: coming from your northern regional hub (with difficult road conditions and heavy traffic)
  • Route 2: coming from your southern regional hub (with no traffic and a beautiful modern road)

Transportation is managed by an external service provider with a fleet of three trucks (with three different drivers: D1, D2, D3).

Replenishment order process from the request of the factory to driver allocation — (Image by Author)

Replenishment Process: Understanding Driver Allocation in Transportation Network

  1. The Factory sends a replenishment order to your ERP
  2. The Southern regional hub receives the order first
  3. If the stock in the southern hub is too low, then the order is transferred to the northern hub
  4. The ERP sends a pick-up request to the transportation service provider (from the selected hub to the factory)
  5. The first driver to accept the request delivers the raw materials to the factory

P.S.: As the customer, we have no visibility into how the service provider allocates drivers.

When an order is allocated to the northern regional hub, the lead time to get the request accepted is 35% higher than for the southern hub.

Are some drivers avoiding the northern route as much as possible?

We have analyzed the shipments of the last 18 months to build a sample of 269 records.


II. Data Analysis

Exploratory Data Analysis: Stacked Bar Charts to Visualize Driver Allocation

Stacked Bar Charts — (Image by Author)
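
A chart of this kind can be drawn with pandas and matplotlib. The snippet below is a minimal sketch: the counts are hypothetical placeholders (not the 269-record sample), and the column names, labels and output file name are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical shipment counts (NOT the article's data): one row per driver, one column per hub
counts = pd.DataFrame(
    {"NORTH": [10, 10, 5], "SOUTH": [30, 25, 20]},
    index=pd.Index(["D1", "D2", "D3"], name="driver"),
)

# Convert each driver's row to percentages so the bars are comparable
share = counts.div(counts.sum(axis=1), axis=0) * 100

# One stacked bar per driver, split by hub
ax = share.plot(kind="bar", stacked=True, figsize=(6, 4))
ax.set_ylabel("Share of shipments (%)")
ax.set_title("Shipments by hub for each driver")
ax.legend(title="hub")
plt.tight_layout()
plt.savefig("driver_allocation.png")  # hypothetical output file name
```

If one driver's bar shows a visibly smaller NORTH segment than the others, that is a hint worth testing formally rather than a conclusion.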

Cross Tabulation: Analyzing Shipments by Hub and Driver

A cross-tabulation of the data can provide some insights and help us discover a potential pattern in how shipments are distributed among drivers.

Split of shipments by HUB for each driver
Split of shipments (%) per Driver for each HUB

82.65 % of shipments handled by Driver 1 are from SOUTH HUB

Split of shipments (%) per HUB for each Driver

38.89 % of shipments from SOUTH HUB are handled by Driver 1
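
The same three tables (counts, split per driver, split per hub) can be built with `pandas.crosstab`. This is a minimal sketch on hypothetical records: the column names `hub` and `driver` and every count are assumptions, so the percentages will not match the ones quoted above.

```python
import pandas as pd

# Hypothetical shipment records (NOT the article's 269-record sample):
# one row per shipment, with the hub it left from and the driver who took it
records = (
    [("SOUTH", "D1")] * 30 + [("NORTH", "D1")] * 10
    + [("SOUTH", "D2")] * 25 + [("NORTH", "D2")] * 10
    + [("SOUTH", "D3")] * 20 + [("NORTH", "D3")] * 5
)
df = pd.DataFrame(records, columns=["hub", "driver"])

# Contingency table: number of shipments per driver (rows) and hub (columns)
counts = pd.crosstab(df["driver"], df["hub"])

# Share of each driver's shipments coming from each hub (each row sums to 100)
pct_by_driver = pd.crosstab(df["driver"], df["hub"], normalize="index") * 100

# Share of each hub's shipments handled by each driver (each column sums to 100)
pct_by_hub = pd.crosstab(df["driver"], df["hub"], normalize="columns") * 100

print(counts)
print(pct_by_driver.round(2))
print(pct_by_hub.round(2))
```

The `normalize` argument decides which question the table answers: `"index"` reads "where do this driver's shipments come from?", while `"columns"` reads "who handles this hub's shipments?".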

Menu: Stats > Tables > Cross Tabulation and Chi-Square

Pearson’s Chi-Squared Test: Evaluating the Significance of Driver Allocation

The first table is also called a contingency table. It is used in statistics to summarise the relationship between several categorical variables.

Using the Chi-Squared Test, we compute a p-value to determine whether the relationship between the two variables (driver and hub) is statistically significant.
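
For reference, Pearson's statistic compares the observed counts in the contingency table with the counts expected if driver and hub were independent. The notation below is the standard textbook one and is not taken from the original analysis: O_ij is the observed count for driver i and hub j, R_i and C_j are the row and column totals, and N is the total number of shipments.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\text{dof} = (r - 1)(c - 1)
```

With 3 drivers and 2 hubs, the test has (3 - 1)(2 - 1) = 2 degrees of freedom.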

p-value is 0.410

Because the p-value (0.410) is greater than 0.05, we cannot reject the null hypothesis of independence: there is no significant evidence that driver allocation is linked to the hub.
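
In Python, this kind of test can be run with `scipy.stats.chi2_contingency`. The sketch below uses a hypothetical contingency table, so it will not reproduce the p-value of 0.410; the 0.05 threshold is the usual significance level assumed here.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table (NOT the article's data):
# rows = drivers D1, D2, D3; columns = NORTH hub, SOUTH hub
observed = np.array([
    [10, 30],
    [10, 25],
    [5, 20],
])

# Returns the test statistic, the p-value, the degrees of freedom
# and the counts expected under independence
chi2, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05  # assumed significance level
print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p_value:.3f}")
if p_value > alpha:
    print("Fail to reject H0: no evidence that driver and hub are associated.")
else:
    print("Reject H0: driver allocation appears to depend on the hub.")
```

It is worth inspecting `expected` as well: a common rule of thumb is that the chi-squared approximation becomes unreliable when expected counts fall below 5.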



III. Conclusion

Generative AI: Lean Six Sigma GPT Agent

After the recent adoption of Large Language Models (LLMs) like GPT, we can enhance the user experience of analytics products with smart agents.

I shared my first experiment in another article: the design of a LangChain agent connected to a Transportation Management System (TMS).

Supply Chain Control Tower Agent with LangChain SQL Agent — (Image by Author)

The outputs are impressive, as we have an agent that can answer operational questions by querying a database autonomously.

What if we create a Lean Six Sigma super agent?

Lean Six Sigma Super Agent — (Image by Author)

My objective is to equip a GPT agent with

  • Python Scripts of Lean Six Sigma Tools
  • Context, articles and knowledge about LSS mathematical tools

The goal is an agent that can select the right test, run it on data uploaded by users and provide an answer.


Next Steps

By applying the Chi-Squared Test in Python, we found no statistically significant link between driver allocation and the hub, so drivers avoiding the northern route is unlikely to be the root cause of the transportation delays.

This result rules out one hypothesis and points the investigation toward other potential causes of the problem.

More generally, applying Lean Six Sigma methodologies with Python lets you test improvement hypotheses against data before acting on them.

Stay tuned for more data-driven solutions to optimize your operations and reduce costs.

About Me

Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer using data analytics to improve logistics operations and reduce costs.

If you are interested in Data Analytics and Supply Chain, have a look at my website



