Lean Six Sigma with Python — Chi-Squared Test

Perform a Chi-Squared Test to explain a shortage of drivers impacting your transportation network.

Samir Saci
6 min read · Oct 30, 2021
Diagram illustrating the driver allocation problem in a transportation network. A factory sends goods to two warehouses, North and South, via an ERP system. Both warehouses are served by a transport company, which allocates drivers (D1, D2, D3). Black arrows represent balanced driver allocations to the North Warehouse, while red arrows indicate an unbalanced allocation to the South Warehouse, causing delays. This introduces Lean Six Sigma with Python using the Chi-Squared Test.
Solve a Driver Allocation Problem with Chi-Squared Test — (Image by Author)

In the logistics industry, improving your operations constantly and adapting to changes is essential.

Lean Six Sigma (LSS) with Python is a stepwise approach to process improvements using statistical tools to test hypotheses.

As a Data Scientist, how can you solve operational issues with LSS?

In a previous article, we used the Kruskal-Wallis Test to verify the hypothesis that specific training improves warehouse productivity.

Which tool can we use to help transportation operations?

In this article, we will implement the Chi-Squared Test with Python to understand if transportation delays are due to an inadequate allocation of drivers.

SUMMARY
I. Problem Statement
Are transportation delays due to driver allocation issues?
II. Exploratory Data Analysis
1. Exploratory Data Analysis
Analysis with Python of sample data from historical records
2. Perform Cross Tabulation
Summarise the relationship between several categorical variables
3. Pearson’s Chi-Squared Test
Validate that your results are significant and not due to random fluctuation
III. Conclusion
1. Generative AI: Lean Six Sigma x GPT
2. Next Steps

Problem Statement

Addressing Transportation Delays with Chi-Squared Test

You are a data scientist at a manufacturing company.

You support the Inbound Transportation Manager of a small factory in the United States.

The transportation network is simple, with only two routes:

  • Route 1: coming from your northern regional hub (with difficult road conditions and heavy traffic)
  • Route 2: coming from your southern regional hub (with no traffic and a beautiful modern road)

Transportation is managed by an external service provider with a fleet of three trucks (with three different drivers: D1, D2, D3).

Diagram showing the replenishment order process and driver allocation in a transportation network. A factory sends orders through an ERP system to two regional warehouses, North and South. From the warehouses, a transport company delivers goods using one of three drivers (D1, D2, D3). The diagram highlights the flow of goods from the factory to the warehouse and then to the transport company, illustrating the decision process for driver allocation that will be assessed using lean six sigma.
Replenishment order process from the request of the factory to driver allocation — (Image by Author)

Can we assess route performance using data analytics?

Understanding Driver Allocation in Transportation Network

When the factory needs replenishment:

  1. The Factory sends a replenishment order to your ERP
  2. The Southern regional hub receives the order first
  3. If the stock in the southern hub is too low, then the order is transferred to the northern hub
  4. The ERP sends a pick-up request to the transportation service provider (from the selected hub to the factory)
  5. The first driver to accept the request delivers the raw materials to the factory

P.S.: As a customer, we do not have any visibility into the driver allocation process.

Problem
When an order is allocated to the northern regional hub, the lead time to get the request accepted is 35% longer than that of the southern hub.

Are some drivers avoiding the north route as much as possible?

Experiment
We have analyzed the shipments of the last 18 months to build a sample of 269 records.

The objective is to use statistical tools with Python to answer this question.

Exploratory Data Analysis

Stacked Bar Charts to Visualize Driver Allocation

Let’s have a look at the driver allocation.

A stacked bar chart visualizing the allocation of drivers between the North and South hubs. Each bar represents a different driver, with the blue segment showing allocations to the North hub and the orange segment showing allocations to the South hub. The chart reveals that a larger proportion of deliveries are allocated to the South hub, suggesting an imbalance in driver assignments between the two regions that will be assessed using Lean Six Sigma Chi-squared test with Python.
Stacked Bar Charts — (Image by Author)

Can you see something significant?
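A chart like this could be sketched with pandas and matplotlib as shown below. This is a minimal sketch, not the original script: the counts and the names `counts`, `NORTH` and `SOUTH` are hypothetical placeholders, not the 269 records of the study.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical shipment counts (NOT the article's data): one row per driver
counts = pd.DataFrame(
    {"NORTH": [20, 25, 30], "SOUTH": [80, 70, 65]},
    index=pd.Index(["D1", "D2", "D3"], name="driver"),
)

# One bar per driver, with the hub segments stacked on top of each other
ax = counts.plot(kind="bar", stacked=True, figsize=(6, 4))
ax.set_xlabel("Driver")
ax.set_ylabel("Number of shipments")
ax.set_title("Shipments per driver, split by hub")
plt.tight_layout()
# plt.savefig("driver_allocation.png")  # uncomment to write the chart to disk
```

Stacking keeps the total volume per driver visible while still showing how each driver's shipments split between the two hubs.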

Cross Tabulation: Analyzing Shipments by Hub and Driver

A cross-tabulation of the data can provide some insights and help us discover a potential pattern in the distribution of driver allocation.

Split of shipments by hub for each driver
Split of shipments (%) by hub for each driver

Example
82.65% of shipments handled by Driver 1 are from SOUTH HUB

Split of shipments (%) by driver for each hub

Example
38.89% of shipments from SOUTH HUB are handled by Driver 1
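In Python, these three tables can be produced with `pandas.crosstab`. The sketch below assumes one row per shipment with hypothetical columns `driver` and `hub`; the records are made up for illustration, so the percentages will not match the examples above.

```python
import pandas as pd

# Hypothetical shipment records (NOT the article's data): one row per shipment
records = (
    [("D1", "NORTH")] * 20 + [("D1", "SOUTH")] * 80
    + [("D2", "NORTH")] * 25 + [("D2", "SOUTH")] * 70
    + [("D3", "NORTH")] * 30 + [("D3", "SOUTH")] * 65
)
df = pd.DataFrame(records, columns=["driver", "hub"])

# Contingency table: number of shipments for each (driver, hub) pair
contingency = pd.crosstab(df["driver"], df["hub"])

# Row percentages: for each driver, the share of shipments coming from each hub
pct_by_driver = pd.crosstab(df["driver"], df["hub"], normalize="index") * 100

# Column percentages: for each hub, the share of shipments handled by each driver
pct_by_hub = pd.crosstab(df["driver"], df["hub"], normalize="columns") * 100

print(contingency)
print(pct_by_driver.round(2))
print(pct_by_hub.round(2))
```

The `normalize` argument is what switches between the two percentage views: `"index"` makes each driver's row sum to 100, while `"columns"` makes each hub's column sum to 100.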

Minitab
Menu: Stat > Tables > Cross Tabulation and Chi-Square

We can see an unbalanced allocation.

What’s next?

Pearson’s Chi-Squared Test: Evaluating the Significance of Driver Allocation

The first table is also called a contingency table. It is used in statistics to summarise the relationship between several categorical variables.

Can we prove statistically that the hub impacts the allocation?

Using the Chi-Squared Test, we calculate a p-value to determine whether the relationship between the two variables (hub and driver) is statistically significant.
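For reference, Pearson's statistic compares the observed counts in the contingency table with the counts we would expect if driver and hub were independent. The notation below is the standard textbook one, not taken from the original article: O is an observed count, E an expected count, r and c the numbers of rows and columns, and N the total number of shipments.

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{(\text{row total})_i \times (\text{column total})_j}{N},
\qquad
\text{df} = (r - 1)(c - 1)
```

With three drivers and two hubs, the test has (3 - 1)(2 - 1) = 2 degrees of freedom. The p-value is the probability of observing a statistic at least this large if the two variables were truly independent.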

p-value is 0.410

Conclusion
Because the p-value (0.410) is greater than 0.05, we cannot reject the null hypothesis of independence: there is no statistically significant evidence that driver allocation is linked to the hub.

Code
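The test can be run with `scipy.stats.chi2_contingency`. The snippet below is a minimal sketch under stated assumptions, not the original script: the counts in `observed` are illustrative placeholders rather than the 269 records of the study, so the p-value it prints will differ from the 0.410 reported above.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical contingency table (NOT the article's data).
# Rows: drivers D1, D2, D3; columns: NORTH hub, SOUTH hub.
observed = np.array([
    [20, 80],
    [25, 70],
    [30, 65],
])

# Returns the test statistic, the p-value, the degrees of freedom and
# the counts expected if driver and hub were independent.
chi2, p_value, dof, expected = chi2_contingency(observed)

alpha = 0.05  # significance level used in the article
print(f"chi2 = {chi2:.3f}, dof = {dof}, p-value = {p_value:.3f}")
if p_value > alpha:
    print("Fail to reject H0: no significant link between driver and hub.")
else:
    print("Reject H0: driver allocation appears to depend on the hub.")
```

If your data is stored as one row per shipment, the output of `pandas.crosstab` can be passed directly as `observed`.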

Based on the patterns in the data, we don’t have proof that drivers prefer a specific route.

We need to investigate more to find the root cause of our delays.

Conclusion

By applying the Chi-Squared Test in Python, we found no statistical evidence that driver allocation is the root cause of the transportation delays.

This data-driven approach helped us identify areas for further investigation to find the actual cause of the problem.

Can we solve other operational issues using the same approach?

Yes!

The articles below will help you learn about the Lean Six Sigma methodology applied to warehouse operations.

Have you heard about Generative AI?

Generative AI: Lean Six Sigma GPT Agent

After the recent adoption of Large Language Models (LLMs) like GPT, we can enhance the user experience of analytics products with smart agents.

In this article, I shared my first experiment, the design of a LangChain Agent connected to a TMS.

A diagram showing an automated supply chain control tower workflow with GPT and Langchain starting with ambiguous input (represented by question marks), proceeding through SQL queries, machine learning analysis, and generating insights that are communicated to users in an understandable form.
Supply Chain Control Tower Agent with LangChain SQL Agent — (Image by Author)

The outputs are impressive.

We have an agent that can answer operational questions by querying a database autonomously.

What if we create a Lean Six Sigma super agent?

My objective is to equip a GPT agent with:

  • Python scripts of Lean Six Sigma tools
  • Context, articles and knowledge about LSS mathematical tools

The image shows the agent architecture used to process the user’s request for two different analyses simultaneously. After receiving detailed instructions, the agent completes both tasks, prompting the user to guide it further. The flow illustrates how the agent resolves complex requests efficiently to promote Supply Chain Analytics with custom GPTs like “The Supply Chain Analyst”.
Lean Six Sigma Super Agent — (Image by Author)

So we have an agent that can find the proper test, perform it on data uploaded by users and provide an answer.


About Me

Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer who uses data analytics to improve logistics operations and reduce costs.

For consulting or advice on analytics and sustainable supply chain transformation, feel free to contact me via Logigreen Consulting.

If you are interested in Data Analytics and Supply Chain, look at my website.


