AI Advances

Democratizing access to artificial intelligence


How Important is Master Data Management for Data Science?

Samir Saci
Published in AI Advances
8 min read · Feb 15, 2025


Image by Author

Master Data Management (MDM) is the implementation of processes to provide consistent, complete, and accurate master data across a company's departments.

As a Data Scientist, how important is good master data for your activities?

In my experience working in logistics operations, I have faced issues due to inconsistent master data impacting operations, analytics, and strategic reporting.

Impact of Master Data along the Value Chain — (Image by Author)

To explain the importance of master data, I will share examples of key analytics products that can be severely impacted by inconsistent data.

These examples can be used to convince your top management to invest in resources for Master Data Management.

Master Data Management for a Fashion Retailer

Scenario

You are a data scientist in the supply chain department of an international clothing group that has stores all around the world.

This retailer produces garments, bags, and accessories in factories located in Asia.

Supply Chain Network managed with systems — (Image by Author)

Products are sold in stores that are replenished from local warehouses.

Different systems (IMS, WMS, ERP, TMS) record transactional data following products from the end of the production line to the store shelves.

How can we ensure end-to-end tracking of products across these different systems?

What is Master Data Management?

These products are identified by unique SKU IDs (Stock-Keeping Units) recorded in master data.

Master data is essential information about all items managed in the supply chain, serving as an organization's central source of truth.

In the portfolio of articles sold, you have two categories:

  • Seasonal products: articles sold only during one collection (Spring/Summer, Fall/Winter)
  • Carry-over: permanent articles that are carried from one year to another
Item Master Data Information — (Image by Author)

How do we introduce a new item in the master data?

During the creation process, master data specialists enter product-related information into the ERP.

  • Product information: net weight, dimensions, dangerous classification
  • Packaging: total weight, dimensions, language, etc.
  • Handling units: number of items per (carton, pallets), pallet height
  • Merchandising: supplier name, cost of purchase, location of listing (warehouses, stores), pricing per market, etc.

These pieces of information will be tied to the SKU ID across the different systems managing your supply chain, from factories to the stores.
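To make this concrete, the record behind a SKU ID can be sketched as a simple data structure. The field names and values below are illustrative assumptions, not the ERP's actual schema:

```python
from dataclasses import dataclass

@dataclass
class ItemMaster:
    """One illustrative master data record tied to a SKU ID (fields are assumptions)."""
    sku_id: str
    net_weight_kg: float     # product information: net weight of a single item
    dims_cm: tuple           # product information: (length, width, height)
    carton_weight_kg: float  # packaging: total weight of a full carton
    items_per_carton: int    # handling unit: items per carton
    cartons_per_pallet: int  # handling unit: cartons per pallet
    supplier: str            # merchandising: supplier name
    unit_cost: float         # merchandising: cost of purchase

    def pallet_weight_kg(self) -> float:
        # Derived value consumed downstream by loading and picking tools
        return self.carton_weight_kg * self.cartons_per_pallet

item = ItemMaster("SKU-001", 0.4, (30, 20, 10), 12.5, 24, 40, "Factory A", 3.2)
print(item.pallet_weight_kg())  # 500.0 kg for a full pallet
```

Every downstream system reads these same fields, which is why a single wrong value propagates from the factory to the store.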

How are these items recorded in the master data?

Data Entry Process

Most of the time, these parameters are manually entered into the ERP by Master Data Specialists.

  1. Item data are sent by the procurement team in an Excel template.
  2. MDM specialists check the templates to ensure they are complete and perform random checks to ensure the validity of the information.
  3. MDM specialists create the items in the ERP.
Data Entry Process — (Image by Author)
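Step 2 of this process, the completeness check, can be sketched in a few lines of pandas. The template columns and SKU values below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical item creation template received from the procurement team
template = pd.DataFrame({
    "sku_id": ["SKU-001", "SKU-002", "SKU-003"],
    "net_weight_kg": [0.4, None, 0.6],   # SKU-002 is missing its weight
    "length_cm": [30, 25, None],         # SKU-003 is missing a dimension
    "supplier": ["Factory A", "Factory B", "Factory A"],
})

required = ["sku_id", "net_weight_kg", "length_cm", "supplier"]

# Flag incomplete rows before the items are created in the ERP
incomplete = template[template[required].isna().any(axis=1)]
print(incomplete["sku_id"].tolist())  # ['SKU-002', 'SKU-003']
```

Rules like this catch incompleteness, but not a plausible-looking wrong value, which is why the random validity checks remain necessary.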

Now that the items are recorded in the system,

  • Procurement teams can send purchase orders to the suppliers.
  • Logistics teams can receive the products in warehouses and stores.
  • Merchandising teams can follow the sales and stock levels in stores.
  • Analytics teams can include these new items in their reports.

What is the impact for you (as a Data Scientist)?

Problem Statement

As a Data Scientist, you are not involved in the item creation process.

However, flaws in the process can directly impact the performance of the analytics products you deploy in the organization.

What can go wrong if the process is not reliable?

In the next section, we will introduce real examples of issues directly impacting Data Science and operational teams.

Impacts on Supply Chain Analytics Products

Because multiple business units in the company rely on it, inconsistent master data can severely impact your end-to-end supply chain operations.

International Freight: Container Optimization

The logistics team uses sea freight to ship products from overseas factories to local warehouses.

As freight costs exploded during the pandemic, your team designed and deployed an algorithm to optimize the loading of pallets in a container.

Containers Loading — (Image by Author)

The idea is to find the optimal loading sequence that maximizes the number of pallets loaded in a container.

Optimized Solution (Left) | Initial Solution (Right) — (Image by Author)

In the picture above, you can observe the difference between an intuitive loading strategy (right) and the optimized version (left).

For this specific example, the tool can load two additional pallets in the same container.

For more information about this algorithm, you can check my dedicated article.

What can go wrong with master data issues?

This tool uses the master data to estimate the dimensions, sizes, and weight of pallets based on the items in the pallet.

  • If the dimensions are incorrect, the optimized loading plan may not fit with the container dimensions.
    For instance, a pallet height can be higher than expected if the dimension of a single item in the master data is incorrect.
  • The total cargo weight may exceed the limitations if the item weights are wrong.
    For instance, if we load 50 boxes of a specific item and the weight per box in the master is 2kg lower than the actual weight, the full pallet will be 100 kg heavier than expected.
Example of a 2D Loading Plan Proposed by the Algorithm — (Image by Author)
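The weight example above can be verified with a few lines of arithmetic. The per-box weights are illustrative values:

```python
# How a per-box weight error in the master data propagates to pallet level
boxes_per_pallet = 50
master_weight_kg = 8.0   # weight per box recorded in the master data (assumed)
actual_weight_kg = 10.0  # real weight measured at the warehouse (2 kg higher)

expected = boxes_per_pallet * master_weight_kg  # what the loading plan assumes
actual = boxes_per_pallet * actual_weight_kg    # what is really loaded
print(actual - expected)  # 100.0 kg heavier per pallet than planned
```

Multiply this gap by the number of pallets in a container and the cargo can exceed the legal weight limit without any warning from the tool.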

The consequences can be extremely costly if the operators cannot follow the loading plan.

Operations: Your tool underestimated the cargo volume!
Data Scientist: That’s not the tool; this is the master data.

Transportation teams may have to book additional containers (at a very high price) or keep your cargo in staging areas for days (which can cause delays in the delivery process).

Let us continue with another optimization algorithm focusing on picking operations at warehouses.

Warehouse Operations: Picking Route Optimization

In a warehouse, walking from one location to another during the picking route can account for 60% to 70% of the operator’s working time.

Picking routes examples — (Image by Author)

You designed and deployed an algorithm with a team of process engineers to minimize the walking distance of pickers.

The idea is to use spatial clustering to group orders by zones to minimize walking per wave.

Example of three Picking Locations Clusters — (Image by Author)

The tool would support the Warehouse Management System (WMS) in grouping orders by waves (limited to a specific zone) that would be allocated to operators.
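A minimal sketch of the zoning idea, using a simple aisle-based grouping as a stand-in for the actual spatial clustering algorithm. The locations and zone size are assumptions:

```python
from collections import defaultdict

# Hypothetical picking lines: (order_id, aisle, slot)
order_lines = [
    ("O1", 2, 14), ("O2", 2, 3), ("O3", 9, 7),
    ("O4", 10, 1), ("O5", 1, 20), ("O6", 9, 12),
]

# A simple proxy for spatial clustering: group aisles into zones of 4 aisles,
# so each wave stays in one zone and the walking distance is reduced
def zone(aisle: int, aisles_per_zone: int = 4) -> int:
    return (aisle - 1) // aisles_per_zone

waves = defaultdict(list)
for order_id, aisle, slot in order_lines:
    waves[zone(aisle)].append(order_id)

print(dict(waves))  # {0: ['O1', 'O2', 'O5'], 2: ['O3', 'O4', 'O6']}
```

The production version would cluster on actual (x, y) coordinates of the locations, but the principle is the same: orders picked together should be stored close together.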

For more information about this algorithm, you can check my dedicated article.

What can happen if the dimensions in the master data are wrong?

Warehouse operators are using picking carts with limited volume and weight capacity.

Picking Carts from an E-commerce Warehouse — (Image by Author)

The algorithm uses the item master data to maximize the number of orders per wave.

  • If the dimensions in the master data are incorrect, the algorithm may create batches of orders that do not fit in the cart.
    The operator would then have to stop in the middle of the wave to unload the cart, reducing productivity.
  • If the weights in the master data are below reality, the algorithm may overload the cart.
    This will impact operators' productivity and may lead to work accidents.
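A greedy sketch of wave building under cart capacity constraints. The capacities and order sizes are assumed values derived from the item master data:

```python
# Cart limits (assumed values)
CART_VOLUME_L = 200.0
CART_WEIGHT_KG = 80.0

# Orders with volume and weight computed from the item master data
orders = [  # (order_id, volume_liters, weight_kg)
    ("O1", 60, 20), ("O2", 90, 35), ("O3", 80, 30), ("O4", 50, 15),
]

# Greedily fill the cart; start a new wave when a limit would be exceeded
waves, current, vol, wgt = [], [], 0.0, 0.0
for order_id, v, w in orders:
    if vol + v > CART_VOLUME_L or wgt + w > CART_WEIGHT_KG:
        waves.append(current)
        current, vol, wgt = [], 0.0, 0.0
    current.append(order_id)
    vol, wgt = vol + v, wgt + w
waves.append(current)

print(waves)  # [['O1', 'O2'], ['O3', 'O4']]
```

If the master data understates volumes or weights, this same logic silently builds waves that exceed the real cart capacity.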

This shows how issues in the master data can completely waste the efforts you put into developing an optimization tool.

What about strategic reporting?

Strategic Reporting: Supply Chain Sustainability

As the demand for transparency in sustainable development has grown over the years, the sustainability team asked for your support to automate the reporting of CO2 emissions.

The focus of this report is scope 3 indirect emissions occurring in the value chain of the company with a focus on Transportation.

Formula using Emission Factor — (Image by Author)

The formula reads E_CO2 = W_goods × D × F_mode, with:

  • E_CO2: emissions in kilograms of CO2 equivalent (kgCO2eq)
  • W_goods: weight of the goods (tons)
  • D: distance from your warehouse to the final destination (km)
  • F_mode: emission factor for each transportation mode (kgCO2eq/t.km)

This simple formula estimates the CO2 emissions if you transport W_goods (Tons) with a specific transportation mode at a distance of D (km).
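The formula translates directly into code. The emission factors below are illustrative placeholders, not official values:

```python
def co2_emissions_kg(weight_tons: float, distance_km: float, factor: float) -> float:
    """E_CO2 = W_goods * D * F_mode, result in kgCO2eq."""
    return weight_tons * distance_km * factor

# Illustrative emission factors per transportation mode (kgCO2eq/t.km, assumed)
FACTORS = {"sea": 0.008, "road": 0.100, "air": 0.500}

# Example: 12 tons shipped 850 km by road
print(co2_emissions_kg(12, 850, FACTORS["road"]))  # 1020.0 kgCO2eq
```

Note that W_goods comes from the item master data: quantities shipped are converted to tons using the recorded unit weights.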

The results are crucial to measuring the baseline of emissions that will drive your sustainability roadmap.

Data Collection & Processing — (Image by Author)

The input datasets include shipment records, address books and, most importantly, master data.

Indeed, item master data is used to convert quantities to weight to feed the formula that will estimate CO2 emissions.

To learn more about the automation of CO2 emissions reporting, you can check my dedicated article.

What if your master data is incomplete?

The biggest challenges that you will face are incompleteness and inconsistency.

  • Incompleteness: you may find items with missing dimensions or weight because the factory (or purchasing team) did not completely fill out the input template.
  • Inconsistency: a data entry error can turn 12.2 kg into 122 kg
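Both issues can be caught with simple rules before the report is generated. The weights and the 0-50 kg plausibility range below are assumptions:

```python
# Item weights as recorded in the master data (illustrative values)
items = {"SKU-001": 12.2, "SKU-002": None, "SKU-003": 122.0, "SKU-004": 11.8}

# Incompleteness: missing weights cannot feed the CO2 formula
missing = [sku for sku, w in items.items() if w is None]

# Inconsistency: a plausibility range catches the 12.2 -> 122 entry error
# (the 50 kg upper bound is an assumption for this product family)
suspect = [sku for sku, w in items.items() if w is not None and w > 50]

print(missing, suspect)  # ['SKU-002'] ['SKU-003']
```

Flagged SKUs can then be sent back to the MDM team for correction before the emissions are computed, instead of surfacing during the audit.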

The risk is delivering incomplete reports or overestimated emissions.

This is unacceptable for strategic reports like this, as they go through detailed audits before publication in the annual report.

Conclusion

This list of examples is not exhaustive, but it gives an overview of how operations can be impacted by failures in master data management.

Garbage in, garbage out. And no magic algorithm will change this.

You may invest resources in designing optimization tools to reduce costs, but the efforts can be ruined if the solution is fed with the wrong data.

Four types of analytics that will be impacted by bad data management — (Image by Author)

This is no surprise to anyone; the objective of this article is to focus on master data because this single table can impact the entire value chain.

The solution is to implement data quality initiatives.

6 principles of data quality — (Image by Author)

Data Quality defines how your master data can be trusted, understood and utilized effectively for their intended purpose.

For more information about Data Quality, you can check my dedicated article.

About Me

Let’s connect on LinkedIn and Twitter. I am a Supply Chain Engineer who uses data analytics to improve logistics operations and reduce costs.

If you are interested in data analytics and supply chain, please visit my website.

📘 Your complete guide for Supply Chain Analytics: Analytics Cheat Sheet

