ABC analysis assumes that income-generating items in an inventory follow a Pareto distribution, where a very small percentage of items generates most of the income. In this article, I’ll walk you through how we can perform ABC analysis with Machine Learning.
Conventions of ABC Analysis
Using the conventions of ABC analysis, an inventory item is assigned a letter based on its importance:
- A items represent 20% of items but contribute 70% of revenue
- B items represent 30% of items but contribute 25% of revenue
- C items represent 50% of items but contribute 5% of revenue
Keep in mind that these numbers are approximate and will vary widely depending on the actual distribution of sales. The main takeaway is that A items make up a small percentage of inventory but contribute the most to income, C items make up a large percentage of inventory but contribute the least to income, and B items fall somewhere in the middle.
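For comparison, the conventions above can be applied directly, without any model. The following is only a minimal sketch: it assumes cut-offs at 70% and 95% of cumulative revenue (derived from the 70/25/5 split above), and the function name `abc_by_cumulative_share` and the toy numbers are mine, chosen for illustration.

```python
import pandas as pd

def abc_by_cumulative_share(revenue, a_cutoff=0.70, b_cutoff=0.95):
    """Label items A/B/C by their position in the cumulative revenue share."""
    # Sort from highest to lowest revenue so the top earners come first.
    ordered = revenue.sort_values(ascending=False)
    # Share of total revenue reached once each item is included.
    cum_share = ordered.cumsum() / ordered.sum()
    # Share reached *before* each item, so the top item is always an A item.
    share_before = cum_share.shift(1, fill_value=0.0)
    labels = pd.Series("C", index=ordered.index)
    labels[share_before < b_cutoff] = "B"
    labels[share_before < a_cutoff] = "A"
    # Return the labels in the original item order.
    return labels.reindex(revenue.index)

# Toy revenues (assumed values, not taken from the dataset).
toy_revenue = pd.Series([500, 250, 100, 60, 30, 20, 16, 12, 8, 4])
toy_labels = abc_by_cumulative_share(toy_revenue)
print(toy_labels.value_counts())
```

On these toy numbers, two of the ten items end up as A items while carrying three quarters of the revenue, which is the kind of imbalance ABC analysis expects.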
Importance of ABC Analysis
An organization's inventory planning and warehousing strategies rely on ABC analysis for key decisions. For example, a warehouse manager typically wants A items closest to the shipping docks to reduce the time it takes to pick them. This increases productivity and reduces labour costs.
ABC Analysis with Machine Learning
The data used in this project comes from a popular online retailer dataset. The dataset only includes online sales of clothing throughout the summer. More importantly, it records the number of units sold and the selling price of each item, from which the revenue per item can be computed. The dataset can be easily downloaded from here.
The goal of this project is to sort all the elements of the dataset into an ABC categorization based on their importance. When viewing the results, there should be relatively few A items that generate the majority of income and a large number of C items that do not generate much income.
Data Preparation
Let’s get started with data preparation. I will begin by importing the necessary packages and reading the dataset:
# Import libraries
import pandas as pd
import numpy as np
# read the data to a dataframe
df = pd.read_csv("Summer_Sales.csv")
I will add a new column to the data for revenue by simply multiplying the number of units sold by the price. It is possible that the price has changed over time, especially when flash sales have taken place, but without additional data to analyze, it is assumed that all items sold at a single, stable price:
df["revenue"] = df["units_sold"] * df["price"]
Now, let’s visualize the revenue using the seaborn package in Python:
import seaborn as sns
# histplot replaces distplot, which has been deprecated since seaborn 0.11
sns.histplot(df["revenue"], kde=True)

The graph above shows the Pareto distribution found in the data. The vast majority of items generate less than €200,000 in revenue. At the same time, it shows that a few items generate between €400,000 and €800,000 each, and these contribute the majority of the revenue.
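A quick numeric check can back up what the plot suggests. Here is a small sketch that measures how much of the total revenue comes from the top fraction of items; the helper name `top_share` and the toy values are assumptions for illustration, and on the real data you would pass `df["revenue"]` instead.

```python
import pandas as pd

def top_share(revenue, fraction=0.20):
    """Fraction of total revenue earned by the top `fraction` of items."""
    # Number of items in the top group (at least one).
    n_top = max(1, round(len(revenue) * fraction))
    return revenue.nlargest(n_top).sum() / revenue.sum()

# Toy revenues (assumed values, not taken from the dataset).
toy_revenue = pd.Series([500, 250, 100, 60, 30, 20, 16, 12, 8, 4])
share = top_share(toy_revenue)
print(f"Top 20% of items earn {share:.0%} of revenue")
```

A value far above the fraction itself (for example, 20% of items earning well over half the revenue) indicates the kind of skew that makes ABC analysis worthwhile.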
Now, I’m going to define a function to categorize the amount of income generated by an item into bins, and then I’ll apply it to the data:
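def bins(x):
    # Return the upper edge of the 20,000-wide revenue bin that contains x
    for bar in range(20000, 820000, 20000):
        if x <= bar:
            return bar
    # Revenue above 800,000 falls outside every bin, so the function returns None
# Create a new column by applying the bin function
df["rev_dist"] = df["revenue"].apply(bins)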
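The same binning can also be done without a Python-level loop. The sketch below is one possible vectorized version, not the code used in this article: `bin_upper_edge` is a hypothetical helper that rounds each revenue up to the next multiple of the bin width. Unlike `bins`, it does not stop at 800,000, so larger revenues get a bin edge instead of a missing value.

```python
import numpy as np
import pandas as pd

def bin_upper_edge(revenue, width=20_000):
    """Vectorized upper bin edge for a Series of revenues."""
    # Round each value up to the next multiple of `width`.
    edges = np.ceil(revenue / width) * width
    # A revenue of 0 would land on edge 0, so lift it into the first bin.
    return edges.clip(lower=width).astype(int)

# Toy revenues chosen to sit on and around bin boundaries (assumed values).
toy_revenue = pd.Series([0, 1, 20_000, 20_001, 399_999, 800_000])
toy_bins = bin_upper_edge(toy_revenue)
print(toy_bins.tolist())
```

On the real data this would be used as `df["rev_dist"] = bin_upper_edge(df["revenue"])`.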
Now I’m going to create a pivot table to list the number of items that fall into each category:
df["count"] = 1
# Create a pivot table counting the items in each revenue bin
pivot_table = pd.pivot_table(df, index=["rev_dist"], values=["count"], aggfunc="sum")
Applying Machine Learning Algorithm
To properly train the model, it is not enough to just look at the income generated by each item. The model must also know how income is distributed. This pivot table provides a very manageable data set that the model can train on. I will use the K-Means clustering algorithm for this task of ABC analysis:
# Import the model from scikit-learn
from sklearn.cluster import KMeans
# n_clusters is 3 because items will be sorted into A, B, and C
kmeans = KMeans(n_clusters=3)
kmeans.fit(pivot_table)
I will now add a new column to the pivot table giving the classification of the model. It should be noted that scikit-learn’s K-means algorithm labels each cluster with an integer instead of the letters used in ABC analysis, and that the numbering carries no ranking. Therefore, each row will be labelled as zero, one, or two:
pivot_table["category"] = kmeans.labels_
Now, I will define a dictionary that maps each cluster label to its ABC class. Because K-means numbers its clusters arbitrarily, the mapping below reflects the run shown here and may need adjusting after inspecting the pivot table on another run:
ABC_dict = {
    0: "A",
    1: "C",
    2: "B"
}
pivot_table["ABC"] = pivot_table["category"].map(ABC_dict)
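Since the cluster numbers can change from run to run, the mapping can also be derived from the data instead of being typed by hand. The sketch below is one way to do that, under the assumption that the cluster whose revenue bins are highest on average should be A and the lowest should be C. The toy pivot table, the hard-coded labels (a stand-in for `kmeans.labels_`), and the helper `label_map_by_revenue` are all hypothetical.

```python
import pandas as pd

# Toy stand-in for the pivot table: index = revenue bin, one "count" column.
toy_pivot = pd.DataFrame(
    {"count": [1000, 300, 280, 10, 8, 5]},
    index=pd.Index([20000, 40000, 60000, 80000, 100000, 120000], name="rev_dist"),
)
# Hypothetical cluster labels, standing in for kmeans.labels_.
toy_pivot["category"] = [2, 0, 0, 1, 1, 1]

def label_map_by_revenue(table):
    """Map cluster numbers to A/B/C, highest average revenue bin first."""
    # Average revenue bin of each cluster.
    mean_bin = table.reset_index().groupby("category")["rev_dist"].mean()
    # Rank clusters from highest to lowest average bin.
    ranked = mean_bin.sort_values(ascending=False).index
    return {int(cluster): letter for cluster, letter in zip(ranked, "ABC")}

toy_map = label_map_by_revenue(toy_pivot)
toy_pivot["ABC"] = toy_pivot["category"].map(toy_map)
print(toy_map)
```

With this approach, the letters stay attached to the same revenue ranges even when K-means happens to number the clusters differently.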
Now, remember that the model was trained on the pivot table, so the individual items have not yet been assigned an ABC classification. Instead, each revenue bin was assigned one. This means that while we don’t immediately know which items fall into category A, we do know which revenue bins are classified as A. As a result, we can simply merge the main data frame with the pivot table to give each item its ABC classification:
df = pd.merge(df, pivot_table, on="rev_dist", how="left")
Analyzing the final distribution of the items shows that:
- A items represent 11.4% of items, but 61.7% of revenue
- B items represent 20.5% of items, but 30.7% of revenue
- C items represent 68.1% of items, but 7.6% of revenue
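Percentages like the ones above can be computed with a single group-by once every item has its letter. The following is a sketch on toy data, assuming a data frame with `revenue` and `ABC` columns like the merged `df`; the numbers here are made up and are not the results reported above.

```python
import pandas as pd

# Toy items with revenue and an already assigned class (assumed values).
toy = pd.DataFrame({
    "revenue": [500, 250, 100, 60, 30, 20, 16, 12, 8, 4],
    "ABC": ["A", "A", "B", "B", "B", "B", "C", "C", "C", "C"],
})

# Count the items and total the revenue per class.
summary = toy.groupby("ABC")["revenue"].agg(items="count", revenue="sum")
# Express both as a percentage of the overall totals.
summary["item_share_pct"] = 100 * summary["items"] / summary["items"].sum()
summary["revenue_share_pct"] = 100 * summary["revenue"] / summary["revenue"].sum()
print(summary)
```

Running the same aggregation on the merged data frame gives the share of items and revenue per class for the real dataset.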
I hope you liked this article on ABC analysis with Machine Learning. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.