Retail Price Optimization using Python

Retail price optimization involves determining the optimal selling price for products or services to maximize revenue and profit. So, if you want to learn how to use machine learning for the retail price optimization task, this article is for you. In this article, I will walk you through the task of Retail Price Optimization with Machine Learning using Python.

What is Retail Price Optimization?

Optimizing retail prices means finding the perfect balance between the price you charge for your products and the number of products you can sell at that price.

The ultimate aim is to charge a price that helps you make the most money and attracts enough customers to buy your products. It involves using data and pricing strategies to find the right price that maximizes your sales and profits while keeping customers happy.

So for the task of Retail Price Optimization, you need data about the prices of products or services and everything that affects the price of a product. I found an ideal dataset for this task. You can download the data from here.

In the section below, I will take you through the task of Retail Price Optimization with Machine Learning using Python.

Retail Price Optimization using Python

Let’s start the task of Retail Price Optimization by importing the necessary Python libraries and the dataset:

import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
pio.templates.default = "plotly_white"

data = pd.read_csv('retail_price.csv')
print(data.head())
  product_id product_category_name  month_year  qty  total_price  \
0       bed1        bed_bath_table  01-05-2017    1        45.95   
1       bed1        bed_bath_table  01-06-2017    3       137.85   
2       bed1        bed_bath_table  01-07-2017    6       275.70   
3       bed1        bed_bath_table  01-08-2017    4       183.80   
4       bed1        bed_bath_table  01-09-2017    2        91.90   

   freight_price  unit_price  product_name_lenght  product_description_lenght  \
0      15.100000       45.95                   39                         161   
1      12.933333       45.95                   39                         161   
2      14.840000       45.95                   39                         161   
3      14.287500       45.95                   39                         161   
4      15.100000       45.95                   39                         161   

   product_photos_qty  ...  comp_1  ps1        fp1      comp_2  ps2  \
0                   2  ...    89.9  3.9  15.011897  215.000000  4.4   
1                   2  ...    89.9  3.9  14.769216  209.000000  4.4   
2                   2  ...    89.9  3.9  13.993833  205.000000  4.4   
3                   2  ...    89.9  3.9  14.656757  199.509804  4.4   
4                   2  ...    89.9  3.9  18.776522  163.398710  4.4   

         fp2  comp_3  ps3        fp3  lag_price  
0   8.760000   45.95  4.0  15.100000      45.90  
1  21.322000   45.95  4.0  12.933333      45.95  
2  22.195932   45.95  4.0  14.840000      45.95  
3  19.412885   45.95  4.0  14.287500      45.95  
4  24.324687   45.95  4.0  15.100000      45.95  

[5 rows x 30 columns]

Before moving forward, let’s have a look if the data has null values or not:

print(data.isnull().sum())
product_id                    0
product_category_name         0
month_year                    0
qty                           0
total_price                   0
freight_price                 0
unit_price                    0
product_name_lenght           0
product_description_lenght    0
product_photos_qty            0
product_weight_g              0
product_score                 0
customers                     0
weekday                       0
weekend                       0
holiday                       0
month                         0
year                          0
s                             0
volume                        0
comp_1                        0
ps1                           0
fp1                           0
comp_2                        0
ps2                           0
fp2                           0
comp_3                        0
ps3                           0
fp3                           0
lag_price                     0
dtype: int64

Now let’s have a look at the descriptive statistics of the data:

print(data.describe())
              qty   total_price  freight_price  unit_price  \
count  676.000000    676.000000     676.000000  676.000000   
mean    14.495562   1422.708728      20.682270  106.496800   
std     15.443421   1700.123100      10.081817   76.182972   
min      1.000000     19.900000       0.000000   19.900000   
25%      4.000000    333.700000      14.761912   53.900000   
50%     10.000000    807.890000      17.518472   89.900000   
75%     18.000000   1887.322500      22.713558  129.990000   
max    122.000000  12095.000000      79.760000  364.000000   

       product_name_lenght  product_description_lenght  product_photos_qty  \
count           676.000000                  676.000000          676.000000   
mean             48.720414                  767.399408            1.994083   
std               9.420715                  655.205015            1.420473   
min              29.000000                  100.000000            1.000000   
25%              40.000000                  339.000000            1.000000   
50%              51.000000                  501.000000            1.500000   
75%              57.000000                  903.000000            2.000000   
max              60.000000                 3006.000000            8.000000   

       product_weight_g  product_score   customers  ...      comp_1  \
count        676.000000     676.000000  676.000000  ...  676.000000   
mean        1847.498521       4.085503   81.028107  ...   79.452054   
std         2274.808483       0.232021   62.055560  ...   47.933358   
min          100.000000       3.300000    1.000000  ...   19.900000   
25%          348.000000       3.900000   34.000000  ...   49.910000   
50%          950.000000       4.100000   62.000000  ...   69.900000   
75%         1850.000000       4.200000  116.000000  ...  104.256549   
max         9750.000000       4.500000  339.000000  ...  349.900000   

              ps1         fp1      comp_2         ps2         fp2      comp_3  \
count  676.000000  676.000000  676.000000  676.000000  676.000000  676.000000   
mean     4.159467   18.597610   92.930079    4.123521   18.620644   84.182642   
std      0.121652    9.406537   49.481269    0.207189    6.424174   47.745789   
min      3.700000    0.095439   19.900000    3.300000    4.410000   19.900000   
25%      4.100000   13.826429   53.900000    4.100000   14.485000   53.785714   
50%      4.200000   16.618984   89.990000    4.200000   16.811765   59.900000   
75%      4.200000   19.732500  117.888889    4.200000   21.665238   99.990000   
max      4.500000   57.230000  349.900000    4.400000   57.230000  255.610000   

              ps3         fp3   lag_price  
count  676.000000  676.000000  676.000000  
mean     4.002071   17.965007  107.399684  
std      0.233292    5.533256   76.974657  
min      3.500000    7.670000   19.850000  
25%      3.900000   15.042727   55.668750  
50%      4.000000   16.517110   89.900000  
75%      4.100000   19.447778  129.990000  
max      4.400000   57.230000  364.000000  

[8 rows x 27 columns]

Now let’s have a look at the distribution of the prices of the products:

fig = px.histogram(data, 
                   x='total_price', 
                   nbins=20, 
                   title='Distribution of Total Price')
fig.show()
retail price optimization: Distribution of Total Price

Now let’s have a look at the distribution of the unit prices using a box plot:

fig = px.box(data, 
             y='unit_price', 
             title='Box Plot of Unit Price')
fig.show()
Box Plot of Unit Prices

Now let’s have a look at the relationship between quantity and total prices:

fig = px.scatter(data, 
                 x='qty', 
                 y='total_price', 
                 title='Quantity vs Total Price', trendline="ols")
fig.show()
retail price optimization: Quantity vs Total Price

Thus, the relationship between quantity and total prices is linear. It indicates that the price structure is based on a fixed unit price, where the total price is calculated by multiplying the quantity by the unit price.

Now let’s have a look at the average total prices by product categories:

fig = px.bar(data, x='product_category_name', 
             y='total_price', 
             title='Average Total Price by Product Category')
fig.show()
Average Total Prices by Product Category

Now let’s have a look at the distribution of total prices by weekday using a box plot:

fig = px.box(data, x='weekday', 
             y='total_price', 
             title='Box Plot of Total Price by Weekday')
fig.show()
retail price optimization: Box Plot of Total Price by Weekday

Now let’s have a look at the distribution of total prices by holiday using a box plot:

fig = px.box(data, x='holiday', 
             y='total_price', 
             title='Box Plot of Total Price by Holiday')
fig.show()
Box Plot of Total Prices by Holiday

Now let’s have a look at the correlation between the numerical features with each other:

correlation_matrix = data.corr()
fig = go.Figure(go.Heatmap(x=correlation_matrix.columns, 
                           y=correlation_matrix.columns, 
                           z=correlation_matrix.values))
fig.update_layout(title='Correlation Heatmap of Numerical Features')
fig.show()
retail price optimization: Correlation Heatmap of Numerical Features

Analyzing competitors’ pricing strategies is essential in optimizing retail prices. Monitoring and benchmarking against competitors’ prices can help identify opportunities to price competitively, either by pricing below or above the competition, depending on the retailer’s positioning and strategy. Now let’s calculate the average competitor price difference by product category:

data['comp_price_diff'] = data['unit_price'] - data['comp_1'] 

avg_price_diff_by_category = data.groupby('product_category_name')['comp_price_diff'].mean().reset_index()

fig = px.bar(avg_price_diff_by_category, 
             x='product_category_name', 
             y='comp_price_diff', 
             title='Average Competitor Price Difference by Product Category')
fig.update_layout(
    xaxis_title='Product Category',
    yaxis_title='Average Competitor Price Difference'
)
fig.show()
Average Competitor Prices Difference by Product Category

Retail Price Optimization Model with Machine Learning

Now let’s train a Machine Learning model for the task of Retail Price Optimization. Below is how we can train a Machine Learning model for this problem:

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

X = data[['qty', 'unit_price', 'comp_1', 
          'product_score', 'comp_price_diff']]
y = data['total_price']

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2,
                                                    random_state=42)

# Train a linear regression model
model = DecisionTreeRegressor()
model.fit(X_train, y_train)

Now let’s make predictions and have a look at the predicted retail prices and the actual retail prices:

y_pred = model.predict(X_test)

fig = go.Figure()
fig.add_trace(go.Scatter(x=y_test, y=y_pred, mode='markers', 
                         marker=dict(color='blue'), 
                         name='Predicted vs. Actual Retail Price'))
fig.add_trace(go.Scatter(x=[min(y_test), max(y_test)], y=[min(y_test), max(y_test)], 
                         mode='lines', 
                         marker=dict(color='red'), 
                         name='Ideal Prediction'))
fig.update_layout(
    title='Predicted vs. Actual Retail Price',
    xaxis_title='Actual Retail Price',
    yaxis_title='Predicted Retail Price'
)
fig.show()
Predicted vs. Actual Retail Price

So this is how you can optimize retail prices with Machine Learning using Python.

Summary

The ultimate aim of optimizing retail prices is to charge a price that helps you make the most money and attracts enough customers to buy your products. It involves using data and pricing strategies to find the right price that maximizes your sales and profits while keeping customers happy. I hope you liked this article on optimizing retail prices with Machine Learning using Python. Feel free to ask valuable questions in the comments section below.

Aman Kharwal
Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.

Articles: 1435

Leave a Reply