B2B Courier Charges Accuracy Analysis using Python

In today’s fast-paced e-commerce industry, fast and efficient order delivery is crucial to business success. To ensure seamless order fulfilment, businesses often partner with courier companies to ship their products to customers. However, managing the charges collected by these courier companies can be difficult, especially when dealing with a high volume of orders. It is one of the real-time problems B2B businesses experience when their estimated charges for the same invoice don’t match. In this article, I will take you through a solution to one such problem statement based on B2B Courier Charges Accuracy Analysis using Python.

B2B Courier Charges Accuracy Analysis

B2B courier charges accuracy analysis focuses on assessing the accuracy of fees charged by courier companies for the delivery of goods in B2B transactions. The aim is to ensure that companies are billed appropriately for the services provided by courier companies.

It is one of the problems that will help you use most of your skills in working with data. You can find the dataset I will use to solve this problem along with the complete problem statement here.

Now in the section below, I’ll take you through the task of B2B Courier Charges Accuracy Analysis using the Python programming language.

B2B Courier Charges Accuracy Analysis using Python

Let’s start this task by importing the necessary Python libraries and the dataset:

```import pandas as pd

order_report = pd.read_csv('Order Report.csv')
sku_master = pd.read_csv('SKU Master.csv')
courier_company_rates = pd.read_csv('Courier Company - Rates.csv')

print("Order Report:")
print("\nSKU Master:")
print("\nPincode Mapping:")
print("\nCourier Invoice:")
print("\nCourier Company rates:")
```Order Report:
ExternOrderNo            SKU  Order Qty  Unnamed: 3  Unnamed: 4
0     2001827036  8904223818706        1.0         NaN         NaN
1     2001827036  8904223819093        1.0         NaN         NaN
2     2001827036  8904223819109        1.0         NaN         NaN
3     2001827036  8904223818430        1.0         NaN         NaN
4     2001827036  8904223819277        1.0         NaN         NaN

SKU Master:
SKU  Weight (g)  Unnamed: 2  Unnamed: 3  Unnamed: 4
0  8904223815682         210         NaN         NaN         NaN
1  8904223815859         165         NaN         NaN         NaN
2  8904223815866         113         NaN         NaN         NaN
3  8904223815873          65         NaN         NaN         NaN
4  8904223816214         120         NaN         NaN         NaN

Pincode Mapping:
Warehouse Pincode  Customer Pincode Zone  Unnamed: 3  Unnamed: 4
0             121003            507101    d         NaN         NaN
1             121003            486886    d         NaN         NaN
2             121003            532484    d         NaN         NaN
3             121003            143001    b         NaN         NaN
4             121003            515591    d         NaN         NaN

Courier Invoice:
AWB Code    Order ID  Charged Weight  Warehouse Pincode  \
0  1091117222124  2001806232            1.30             121003
1  1091117222194  2001806273            1.00             121003
2  1091117222931  2001806408            2.50             121003
3  1091117223244  2001806458            1.00             121003
4  1091117229345  2001807012            0.15             121003

Customer Pincode Zone Type of Shipment  Billing Amount (Rs.)
0            507101    d  Forward charges                 135.0
1            486886    d  Forward charges                  90.2
2            532484    d  Forward charges                 224.6
3            143001    b  Forward charges                  61.3
4            515591    d  Forward charges                  45.4

Courier Company rates:
0         29.5              23.6           33              28.3         40.1

0              38.9         45.4              44.8         56.6

0              55.5         13.6              23.6         20.5

0              28.3         31.9              38.9         41.3

0              44.8         50.7              55.5  ```

Now let’s have a look if any of the data contains missing values:

```# Check for missing values
print("\nMissing values in Website Order Report:")
print(order_report.isnull().sum())
print("\nMissing values in SKU Master:")
print(sku_master.isnull().sum())
print("\nMissing values in Pincode Mapping:")
print(pincode_mapping.isnull().sum())
print("\nMissing values in Courier Invoice:")
print(courier_invoice.isnull().sum())
print("\nMissing values in courier company rates:")
print(courier_company_rates.isnull().sum())```
```Missing values in Website Order Report:
ExternOrderNo      0
SKU                0
Order Qty          0
Unnamed: 3       400
Unnamed: 4       400
dtype: int64

Missing values in SKU Master:
SKU            0
Weight (g)     0
Unnamed: 2    66
Unnamed: 3    66
Unnamed: 4    66
dtype: int64

Missing values in Pincode Mapping:
Warehouse Pincode      0
Customer Pincode       0
Zone                   0
Unnamed: 3           124
Unnamed: 4           124
dtype: int64

Missing values in Courier Invoice:
AWB Code                0
Order ID                0
Charged Weight          0
Warehouse Pincode       0
Customer Pincode        0
Zone                    0
Type of Shipment        0
Billing Amount (Rs.)    0
dtype: int64

Missing values in courier company rates:
fwd_a_fixed         0
fwd_b_fixed         0
fwd_c_fixed         0
fwd_d_fixed         0
fwd_e_fixed         0
rto_a_fixed         0
rto_b_fixed         0
rto_c_fixed         0
rto_d_fixed         0
rto_e_fixed         0
dtype: int64```

Now let’s clean the data:

```# Remove unnamed columns from the Website Order Report DataFrame
order_report = order_report.drop(columns=['Unnamed: 3', 'Unnamed: 4'])

# Remove unnamed columns from the SKU Master DataFrame
sku_master = sku_master.drop(columns=['Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4'])

# Remove unnamed columns from the Pincode Mapping DataFrame
pincode_mapping = pincode_mapping.drop(columns=['Unnamed: 3', 'Unnamed: 4'])```

Now let’s merge the order report and SKU master datasets according to the common SKU column:

```# Merge the Order Report and SKU Master based on SKU
merged_data = pd.merge(order_report, sku_master, on='SKU')
```     Order ID            SKU  Order Qty  Weight (g)
0  2001827036  8904223818706        1.0         127
1  2001821995  8904223818706        1.0         127
2  2001819252  8904223818706        1.0         127
3  2001816996  8904223818706        1.0         127
4  2001814580  8904223818706        1.0         127```

The ‘ExternOrderNo’ is nothing but ‘Order Id’ in other datasets. Let’s rename this column:

```# Rename the "ExternOrderNo" column to "Order ID" in the merged_data DataFrame
merged_data = merged_data.rename(columns={'ExternOrderNo': 'Order ID'})```

Now let’s merge the courier invoice and pincode mapping dataset:

```abc_courier = pincode_mapping.drop_duplicates(subset=['Customer Pincode'])
courier_abc= courier_invoice[['Order ID', 'Customer Pincode','Type of Shipment']]
pincodes= courier_abc.merge(abc_courier,on='Customer Pincode')
```     Order ID  Customer Pincode Type of Shipment  Warehouse Pincode Zone
0  2001806232            507101  Forward charges             121003    d
1  2001806273            486886  Forward charges             121003    d
2  2001806408            532484  Forward charges             121003    d
3  2001806458            143001  Forward charges             121003    b
4  2001807012            515591  Forward charges             121003    d```

Below is how the above code works:

1. We first extract the unique customer pin codes from the pincode mapping dataset and create a new DataFrame called “abc_courier” to store this information.
2. We then select specific columns (“Order ID”, “Customer Pincode”, “Type of Shipment”) from the courier_invoice dataset and create a new DataFrame called “courier_abc” to store this subset of data.
3. We then merge the ‘courier_abc’ DataFrame with the ‘abc_courier’ DataFrame based on the ‘Customer Pincode’ column. This merge operation helps us associate customer pin codes with their respective orders and shipping types. The resulting DataFrame is named ‘pincodes’.

Now let’s merge the pin codes with the main dataframe:

`merged2 = merged_data.merge(pincodes, on='Order ID')`

Now let’s calculate the weight in kilograms by dividing the ‘Weight (g)’ column in the ‘merged2’ DataFrame by 1000:

`merged2['Weights (Kgs)'] = merged2['Weight (g)'] / 1000`

Now let’s calculate the weight slabs:

```def weight_slab(weight):
i = round(weight % 1, 1)
if i == 0.0:
return weight
elif i > 0.5:
return int(weight) + 1.0
else:
return int(weight) + 0.5

merged2['Weight Slab (KG)'] = merged2['Weights (Kgs)'].apply(weight_slab)
courier_invoice['Weight Slab Charged by Courier Company']=(courier_invoice['Charged Weight']).apply(weight_slab)```

The weight_slab() function is defined to determine the weight slab based on the weight of the shipment. It takes the input weight and applies certain conditions to calculate the weight slab. Below is how it works:

1. The function first calculates the remainder of the weight divided by 1 and rounds it to one decimal place. If the remainder is 0.0, it means the weight is a multiple of 1 KG, and the function returns the weight as it is.
2. If the remainder is greater than 0.5, it means that the weight exceeds the next half KG slab. In this case, the function rounds the weight to the nearest integer and adds 1.0 to it, which represents the next heavier slab.
3. If the remainder is less than or equal to 0.5, it means the weight falls into the current half-KG bracket. In this case, the function rounds the weight to the nearest integer and adds 0.5 to it, which represents the current weight slab.

Now let’s rename the columns to prepare the desired dataframe:

```courier_invoice = courier_invoice.rename(columns={'Zone': 'Delivery Zone Charged by Courier Company'})
merged2 = merged2.rename(columns={'Zone': 'Delivery Zone As Per ABC'})
merged2 = merged2.rename(columns={'Weight Slab (KG)': 'Weight Slab As Per ABC'})```

Now let’s calculate the expected charges:

```total_expected_charge = []

for _, row in merged2.iterrows():
fwd_category = 'fwd_' + row['Delivery Zone As Per ABC']
fwd_fixed = courier_company_rates.at[0, fwd_category + '_fixed']
rto_category = 'rto_' + row['Delivery Zone As Per ABC']
rto_fixed = courier_company_rates.at[0, rto_category + '_fixed']

weight_slab = row['Weight Slab As Per ABC']

if row['Type of Shipment'] == 'Forward charges':
additional_weight = max(0, (weight_slab - 0.5) / 0.5)
elif row['Type of Shipment'] == 'Forward and RTO charges':
additional_weight = max(0, (weight_slab - 0.5) / 0.5)
else:
total_expected_charge.append(0)

merged2['Expected Charge as per ABC'] = total_expected_charge
```     Order ID            SKU  Order Qty  Weight (g)  Customer Pincode  \
0  2001827036  8904223818706        1.0         127            173213
1  2001827036  8904223819093        1.0         150            173213
2  2001827036  8904223819109        1.0         100            173213
3  2001827036  8904223818430        1.0         165            173213
4  2001827036  8904223819277        1.0         350            173213

Type of Shipment  Warehouse Pincode Delivery Zone As Per ABC  Weights (Kgs)  \
0  Forward charges             121003                        e          0.127
1  Forward charges             121003                        e          0.150
2  Forward charges             121003                        e          0.100
3  Forward charges             121003                        e          0.165
4  Forward charges             121003                        e          0.350

Weight Slab As Per ABC  Expected Charge as per ABC
0                     0.5                        56.6
1                     0.5                        56.6
2                     0.5                        56.6
3                     0.5                        56.6
4                     0.5                        56.6  ```

Below is how the above code works:

1. In this code, we loop through each row of the ‘merged2’ DataFrame to calculate the expected charges based on ABC’s tariffs. We retrieve the necessary rates and parameters, such as fixed charges and surcharges per weight tier for forward and RTO shipments, based on the delivery area.
2. We then determine the weight of the slab for each row. If the shipment type is ‘Forward Charges’, we calculate the additional weight beyond the basic weight slab (0.5 KG) and apply the corresponding additional charges. For “Forward and RTO Charges” shipments, we consider additional charges for term and RTO components.
3. Finally, we store the calculated expected charges in the “Expected charges according to ABC” column of the “merged2” DataFrame. This allows us to compare the expected charges with the charges billed to analyze the accuracy of the courier company’s charges.

Now let’s merge it with the courier invoice to display the final dataframe:

```merged_output = merged2.merge(courier_invoice, on='Order ID')
```     Order ID            SKU  Order Qty  Weight (g)  Customer Pincode_x  \
0  2001827036  8904223818706        1.0         127              173213
1  2001827036  8904223819093        1.0         150              173213
2  2001827036  8904223819109        1.0         100              173213
3  2001827036  8904223818430        1.0         165              173213
4  2001827036  8904223819277        1.0         350              173213

Type of Shipment_x  Warehouse Pincode_x Delivery Zone As Per ABC  \
0    Forward charges               121003                        e
1    Forward charges               121003                        e
2    Forward charges               121003                        e
3    Forward charges               121003                        e
4    Forward charges               121003                        e

Weights (Kgs)  Weight Slab As Per ABC  Expected Charge as per ABC  \
0          0.127                     0.5                        56.6
1          0.150                     0.5                        56.6
2          0.100                     0.5                        56.6
3          0.165                     0.5                        56.6
4          0.350                     0.5                        56.6

AWB Code  Charged Weight  Warehouse Pincode_y  Customer Pincode_y  \
0  1091122418320             1.6               121003              173213
1  1091122418320             1.6               121003              173213
2  1091122418320             1.6               121003              173213
3  1091122418320             1.6               121003              173213
4  1091122418320             1.6               121003              173213

Delivery Zone Charged by Courier Company Type of Shipment_y  \
0                                        b    Forward charges
1                                        b    Forward charges
2                                        b    Forward charges
3                                        b    Forward charges
4                                        b    Forward charges

Billing Amount (Rs.)  Weight Slab Charged by Courier Company
0                 117.9                                     2.0
1                 117.9                                     2.0
2                 117.9                                     2.0
3                 117.9                                     2.0
4                 117.9                                     2.0  ```

Now let’s calculate the differences in charges and expected charges for each order:

```df_diff = merged_output
df_diff['Difference (Rs.)'] = df_diff['Billing Amount (Rs.)'] - df_diff['Expected Charge as per ABC']

df_new = df_diff[['Order ID', 'Difference (Rs.)', 'Expected Charge as per ABC']]

```     Order ID  Difference (Rs.)  Expected Charge as per ABC
0  2001827036              61.3                        56.6
1  2001827036              61.3                        56.6
2  2001827036              61.3                        56.6
3  2001827036              61.3                        56.6
4  2001827036              61.3                        56.6```

Now let’s summarize the accuracy of B2B courier charges based on the charged prices and expected prices:

```# Calculate the total orders in each category
total_correctly_charged = len(df_new[df_new['Difference (Rs.)'] == 0])
total_overcharged = len(df_new[df_new['Difference (Rs.)'] > 0])
total_undercharged = len(df_new[df_new['Difference (Rs.)'] < 0])

# Calculate the total amount in each category
amount_overcharged = abs(df_new[df_new['Difference (Rs.)'] > 0]['Difference (Rs.)'].sum())
amount_undercharged = df_new[df_new['Difference (Rs.)'] < 0]['Difference (Rs.)'].sum()
amount_correctly_charged = df_new[df_new['Difference (Rs.)'] == 0]['Expected Charge as per ABC'].sum()

# Create a new DataFrame for the summary
summary_data = {'Description': ['Total Orders where ABC has been correctly charged',
'Total Orders where ABC has been overcharged',
'Total Orders where ABC has been undercharged'],
'Count': [total_correctly_charged, total_overcharged, total_undercharged],
'Amount (Rs.)': [amount_correctly_charged, amount_overcharged, amount_undercharged]}

df_summary = pd.DataFrame(summary_data)

print(df_summary)```
```                                         Description  Count  Amount (Rs.)
0  Total Orders where ABC has been correctly charged     12         507.6
1        Total Orders where ABC has been overcharged    382       33750.5
2       Total Orders where ABC has been undercharged      7        -165.2```

We can also visualize the proportion of errors as shown below:

```import plotly.graph_objects as go
fig = go.Figure(data=go.Pie(labels=df_summary['Description'],
values=df_summary['Count'],
textinfo='label+percent',
hole=0.4))
fig.update_layout(title='Proportion')

fig.show()```

So this is how you can analyze the accuracy of B2B courier charges using the Python programming language.

Summary

B2B courier charges accuracy analysis focuses on assessing the accuracy of fees charged by courier companies for the delivery of goods in B2B transactions. In such problems, we aim to ensure that companies are billed appropriately for the services provided by courier companies. I hope you liked this article on B2B Courier Charges Accuracy Analysis using Python. Feel free to ask valuable questions in the comments section below.

Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of dataðŸ“ˆ.

Articles: 1435