Predicting how many orders a product will receive helps a business decide how much to invest in marketing it, which makes order prediction an important data science use case for product-based companies. In this article, I will walk you through the task of predicting the number of orders with machine learning using Python.

## Number of Orders Prediction

If you want to predict the number of orders a company may receive for a particular product, you need historical data about the orders the company has already received. For this task, I will be using sales data for supplements collected from Kaggle. The dataset contains:

- Product ID
- Store ID
- The type of store where the supplement was sold
- The type of location the order was received from
- Sales Date
- Region code
- Whether it is a public holiday or not at the time of order
- Whether the product was on discount or not
- Number of orders placed
- Sales

I hope you now have an overview of the problem and the dataset I will be using to solve it. In the section below, I will take you through predicting the number of orders with machine learning using Python.

## Number of Orders Prediction using Python

Let's start by importing the necessary Python libraries and the dataset:

    import pandas as pd
    import numpy as np

    data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/supplement.csv")
    data.head()

             ID  Store_id Store_Type  ... Discount  #Order     Sales
    0  T1000001         1         S1  ...      Yes       9   7011.84
    1  T1000002       253         S4  ...      Yes      60  51789.12
    2  T1000003       252         S3  ...      Yes      42  36868.20
    3  T1000004       251         S2  ...      Yes      23  19715.16
    4  T1000005       250         S2  ...      Yes      62  45614.52

    [5 rows x 10 columns]

Now let's look at some basic information about this dataset to understand what kind of data we are working with:

    data.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 188340 entries, 0 to 188339
    Data columns (total 10 columns):
     #   Column         Non-Null Count   Dtype
    ---  ------         --------------   -----
     0   ID             188340 non-null  object
     1   Store_id       188340 non-null  int64
     2   Store_Type     188340 non-null  object
     3   Location_Type  188340 non-null  object
     4   Region_Code    188340 non-null  object
     5   Date           188340 non-null  object
     6   Holiday        188340 non-null  int64
     7   Discount       188340 non-null  object
     8   #Order         188340 non-null  int64
     9   Sales          188340 non-null  float64
    dtypes: float64(1), int64(3), object(6)
    memory usage: 14.4+ MB

    data.isnull().sum()

    ID               0
    Store_id         0
    Store_Type       0
    Location_Type    0
    Region_Code      0
    Date             0
    Holiday          0
    Discount         0
    #Order           0
    Sales            0
    dtype: int64

    data.describe()

                Store_id        Holiday         #Order          Sales
    count  188340.000000  188340.000000  188340.000000  188340.000000
    mean      183.000000       0.131783      68.205692   42784.327982
    std       105.366308       0.338256      30.467415   18456.708302
    min         1.000000       0.000000       0.000000       0.000000
    25%        92.000000       0.000000      48.000000   30426.000000
    50%       183.000000       0.000000      63.000000   39678.000000
    75%       274.000000       0.000000      82.000000   51909.000000
    max       365.000000       1.000000     371.000000  247215.000000

Now let's explore some of the important features in this dataset to understand the factors affecting the number of orders for supplements:

The figure above shows the distribution of the number of orders by store type. Now let's look at the distribution of the number of orders by location type:

The figure above shows the distribution of the number of orders by location type. Now let's look at the distribution of the number of orders by discount:

According to the figure above, most people still buy supplements when there is no discount on them. Now let's look at how holidays affect the number of orders:

According to the figure above, most people buy supplements on working days.
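The breakdowns described above can also be checked numerically. The following is a minimal sketch, not the code behind the original figures: it uses a tiny made-up sample with the dataset's column names (the `Location_Type` labels and all values are assumptions), and `order_share` is a hypothetical helper that reports each category's share of total orders.

```python
import pandas as pd

# Tiny made-up sample that mimics the dataset's column names; values are illustrative only
sample = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S2", "S2", "S1"],
    "Location_Type": ["L3", "L2", "L2", "L3", "L1", "L1"],  # labels are assumed
    "Holiday": [1, 0, 0, 0, 1, 0],
    "Discount": ["Yes", "Yes", "No", "No", "No", "Yes"],
    "#Order": [9, 60, 42, 23, 62, 30],
})


def order_share(frame: pd.DataFrame, column: str) -> pd.Series:
    """Return each category's share of total orders as a fraction (shares sum to 1)."""
    totals = frame.groupby(column)["#Order"].sum()
    return (totals / totals.sum()).sort_values(ascending=False)


# One breakdown per feature discussed above
for column in ["Store_Type", "Location_Type", "Discount", "Holiday"]:
    print(order_share(sample, column).round(3), end="\n\n")
```

On the real data, the same helper could be called as `order_share(data, "Store_Type")`, and the resulting series passed to any plotting library to draw a chart.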

## Number of Orders Prediction Model

Now let's prepare the data so that we can train a machine learning model to predict the number of orders. Here, I will convert some of the string values to numerical values:
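One possible way to do this conversion is sketched below. It is an illustration rather than the exact code used here: the sample rows are made up, the `Location_Type` labels are assumed, and the choice of an explicit mapping for `Discount` and sorted category codes for the other columns is an assumption.

```python
import pandas as pd

# Made-up rows using the dataset's column names; the location labels are assumed
sample = pd.DataFrame({
    "Store_Type": ["S1", "S4", "S3", "S2"],
    "Location_Type": ["L3", "L2", "L2", "L1"],
    "Discount": ["Yes", "Yes", "No", "No"],
})

encoded = sample.copy()

# Binary flag: an explicit mapping keeps the meaning obvious (No -> 0, Yes -> 1)
encoded["Discount"] = encoded["Discount"].map({"No": 0, "Yes": 1})

# Multi-level labels: integer codes assigned in sorted label order (S1 -> 0, S2 -> 1, ...)
for column in ["Store_Type", "Location_Type"]:
    encoded[column] = encoded[column].astype("category").cat.codes

print(encoded)
```

Integer codes imply an ordering between categories that may not be meaningful. Tree-based models usually tolerate this, but for linear models one-hot encoding is often a safer choice.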

Now let's split the data into an 80% training set and a 20% test set:
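A minimal sketch of such a split with scikit-learn's `train_test_split` follows. The arrays `x` and `y` are random stand-ins; which columns make up the feature matrix is not shown here, so the four-feature shape and the `random_state` value are assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in feature matrix and target; on the real data these would come from the encoded columns
rng = np.random.default_rng(0)
x = rng.integers(0, 4, size=(100, 4))
y = rng.integers(0, 100, size=100)

# test_size=0.2 holds out 20% of the rows; random_state makes the shuffle repeatable
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)

print(xtrain.shape, xtest.shape)
```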

Now I will use the Light Gradient Boosting Machine (LightGBM) regression algorithm to train the model:

    LGBMRegressor(boosting_type='gbdt', class_weight=None, colsample_bytree=1.0,
                  importance_type='split', learning_rate=0.1, max_depth=-1,
                  min_child_samples=20, min_child_weight=0.001, min_split_gain=0.0,
                  n_estimators=100, n_jobs=-1, num_leaves=31, objective=None,
                  random_state=None, reg_alpha=0.0, reg_lambda=0.0, silent=True,
                  subsample=1.0, subsample_for_bin=200000, subsample_freq=0)

Now let's have a look at the predicted values:

    ypred = model.predict(xtest)
    # Use a new name so the original `data` DataFrame is not overwritten
    predictions = pd.DataFrame(data={"Predicted Orders": ypred.flatten()})
    print(predictions.head())

       Predicted Orders
    0         47.351897
    1         97.068717
    2         66.577788
    3         85.143083
    4         54.451098
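Predicted values are easier to judge next to the actual ones. As a rough sketch, the mean absolute error from scikit-learn gives the average size of the miss in orders. The numbers below are hypothetical and are not results from the model above.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical values for illustration only; not results from the article's model
actual = np.array([50, 95, 70, 80, 55])
predicted = np.array([47.0, 97.0, 66.0, 85.0, 54.0])

# Mean absolute error: the average absolute difference between actual and predicted orders
mae = mean_absolute_error(actual, predicted)
print(mae)
```

With the real model, `actual` would be `ytest` and `predicted` would be `ypred`.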

So this is how you can train a machine learning model to predict the number of orders using Python.

### Summary

Predicting the number of orders for a product helps a product-based company decide how much to invest in marketing that product. I hope you liked this article on predicting the number of orders with machine learning using **Python**. Feel free to ask questions in the comments section below.