In this article, I’m going to introduce you to a data science project on online shopping intention analysis with Python. The growing popularity of online shopping has led to the emergence of new economic activities. To be successful in a highly competitive eCommerce environment, it is essential to understand customers’ online purchase intent.
Introduction to Online Shopping Intention Analysis
In recent years, e-commerce has brought huge benefits to suppliers and consumers. Defined as the use of the Internet to sell products or services to individual consumers, e-commerce has profoundly changed the way people conduct their business.
Indeed, it has become an important, full-fledged transaction channel. One survey of online shopping predicted that total direct sales to customers would exceed $240 billion by 2007. Major technological innovations in online shopping have changed transaction channels in the information age.
With the growth of online shopping, it has become important to understand the factors that influence a consumer’s intention to buy from a website rather than just browse. This emerging topic is of interest to both academics and machine learning practitioners.
Online Shopping Intention Analysis with Python
In this section, I will take you through a data science project on online shopping intention analysis with Python. I will start by importing the necessary libraries and the data:
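As a minimal sketch of the loading step, assuming the sessions are stored in a CSV file: the helper name load_sessions, the file name online_shoppers_intention.csv, and the two-row sample below are all assumptions made for illustration and do not come from the original project.

```python
import io

import pandas as pd

# Hypothetical two-row sample with a few of the dataset's columns, used here
# only so the sketch runs without the real file.
SAMPLE_CSV = io.StringIO(
    "ProductRelated_Duration,BounceRates,Revenue\n"
    "64.0,0.0,False\n"
    ",0.2,True\n"
)


def load_sessions(source):
    """Read the sessions table from a path or a file-like object."""
    return pd.read_csv(source)


# With the real data, pass the path to your copy of the CSV instead, e.g.
# data = load_sessions("online_shoppers_intention.csv")  # assumed file name
sample = load_sessions(SAMPLE_CSV)
print(sample.shape)
print(sample.dtypes)
```

The empty field in the second row is read as NaN, which is the kind of missing value handled in the next step.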
Now let’s have a look at the missing values and fill them by using the fillna method in Python pandas:
missing = data.isnull().sum()
print(missing)
Administrative             14
Administrative_Duration    14
Informational              14
Informational_Duration     14
ProductRelated             14
ProductRelated_Duration    14
BounceRates                14
ExitRates                  14
PageValues                  0
SpecialDay                  0
Month                       0
OperatingSystems            0
Browser                     0
Region                      0
TrafficType                 0
VisitorType                 0
Weekend                     0
Revenue                     0
dtype: int64
data.fillna(0, inplace = True)
Now let's select the product-related duration and bounce rate columns (positions 5 and 6) as the features for clustering:
x = data.iloc[:, [5, 6]].values
x.shape
Now let's apply the elbow method to determine the number of clusters:
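The elbow method can be sketched roughly as follows, using scikit-learn's KMeans and its inertia_ attribute (the within-cluster sum of squares). The array x_demo is synthetic stand-in data, not the real sessions, and the range of k values is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for x: two well-separated blobs of 100 points each.
rng = np.random.default_rng(0)
x_demo = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[10.0, 10.0], scale=0.5, size=(100, 2)),
])

# Within-cluster sum of squares (inertia) for k = 1..10.
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0)
    km.fit(x_demo)
    wcss.append(km.inertia_)

# Print the values; plotting them against k gives the elbow curve.
for k, value in enumerate(wcss, start=1):
    print(k, round(value, 1))
```

On data like this, the inertia drops sharply from k = 1 to k = 2 and only slowly afterwards, and that bend is what the elbow method looks for.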
K Means Clustering
According to the elbow graph, the maximum curvature is at the second index, that is, the optimal number of clusters for product-related duration and bounce rates is 2. Once the number of clusters is determined, we apply the K-Means method and plot the clusters:
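A minimal sketch of this step, assuming scikit-learn and matplotlib are available. The array x_demo is made-up data standing in for x (column 0 plays the role of product-related duration, column 1 the bounce rate), so the resulting picture is illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for x; the numbers are invented for illustration.
rng = np.random.default_rng(1)
short_visits = np.column_stack(
    [rng.uniform(0, 500, 150), rng.uniform(0.0, 0.2, 150)]
)
long_visits = np.column_stack(
    [rng.uniform(5000, 9000, 50), rng.uniform(0.0, 0.02, 50)]
)
x_demo = np.vstack([short_visits, long_visits])

# Fit K-Means with the two clusters suggested by the elbow method.
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(x_demo)

# Scatter plot of both clusters plus their centroids.
fig, ax = plt.subplots()
for cluster in (0, 1):
    points = x_demo[labels == cluster]
    ax.scatter(points[:, 0], points[:, 1], s=10, label=f"cluster {cluster}")
ax.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1],
           marker="x", color="black", label="centroids")
ax.set_xlabel("ProductRelated_Duration")
ax.set_ylabel("BounceRates")
ax.legend()
# plt.show()  # uncomment when running interactively
```

The cluster ids 0 and 1 are arbitrary, so which one corresponds to uninterested customers and which to target customers has to be read off the plot or the centroids.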
Looking at this K-Means cluster plot, it appears that customers who spent more time on product-related pages are much less likely to leave the website after viewing a single page.
Since K-Means is not a supervised learning method, we need other ways of evaluating its clustering result. Here we compare the clusters with the actual Revenue label using a confusion matrix. The leftmost column of the confusion matrix shows the actual label (True or False revenue), and the top row shows the assigned cluster (uninterested customers or target customers):
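As a rough sketch of this evaluation, assuming scikit-learn's confusion_matrix and adjusted_rand_score are used: the two label arrays below are invented for eight sessions purely to show the mechanics, and reading cluster 0 as "uninterested" and cluster 1 as "target" is an assumption.

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, confusion_matrix

# Made-up labels for eight sessions, only to show the mechanics.
revenue = np.array([False, False, False, False, False, True, True, True])
# Cluster ids from K-Means; here 0 is read as "uninterested", 1 as "target".
cluster = np.array([0, 0, 0, 0, 1, 0, 0, 1])

# Rows are the actual Revenue value, columns the cluster assignment.
cm = confusion_matrix(revenue.astype(int), cluster)
print(cm)

# Share of each actual class that landed in the matching cluster.
per_class = cm.diagonal() / cm.sum(axis=1)
print(per_class)

# Agreement between the two labelings, corrected for chance.
ari = adjusted_rand_score(revenue, cluster)
print(round(ari, 3))
```

The adjusted Rand index does not depend on how the cluster ids are numbered, whereas the confusion matrix does, so the id-to-group mapping should be checked before reading the matrix.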
Observations From Above Plots:
From the confusion matrix, we can see that of the 10,422 sessions without revenue, 9,769 (94%) are grouped as uninterested customers. However, of the 1,908 sessions with revenue, only 284 (15%) are grouped as target customers. The adjusted Rand index score is also not very high.
So it is clear that many sessions that ended in revenue were wrongly grouped as uninterested customers, which means that even when a high bounce rate is combined with a short product-related page duration, there are still many target customers.
I hope you liked this article on Data Science Project on Online Shopping Intention Analysis with Python programming language. Feel free to ask your valuable questions in the comments section below.