# Supermarket Sales Analysis with Data Science

Supermarket Sales Analysis with Python

A supermarket is a self-service shop offering a wide variety of food, beverages, and household products, organized into sections. It is larger and has a wider selection than earlier grocery stores, but is smaller and more limited in its range of merchandise than a hypermarket or big-box market.

### What will you discover from this analysis?

1. Relation of customers with the supermarket.
2. Payment methods used in the supermarket.
3. Products and their relation to quantities sold.
4. Types of products and their sales.
5. Products and their ratings.

Let’s start by importing the libraries.

```python
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
```

You can download the data set you need for this project from here:

```python
data = pd.read_csv("market.csv")
print(data.shape)
```

```
#Output
(1000, 17)
```

`data.head()`

### Data Cleaning

`data.isnull().sum()`
```
#Output
Invoice ID                 0
Branch                     0
City                       0
Customer type              0
Gender                     0
Product line               0
Unit price                 0
Quantity                   0
Tax 5%                     0
Total                      0
Date                       0
Time                       0
Payment                    0
cogs                       0
gross margin percentage    0
gross income               0
Rating                     0
dtype: int64
```

There are no missing values, so the data set is clean and we can continue with data visualization.
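
The same check can be turned into something you can act on programmatically. The snippet below is a minimal sketch on a tiny made-up frame (the `sample` rows are invented for illustration and are not taken from `market.csv`); it lists the columns that contain gaps and, as an extra check not performed in the original analysis, counts fully duplicated rows.

```python
import pandas as pd

# Hypothetical stand-in for `data`; values are illustrative only.
sample = pd.DataFrame({
    "Invoice ID": ["A-1", "A-2", "A-3"],
    "Total": [10.5, None, 7.0],
})

# Number of missing values in each column
missing_per_column = sample.isnull().sum()

# Names of the columns that have at least one missing value
columns_with_gaps = missing_per_column[missing_per_column > 0].index.tolist()

# Number of rows that are exact copies of an earlier row
duplicate_rows = int(sample.duplicated().sum())

print(columns_with_gaps)
print(duplicate_rows)
```

On the real data set, an empty list and a duplicate count of zero would confirm that no cleaning is needed.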

### Checking information of data set.

`data.info()`
```
#Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 17 columns):
 #   Column                   Non-Null Count  Dtype
---  ------                   --------------  -----
 0   Invoice ID               1000 non-null   object
 1   Branch                   1000 non-null   object
 2   City                     1000 non-null   object
 3   Customer type            1000 non-null   object
 4   Gender                   1000 non-null   object
 5   Product line             1000 non-null   object
 6   Unit price               1000 non-null   float64
 7   Quantity                 1000 non-null   int64
 8   Tax 5%                   1000 non-null   float64
 9   Total                    1000 non-null   float64
 10  Date                     1000 non-null   object
 11  Time                     1000 non-null   object
 12  Payment                  1000 non-null   object
 13  cogs                     1000 non-null   float64
 14  gross margin percentage  1000 non-null   float64
 15  gross income             1000 non-null   float64
 16  Rating                   1000 non-null   float64
dtypes: float64(7), int64(1), object(9)
memory usage: 132.9+ KB
```
`data.describe()`

### Checking number of rows and columns

`print("Dataset contains {} rows and {} columns".format(data.shape[0], data.shape[1]))`
```
#Output
Dataset contains 1000 rows and 17 columns
```

## Visualization

Now we use different visualization tools to check different aspects of Supermarket sales.

```python
plt.figure(figsize=(14,6))
plt.style.use('fivethirtyeight')
# Pass the column as x=...; recent seaborn versions no longer accept it positionally
ax = sns.countplot(x='Gender', data=data, palette='copper')
ax.set_xlabel(xlabel="Gender", fontsize=18)
ax.set_ylabel(ylabel="Gender count", fontsize=18)
ax.set_title(label="Gender count in supermarket", fontsize=20)
plt.show()
```

Here we can see that the number of male and female customers is almost equal. The bars look suspiciously similar, so let’s check the numeric data.

`data.groupby(['Gender']).agg({'Total':'sum'})`
```
#Output
             Total
Gender
Female  167882.925
Male    155083.824
```

The visualization looks good. Let’s carry on.
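
Note that the count plot counts rows (transactions), while the `groupby` above sums the `Total` column, so the two views answer slightly different questions. A minimal sketch of how both can be put side by side follows; the `sample` frame is made up for illustration and does not come from `market.csv`.

```python
import pandas as pd

# Hypothetical rows standing in for `data`; values are illustrative only.
sample = pd.DataFrame({
    "Gender": ["Female", "Male", "Female", "Male"],
    "Total": [100.0, 40.0, 50.0, 60.0],
})

# count = number of transactions (what countplot draws),
# sum   = total sales value, mean = average value per transaction
summary = sample.groupby("Gender")["Total"].agg(["count", "sum", "mean"])
print(summary)
```

Replacing `sample` with `data` would give the transaction counts and the sales totals per gender in a single table.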

### Customer type

```python
plt.style.use('ggplot')
plt.figure(figsize=(14,6))
ax = sns.countplot(x="Customer type", data=data, palette="rocket_r")
ax.set_title("Type of customers", fontsize=25)
ax.set_xlabel("Customer type", fontsize=16)
ax.set_ylabel("Customer Count", fontsize=16)
```

The visualization looks suspicious, so let’s check the numeric data.

`data.groupby(['Customer type']).agg({'Total':'sum'})`
```
#Output
                    Total
Customer type
Member         164223.444
Normal         158743.305
```

Above we can see the customer types for all branches combined. Now let’s check each branch separately.

```python
plt.figure(figsize=(14,6))
plt.style.use('classic')
ax = sns.countplot(x="Customer type", hue="Branch", data=data, palette="rocket_r")
ax.set_title(label="Customer type in different branches", fontsize=25)
# The x-axis holds the customer type; branches are shown by color (hue)
ax.set_xlabel(xlabel="Customer type", fontsize=16)
ax.set_ylabel(ylabel="Customer Count", fontsize=16)
```
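
The counts behind a grouped count plot like this one can also be read as a table. The sketch below uses `pd.crosstab` on a small made-up frame (the `sample` rows are hypothetical, not taken from `market.csv`) to count customers of each type per branch.

```python
import pandas as pd

# Hypothetical rows standing in for `data`; values are illustrative only.
sample = pd.DataFrame({
    "Customer type": ["Member", "Normal", "Member", "Normal", "Member"],
    "Branch": ["A", "A", "B", "C", "C"],
})

# Rows: customer type, columns: branch, cells: number of transactions
counts = pd.crosstab(sample["Customer type"], sample["Branch"])
print(counts)
```

Each cell corresponds to the height of one bar in the plot.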

### Checking the different payment methods used.

```python
plt.figure(figsize=(14,6))
ax = sns.countplot(x="Payment", data=data, palette="tab20")
ax.set_title(label="Payment methods of customers", fontsize=25)
ax.set_xlabel(xlabel="Payment method", fontsize=16)
ax.set_ylabel(ylabel="Customer Count", fontsize=16)
```
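
If the bars are close in height, shares are easier to compare than raw counts. A minimal sketch, assuming a `Payment` column like the one plotted above; the payment labels and rows in `sample` are invented for illustration.

```python
import pandas as pd

# Hypothetical rows standing in for `data`; values are illustrative only.
sample = pd.DataFrame({
    "Payment": ["Cash", "Ewallet", "Cash", "Credit card"],
})

# Fraction of transactions paid with each method (sums to 1.0)
shares = sample["Payment"].value_counts(normalize=True)
print(shares)
```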

### Payment method distribution in all branches

```python
plt.figure(figsize=(14,6))
plt.style.use('classic')
ax = sns.countplot(x="Payment", hue="Branch", data=data, palette="tab20")
ax.set_title(label="Payment distribution in all branches", fontsize=25)
ax.set_xlabel(xlabel="Payment method", fontsize=16)
ax.set_ylabel(ylabel="People Count", fontsize=16)
```

### Now let’s see the rating distribution in 3 branches

```python
plt.figure(figsize=(14,6))
ax = sns.boxplot(x="Branch", y="Rating", data=data, palette="RdYlBu")
ax.set_title("Rating distribution between branches", fontsize=25)
ax.set_xlabel(xlabel="Branches", fontsize=16)
ax.set_ylabel(ylabel="Rating distribution", fontsize=16)
```

We can see that the median rating (the line inside each box) of branches A and C is above 7, while that of branch B is below 7.
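
A box plot draws the median rather than the mean, so it can help to compute both per branch. The following is a small sketch on made-up ratings (the `sample` values are hypothetical and not taken from `market.csv`).

```python
import pandas as pd

# Hypothetical rows standing in for `data`; values are illustrative only.
sample = pd.DataFrame({
    "Branch": ["A", "A", "A", "B", "B", "B"],
    "Rating": [6.0, 7.0, 9.5, 5.0, 6.5, 9.0],
})

# Mean and median rating for each branch
rating_stats = sample.groupby("Branch")["Rating"].agg(["mean", "median"])
print(rating_stats)
```

Running the same `groupby` on `data` would show the exact values that the boxes only let you estimate by eye.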

### Max sales time

```python
data["Time"] = pd.to_datetime(data["Time"])
data["Hour"] = data["Time"].dt.hour  # hour of day of each transaction
plt.figure(figsize=(14,6))
plt.style.use('classic')
ax = sns.lineplot(x="Hour", y="Quantity", data=data)
ax.set_title("Product sales per hour")
```

We can see that the supermarket makes most of its sales at around 14:00 local time.
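
By default `sns.lineplot` draws the mean of `Quantity` for each hour, not the sum, so the peak of the line is the hour with the largest average basket. To find the hour with the most units sold overall, the quantities can be summed per hour. A minimal sketch on invented rows (the `sample` times and quantities are hypothetical, and the `%H:%M` time format is an assumption about how `Time` is stored):

```python
import pandas as pd

# Hypothetical rows standing in for `data`; values are illustrative only.
sample = pd.DataFrame({
    "Time": ["10:29", "14:05", "14:40", "19:21"],
    "Quantity": [7, 5, 6, 8],
})

# Extract the hour of day (assumes times are stored as HH:MM strings)
sample["Hour"] = pd.to_datetime(sample["Time"], format="%H:%M").dt.hour

# Total units sold in each hour, and the hour with the largest total
units_per_hour = sample.groupby("Hour")["Quantity"].sum()
busiest_hour = int(units_per_hour.idxmax())

# Average units per transaction in each hour (what the line plot shows)
mean_per_hour = sample.groupby("Hour")["Quantity"].mean()

print(units_per_hour)
print(busiest_hour)
```

In this toy data the hour with the largest total (14) differs from the hour with the largest average (19), which is why it is worth checking both.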

### Rating vs sales

```python
plt.figure(figsize=(14,6))
plt.style.use('classic')
rating_vs_sales = sns.lineplot(x="Total", y="Rating", data=data)
```
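
A line plot of two continuous columns can be hard to read, so a single correlation coefficient is a useful companion. The sketch below computes the Pearson correlation between `Total` and `Rating` on made-up values (the `sample` rows are hypothetical and not taken from `market.csv`).

```python
import pandas as pd

# Hypothetical rows standing in for `data`; values are illustrative only.
sample = pd.DataFrame({
    "Total": [50.0, 120.0, 300.0, 640.0, 910.0],
    "Rating": [7.1, 4.5, 9.0, 5.2, 6.8],
})

# Pearson correlation: +1 = perfect positive, -1 = perfect negative, 0 = no linear relation
pearson = sample["Total"].corr(sample["Rating"])
print(round(pearson, 3))
```

A value close to zero on the real data would suggest that the bill size and the rating are not linearly related.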

### Using boxen plot

```python
plt.figure(figsize=(10,6))
plt.style.use('classic')
ax = sns.boxenplot(x="Quantity", y="Product line", data=data)
ax.set_title(label="Average sales of different lines of products", fontsize=25)
ax.set_xlabel(xlabel="Quantity Sales", fontsize=16)
ax.set_ylabel(ylabel="Product Line", fontsize=16)
```

Here we can see the average sales of the different product lines. Health and beauty makes the highest sales, whereas Fashion accessories makes the lowest.
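
A boxen plot shows the distribution of `Quantity` per product line, so the ranking is read off by eye. To rank the product lines numerically, the mean quantity can be computed directly. This is a minimal sketch on invented rows (the `sample` values are hypothetical and not taken from `market.csv`).

```python
import pandas as pd

# Hypothetical rows standing in for `data`; values are illustrative only.
sample = pd.DataFrame({
    "Product line": [
        "Health and beauty", "Fashion accessories",
        "Health and beauty", "Fashion accessories",
    ],
    "Quantity": [8, 3, 6, 5],
})

# Average quantity per transaction for each product line, highest first
mean_quantity = (
    sample.groupby("Product line")["Quantity"]
    .mean()
    .sort_values(ascending=False)
)
print(mean_quantity)
```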

### Let’s see the sales count of these products.

```python
plt.figure(figsize=(14,6))
ax = sns.countplot(y='Product line', data=data, order=data['Product line'].value_counts().index)
ax.set_title(label="Sales count of products", fontsize=25)
ax.set_xlabel(xlabel="Sales count", fontsize=16)
ax.set_ylabel(ylabel="Product Line", fontsize=16)
```

We can see the top-selling products from the above figure.

### Total sales of product using boxenplot

```python
plt.figure(figsize=(14,6))
plt.style.use('classic')
ax = sns.boxenplot(y="Product line", x="Total", data=data)
ax.set_title(label="Total sales of product", fontsize=25)
ax.set_xlabel(xlabel="Total sales", fontsize=16)
ax.set_ylabel(ylabel="Product Line", fontsize=16)
```

### Now let’s see average ratings of products.

```python
plt.figure(figsize=(14,6))
plt.style.use('classic')
ax = sns.boxenplot(y="Product line", x="Rating", data=data)
ax.set_title("Average rating of product line", fontsize=25)
ax.set_xlabel("Rating", fontsize=16)
ax.set_ylabel("Product line", fontsize=16)
```

### Product sales on the basis of gender

```python
plt.style.use('classic')
plt.figure(figsize=(14,6))
ax = sns.stripplot(y="Product line", x="Total", hue="Gender", data=data)
ax.set_title(label="Product sales on the basis of gender")
ax.set_xlabel(xlabel="Total sales of products")
ax.set_ylabel(ylabel="Product Line")
```
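
A strip plot shows every transaction as a point, which makes totals per group hard to judge. The sketch below builds a pivot table of summed `Total` per product line and gender on made-up rows (the `sample` values are hypothetical and not taken from `market.csv`).

```python
import pandas as pd

# Hypothetical rows standing in for `data`; values are illustrative only.
sample = pd.DataFrame({
    "Product line": [
        "Sports and travel", "Sports and travel",
        "Food and beverages", "Food and beverages",
    ],
    "Gender": ["Female", "Male", "Female", "Female"],
    "Total": [100.0, 80.0, 40.0, 60.0],
})

# Rows: product line, columns: gender, cells: summed sales
# fill_value=0 replaces combinations that never occur with 0 instead of NaN
totals = sample.pivot_table(
    index="Product line",
    columns="Gender",
    values="Total",
    aggfunc="sum",
    fill_value=0,
)
print(totals)
```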

### Product and gross income

```python
plt.style.use('classic')
# relplot is a figure-level function: it creates its own figure and returns a
# FacetGrid, so the size is set with height/aspect and labels via the grid.
g = sns.relplot(y="Product line", x="gross income", data=data, height=6, aspect=14/6)
g.set_axis_labels("Total gross income", "Product line")
g.fig.suptitle("Products and Gross income")
```