Tipping waiters for serving food depends on many factors, such as the type of restaurant, how many people you are with, and the size of your bill. Waiter tips analysis is a popular data science case study in which we predict the tip given to a waiter for serving food in a restaurant. If you want to learn how to solve this case study, this article is for you. In it, I will take you through the task of waiter tips prediction with machine learning using Python.
Waiter Tips (Case Study)
A food server at a restaurant recorded data about the tips given to waiters for serving food. The recorded fields are as follows:
- total_bill: Total bill in dollars including taxes
- tip: Tip given to waiters in dollars
- sex: gender of the person paying the bill
- smoker: whether the person smoked or not
- day: day of the week
- time: lunch or dinner
- size: number of people at the table
So this is the data recorded by the restaurant. Based on this data, our task is to find the factors affecting waiter tips and train a machine learning model to predict the tip a waiter receives.
Waiter Tips Prediction using Python
Now let’s start the task of waiter tips analysis and prediction by importing the necessary Python libraries and the dataset:
import pandas as pd
import numpy as np
import plotly.express as px
import plotly.graph_objects as go

data = pd.read_csv("tips.csv")
print(data.head())
   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4
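Before analyzing the data, it can help to confirm the column types and check for missing values. The sketch below is only an illustration: it inlines the five rows printed above (the name `sample` is my own) so that it runs without tips.csv. On the full dataset, the same calls would be made on `data`.

```python
import io
import pandas as pd

# The five rows printed above, inlined so the sketch runs without tips.csv.
sample_csv = """total_bill,tip,sex,smoker,day,time,size
16.99,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4
"""
sample = pd.read_csv(io.StringIO(sample_csv))

# Number of missing values in each column.
missing_per_column = sample.isnull().sum()

# Columns pandas parsed as numbers; the rest are text categories.
numeric_columns = sample.select_dtypes(include="number").columns.tolist()

print(sample.dtypes)
print(missing_per_column)
print(numeric_columns)
```

The columns that are not numeric (sex, smoker, day, time) are the ones that need to be converted to numbers before a model can be trained on them.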
Now let’s move forward by analyzing all the factors affecting waiter tips.
Waiter Tips Analysis
Let’s have a look at the tips given to the waiters according to:
- the total bill paid
- number of people at a table
- and the day of the week:
figure = px.scatter(data_frame=data, x="total_bill", y="tip",
                    size="size", color="day", trendline="ols")
figure.show()
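The trendline="ols" argument asks Plotly to overlay an ordinary least squares line for each color group (Plotly relies on the statsmodels package for this, so it needs to be installed). To get a feel for what such a line represents, here is a minimal sketch that fits one line with NumPy instead, using only the five rows printed earlier. It is an illustration of the idea, not what Plotly runs internally, and the fitted numbers will differ on the full dataset.

```python
import numpy as np

# total_bill and tip from the five rows printed earlier in the article.
total_bill = np.array([16.99, 10.34, 21.01, 23.68, 24.59])
tip = np.array([1.01, 1.66, 3.50, 3.31, 3.61])

# Degree-1 least-squares fit: tip is approximated by slope * total_bill + intercept.
slope, intercept = np.polyfit(total_bill, tip, deg=1)

print(f"tip = {slope:.3f} * total_bill + {intercept:.3f}")
```

A positive slope means the tip tends to grow with the bill, which is what the upward trendlines in the scatter plot show.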

Now let’s have a look at the tips given to the waiters according to:
- the total bill paid
- the number of people at a table
- and the gender of the person paying the bill:
figure = px.scatter(data_frame=data, x="total_bill", y="tip",
                    size="size", color="sex", trendline="ols")
figure.show()

Now let’s have a look at the tips given to the waiters according to:
- the total bill paid
- the number of people at a table
- and the time of the meal:
figure = px.scatter(data_frame=data, x="total_bill", y="tip",
                    size="size", color="time", trendline="ols")
figure.show()

Now let’s look at the total tips by day to find out on which day waiters receive the most tips:
figure = px.pie(data, values="tip", names="day", hole=0.5)
figure.show()

According to the visualization above, Saturday accounts for the largest share of total tips. Now let’s look at the total tips by the gender of the person paying the bill to see who tips waiters the most:
figure = px.pie(data, values="tip", names="sex", hole=0.5)
figure.show()

According to the visualization above, men account for the larger share of total tips. Now let’s see whether smokers or non-smokers tip more:
figure = px.pie(data, values="tip", names="smoker", hole=0.5)
figure.show()

According to the visualization above, non-smokers account for more of the total tips than smokers. Now let’s see whether more is tipped during lunch or dinner:
figure = px.pie(data, values="tip", names="time", hole=0.5)
figure.show()

According to the visualization above, waiters receive more in tips during dinner.
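One thing to keep in mind when reading these pie charts: with values set to the tip column, each slice is the sum of the tips in that category, so a category with more bills can get a bigger slice even if its individual tips are not larger. A quick way to separate the two effects is to compare the count, the total and the average tip per group. The sketch below does this for the five rows printed at the start of the article (the name `sample` is my own). On the full dataset you could run the same groupby on `data` before the categories are mapped to numbers.

```python
import pandas as pd

# tip and sex from the five rows printed at the start of the article.
sample = pd.DataFrame({
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
    "sex": ["Female", "Male", "Male", "Male", "Female"],
})

# count = number of bills, sum = what the pie chart shows, mean = average tip per bill.
summary = sample.groupby("sex")["tip"].agg(["count", "sum", "mean"])
print(summary.round(2))
```

If the sums differ a lot while the means are close, the difference in the pie chart mostly reflects how many bills fall into each group.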
So this is how we can analyze all the factors affecting waiter tips. Now in the section below, I will take you through how to train a machine learning model for the task of waiter tips prediction.
Waiter Tips Prediction Model
Before training a waiter tips prediction model, I will transform the categorical values into numerical values:
data["sex"] = data["sex"].map({"Female": 0, "Male": 1})
data["smoker"] = data["smoker"].map({"No": 0, "Yes": 1})
data["day"] = data["day"].map({"Thur": 0, "Fri": 1, "Sat": 2, "Sun": 3})
data["time"] = data["time"].map({"Lunch": 0, "Dinner": 1})
data.head()
   total_bill   tip  sex  smoker  day  time  size
0       16.99  1.01    0       0    3     1     2
1       10.34  1.66    1       0    3     1     3
2       21.01  3.50    1       0    3     1     3
3       23.68  3.31    1       0    3     1     2
4       24.59  3.61    0       0    3     1     4
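Mapping the day to 0, 1, 2, 3 works, but it also tells a linear model that the days are ordered and evenly spaced, which may not match how tipping actually varies by day. One common alternative is one-hot encoding, where each day gets its own 0/1 column. The sketch below is not part of the approach used in this article. It uses a hypothetical four-row frame (`days`) that simply lists each day label from the mapping above.

```python
import pandas as pd

# Hypothetical frame with one row per day label used in the mapping above.
days = pd.DataFrame({"day": ["Thur", "Fri", "Sat", "Sun"]})

# One indicator column per day; each row has exactly one 1.
encoded = pd.get_dummies(days, columns=["day"], dtype=int)
print(encoded)
```

If you try this on the full dataset, the feature list used for training would need the new day_ columns in place of the single day column.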
Now I will split the data into training and test sets:
from sklearn.model_selection import train_test_split

x = np.array(data[["total_bill", "sex", "smoker", "day", "time", "size"]])
y = np.array(data["tip"])

xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.2,
                                                random_state=42)
Below is how we can train a machine learning model for the task of waiter tips prediction using Python:
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(xtrain, ytrain)
Now let’s try out this model by giving it inputs in the same order as the features that we used to train it:
# features = [[total_bill, "sex", "smoker", "day", "time", "size"]]
features = np.array([[24.50, 1, 0, 0, 1, 4]])
model.predict(features)
array([3.73742609])
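A single prediction shows how to call the model, but it does not tell us how good the model is. For that, we can compare the predictions on the held-out test set with the actual tips. The sketch below shows one way to do this with R² and mean absolute error from scikit-learn. To keep it runnable without tips.csv, it uses synthetic stand-in data (an assumption of mine, not the tips dataset), so the numbers it prints say nothing about the real model. With the variables from this article, the same two metric calls would take ytest and model.predict(xtest).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the tips dataset):
# the tip is roughly 15% of the bill plus random noise.
rng = np.random.default_rng(42)
total_bill = rng.uniform(5, 50, size=200)
tip = 0.15 * total_bill + rng.normal(0, 1.0, size=200)

x = total_bill.reshape(-1, 1)
xtrain, xtest, ytrain, ytest = train_test_split(x, tip,
                                                test_size=0.2,
                                                random_state=42)

model = LinearRegression().fit(xtrain, ytrain)
predictions = model.predict(xtest)

# R^2: share of the variance in the test tips explained by the model (1.0 is perfect).
r2 = r2_score(ytest, predictions)
# MAE: average absolute error of the predictions, in dollars.
mae = mean_absolute_error(ytest, predictions)

print(f"R^2: {r2:.2f}, MAE: {mae:.2f}")
```

MAE is often the easier number to interpret here because it is in the same unit as the tip.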
Summary
So this is how you can predict waiter tips with machine learning using Python. Waiter tips analysis is a popular data science case study in which we predict the tip given to a waiter for serving food in a restaurant. I hope you liked this article on waiter tips prediction with machine learning using Python. Feel free to ask valuable questions in the comments section below.
Now let’s have a look at the tips given to the waiters according to:
the total bill paid
the number of people at a table
and the time of the meal:
figure = px.scatter(data_frame = data, x="total_bill",
y="tip", size="size", color= "sex", trendline="ols")
figure.show()
The color should be "time".
Kindly update that; it might create confusion.
Thanks
yes, I just noticed! thanks
How do I get this data, please? It’s not downloadable.
You can download the dataset from here: https://www.kaggle.com/datasets/aminizahra/tips-dataset
The accuracy of this model comes out to be 45%.
Can you suggest some ways of improving it?
I am kind of new to this field and want to make a career in it.
It would be very helpful for me if you could answer my question.