Student marks prediction is a popular data science case study based on the problem of regression. It is a good regression problem for data science beginners as it is easy to solve and understand. So if you want to learn how to predict the marks of a student with machine learning, this article is for you. In this article, I will take you through the task of student marks prediction with machine learning using Python.
Student Marks Prediction (Case Study)
You are given some information about students like:
- the number of courses they have opted for
- the average time studied per day by students
- marks obtained by students
By using this information, you need to predict the marks of other students. You can download the dataset from here.
Student Marks Prediction using Python
The dataset I am using for the student marks prediction task is downloaded from Kaggle. Now let’s start with this task by importing the necessary Python libraries and dataset:
import numpy as np
import pandas as pd
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

data = pd.read_csv("Student_Marks.csv")
print(data.head(10))
   number_courses  time_study   Marks
0               3       4.508  19.202
1               4       0.096   7.734
2               4       3.133  13.811
3               6       7.909  53.018
4               8       7.811  55.299
5               6       3.211  17.822
6               3       6.063  29.889
7               5       3.413  17.264
8               4       4.410  20.348
9               3       6.173  30.862
So there are only three columns in the dataset. The Marks column is the target column, as we have to predict the marks of a student.
Now before moving forward, let’s have a look at whether this dataset contains any null values or not:
print(data.isnull().sum())
number_courses    0
time_study        0
Marks             0
dtype: int64
The dataset is ready to use because there are no null values in the data. One column records the number of courses each student has chosen. Let’s look at how often each value occurs in this column:
data["number_courses"].value_counts()
3    22
4    21
6    16
8    16
7    15
5    10
Name: number_courses, dtype: int64
So students have chosen a minimum of three and a maximum of eight courses. Let’s have a look at a scatter plot to see whether the number of courses affects the marks of a student:
figure = px.scatter(data_frame=data, x="number_courses",
                    y="Marks", size="time_study",
                    title="Number of Courses and Marks Scored")
figure.show()

According to the above data visualization, we can say that the number of courses may not affect the marks of a student if the student is studying for more time daily. So let’s have a look at the relationship between the time a student studied daily and the marks scored by the student:
# trendline="ols" needs the statsmodels package to be installed
figure = px.scatter(data_frame=data, x="time_study",
                    y="Marks", size="number_courses",
                    title="Time Spent and Marks Scored",
                    trendline="ols")
figure.show()

You can see that there is a linear relationship between the time studied and the marks obtained. This means the more time students spend studying, the better they can score.
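To make the trend line more concrete, here is a minimal sketch of a one-feature ordinary least squares fit in plain Python. It uses only the ten rows printed in the preview above, not the full dataset, so its slope and intercept will differ from the line in the plot. The helper name `fit_line` is my own and is not part of any library.

```python
# The ten (time_study, Marks) pairs from the data.head(10) preview.
time_study = [4.508, 0.096, 3.133, 7.909, 7.811, 3.211, 6.063, 3.413, 4.410, 6.173]
marks = [19.202, 7.734, 13.811, 53.018, 55.299, 17.822, 29.889, 17.264, 20.348, 30.862]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(time_study, marks)
residuals = [y - (slope * x + intercept) for x, y in zip(time_study, marks)]
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

A positive slope here says the same thing as the plot: in these rows, each extra hour of daily study goes with higher marks.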
Now let’s have a look at the correlation between the marks scored by the students and the other two columns in the data:
correlation = data.corr()
print(correlation["Marks"].sort_values(ascending=False))
Marks             1.000000
time_study        0.942254
number_courses    0.417335
Name: Marks, dtype: float64
So the time_study column is much more strongly correlated with the Marks column than the number_courses column is.
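By default, `data.corr()` computes the Pearson correlation coefficient. As a rough illustration of what that number measures, the sketch below computes it by hand for the ten preview rows only, so the values will not equal the full-dataset figures above. The function name `pearson` is my own.

```python
import math

# Columns from the ten-row data.head(10) preview.
number_courses = [3, 4, 4, 6, 8, 6, 3, 5, 4, 3]
time_study = [4.508, 0.096, 3.133, 7.909, 7.811, 3.211, 6.063, 3.413, 4.410, 6.173]
marks = [19.202, 7.734, 13.811, 53.018, 55.299, 17.822, 29.889, 17.264, 20.348, 30.862]


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r_time = pearson(time_study, marks)
r_courses = pearson(number_courses, marks)
print(f"time_study vs Marks:     {r_time:.3f}")
print(f"number_courses vs Marks: {r_courses:.3f}")
```

Even on this small sample, study time should come out as the stronger of the two relationships, which is consistent with the ordering in the full correlation output.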
Student Marks Prediction Model
Now let’s move to the task of training a machine learning model for predicting the marks of a student. Here, I will first start by splitting the data into training and test sets:
x = np.array(data[["time_study", "number_courses"]])
y = np.array(data["Marks"])
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.2,
                                                random_state=42)
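If it is unclear what the split does, the sketch below shows the general idea in plain Python: shuffle the row indices with a fixed seed, then hold out 20% of them for testing. It is a simplified stand-in, not scikit-learn's implementation, so it will not select the same rows as `train_test_split`. The function `split_indices` is hypothetical, and the row count of 100 is taken from the sum of the value counts shown earlier.

```python
import random


def split_indices(n_rows, test_size=0.2, seed=42):
    """Shuffle row indices and split them into (train, test) lists."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split repeatable
    n_test = round(n_rows * test_size)
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(100)
print(len(train_idx), "training rows,", len(test_idx), "test rows")
```

Fixing the seed plays the same role as `random_state=42` above: rerunning the code gives the same split, so the reported score stays comparable between runs.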
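Now I will train a machine learning model using the linear regression algorithm and score it on the test set. For a regression model, model.score returns the coefficient of determination (R²):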
model = LinearRegression()
model.fit(xtrain, ytrain)
model.score(xtest, ytest)
0.9459936100591212
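The value returned by `model.score` for a regressor is the coefficient of determination, R². As a small illustration of how that number is defined, here is a sketch that computes it by hand. The marks in `actual` are made-up values for demonstration only, not rows from the dataset, and `r_squared` is my own helper name.

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


actual = [10.0, 20.0, 30.0, 40.0]          # hypothetical marks, for illustration
perfect = r_squared(actual, actual)        # predictions equal to the truth
baseline = r_squared(actual, [25.0] * 4)   # always predicting the mean mark
print(perfect, baseline)
```

A score of 1 means the predictions match exactly, and 0 means the model does no better than always predicting the mean. Read this way, a test score of about 0.946 suggests the model explains most of the variation in marks on the held-out rows.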
Now let’s test the performance of this machine learning model by giving inputs based on the features we have used to train the model and predict the marks of a student:
# Features = [["time_study", "number_courses"]]
features = np.array([[4.508, 3]])
model.predict(features)
array([22.30738483])
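The input used here, 4.508 hours of study and 3 courses, matches the first row of the `data.head(10)` preview, whose recorded mark is 19.202. That allows a quick sanity check of the prediction. Keep in mind that this row may have been part of the training split, so it is not an unbiased test.

```python
predicted = 22.30738483  # model.predict output shown above
actual = 19.202          # Marks for row 0 in the preview (time_study=4.508, number_courses=3)

absolute_error = abs(predicted - actual)
relative_error = absolute_error / actual
print(f"absolute error: {absolute_error:.3f} marks")
print(f"relative error: {relative_error:.1%}")
```

The model overshoots this student by roughly three marks, a reminder that a high R² on the test set still leaves room for noticeable error on individual predictions.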
So this is how you can predict the marks of a student with machine learning using Python.
Summary
So this is how you can solve the problem of student marks prediction with machine learning. It is a good regression problem for data science beginners as it is easy to solve and understand. I hope you liked this article on student marks prediction with machine learning using Python. Feel free to ask valuable questions in the comments section below.