Covid-19 has been one of the deadliest diseases in recent memory. Mutations in the virus that causes it can make it more deadly or more infectious. Deaths have risen sharply whenever a large wave of cases has occurred. We can use historical data on Covid-19 cases and deaths to forecast the number of deaths in the future. So if you want to learn how to predict Covid-19 deaths with machine learning, this article is for you. In this article, I will take you through the task of Covid-19 deaths prediction with machine learning using Python.
Covid-19 Deaths Prediction (Case Study)
You are given a dataset of Covid-19 in India from 30 January 2020 to 18 January 2022. The dataset contains information about the daily confirmed cases and deaths. Below are all the columns of the dataset:
- Date: Contains the date of the record
- Date_YMD: Contains date in Year-Month-Day Format
- Daily Confirmed: Contains the daily confirmed cases of Covid-19
- Daily Deceased: Contains the daily deaths due to Covid-19
You need to use this historical data of covid-19 cases and deaths to predict the number of deaths for the next 30 days. You can download this dataset from here.
Covid-19 Deaths Prediction using Python
I hope you have now understood the problem statement above. Now I will import the necessary Python libraries and the dataset we need for the task of Covid-19 deaths prediction:
import pandas as pd
import numpy as np

data = pd.read_csv("COVID19 data for overall INDIA.csv")
print(data.head())
              Date    Date_YMD  Daily Confirmed  Daily Deceased
0  30 January 2020  2020-01-30                1               0
1  31 January 2020  2020-01-31                0               0
2  1 February 2020  2020-02-01                0               0
3  2 February 2020  2020-02-02                1               0
4  3 February 2020  2020-02-03                1               0
Before moving forward, let’s have a quick look at whether this dataset contains any null values or not:
print(data.isnull().sum())

Date               0
Date_YMD           0
Daily Confirmed    0
Daily Deceased     0
dtype: int64
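Beyond null values, a time series can also have missing or duplicated days, which a null check will not reveal. Here is a minimal sketch of such a check, assuming the Date_YMD column parses as Year-Month-Day. It uses only the five rows printed earlier rather than the full CSV, and the names `sample`, `expected_days`, `missing_days` and `has_duplicates` are illustrative:

```python
import pandas as pd

# The first five rows printed earlier; the full dataset would come from the CSV.
sample = pd.DataFrame({
    "Date_YMD": ["2020-01-30", "2020-01-31", "2020-02-01", "2020-02-02", "2020-02-03"],
    "Daily Confirmed": [1, 0, 0, 1, 1],
    "Daily Deceased": [0, 0, 0, 0, 0],
})

# Parse the text dates into real datetime values.
sample["Date_YMD"] = pd.to_datetime(sample["Date_YMD"], format="%Y-%m-%d")

# Build every calendar day between the first and last record,
# then see which of those days are absent from the data.
expected_days = pd.date_range(sample["Date_YMD"].min(), sample["Date_YMD"].max(), freq="D")
missing_days = expected_days.difference(pd.DatetimeIndex(sample["Date_YMD"]))

# A date that appears twice would double count cases and deaths.
has_duplicates = bool(sample["Date_YMD"].duplicated().any())

print("Missing days:", list(missing_days))
print("Duplicate dates:", has_duplicates)
```

The same check should carry over to the full dataset by replacing `sample` with `data`.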
We don’t need the Date column because Date_YMD already holds the same information in Year-Month-Day format, so let’s drop it from our dataset:
data = data.drop("Date", axis=1)
Let’s have a look at the daily confirmed cases of Covid-19:
import plotly.express as px

fig = px.bar(data, x='Date_YMD', y='Daily Confirmed')
fig.show()
In the data visualization above, we can see a high wave of covid-19 cases between April 2021 and May 2021.
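Daily counts tend to be noisy, so a wave is often easier to read from a rolling average. The sketch below shows a 7-day rolling mean with pandas. The numbers are made up for illustration and are not taken from the dataset, and the names `daily` and `rolling_mean` are assumptions:

```python
import pandas as pd

# Illustrative daily counts only; not values from the real dataset.
daily = pd.Series(
    [10, 12, 9, 30, 11, 13, 12, 14, 10, 11],
    index=pd.date_range("2021-04-01", periods=10, freq="D"),
    name="Daily Confirmed",
)

# Each value is the mean of that day and the six days before it.
# The first six entries are NaN because a full 7-day window is not yet available.
rolling_mean = daily.rolling(window=7).mean()

print(rolling_mean)
```

On the real data, the same call on `data["Daily Confirmed"]` would give a smoothed series that could be plotted in place of the raw daily values.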
Covid-19 Death Rate Analysis
Now let’s compare the total number of confirmed cases with the total number of deaths due to Covid-19:
cases = data["Daily Confirmed"].sum()
deceased = data["Daily Deceased"].sum()

labels = ["Confirmed", "Deceased"]
values = [cases, deceased]

# The totals are passed directly, so the DataFrame itself is not needed here
fig = px.pie(values=values, names=labels,
             title='Daily Confirmed Cases vs Daily Deaths', hole=0.5)
fig.show()
Let’s calculate the death rate of Covid-19, that is, total deaths as a percentage of total confirmed cases:
death_rate = (data["Daily Deceased"].sum() / data["Daily Confirmed"].sum()) * 100
print(death_rate)
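A single overall figure hides how the ratio changes over time. As a rough sketch, the same formula can be applied per month by grouping on the date column. The toy numbers below are invented for illustration, and `toy`, `death_rate_percent`, `monthly_sums` and `monthly_rate` are hypothetical names rather than part of the original analysis:

```python
import pandas as pd

# Invented numbers for illustration; not values from the real dataset.
toy = pd.DataFrame({
    "Date_YMD": pd.to_datetime(["2021-04-29", "2021-04-30", "2021-05-01", "2021-05-02"]),
    "Daily Confirmed": [100, 300, 200, 200],
    "Daily Deceased": [1, 3, 4, 4],
})


def death_rate_percent(frame):
    """Total deaths as a percentage of total confirmed cases."""
    confirmed = frame["Daily Confirmed"].sum()
    if confirmed == 0:
        return float("nan")  # avoid dividing by zero when no cases were recorded
    return frame["Daily Deceased"].sum() / confirmed * 100


overall = death_rate_percent(toy)

# Sum cases and deaths within each calendar month, then take the ratio.
monthly_sums = toy.groupby(toy["Date_YMD"].dt.to_period("M"))[
    ["Daily Confirmed", "Daily Deceased"]
].sum()
monthly_rate = monthly_sums["Daily Deceased"] / monthly_sums["Daily Confirmed"] * 100

print(overall)
print(monthly_rate)
```

Keep in mind that this is a crude ratio: deaths usually follow cases with a delay, so a monthly figure can be distorted at the start and end of a wave.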
Now let’s have a look at the daily deaths of Covid-19:
import plotly.express as px

fig = px.bar(data, x='Date_YMD', y='Daily Deceased')
fig.show()
We can see that daily deaths peak during the same high wave of Covid-19 cases.
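One way to put a number on this relationship is to correlate deaths with cases from a few days earlier. The sketch below is only an illustration: the series are synthetic, with deaths built to follow cases by exactly two days, and `lagged_corr`, `correlations` and `best_lag` are hypothetical names. Real data would give a much less clean answer:

```python
import pandas as pd

# Synthetic series: deaths are constructed as 2% of the cases from two days earlier.
cases = pd.Series([10, 20, 40, 80, 60, 30, 15, 10, 8, 5, 4, 3], dtype=float)
deaths = cases.shift(2).fillna(0) * 0.02


def lagged_corr(cases, deaths, lag):
    """Pearson correlation between deaths and cases from `lag` days earlier."""
    # Rows made NaN by the shift are ignored by Series.corr.
    return cases.shift(lag).corr(deaths)


# Try lags of 0 to 4 days and keep the one with the highest correlation.
correlations = {lag: lagged_corr(cases, deaths, lag) for lag in range(0, 5)}
best_lag = max(correlations, key=correlations.get)

print(correlations)
print("Best lag in days:", best_lag)
```

Because the synthetic deaths were built with a two-day delay, the search recovers a lag of 2 here.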
Covid-19 Deaths Prediction Model
Now let’s move to the task of Covid-19 deaths prediction for the next 30 days. Here I will be using the AutoTS library, an automated machine learning (AutoML) library for time series forecasting. If you have never used this library before, you can install it by executing the command mentioned below in your command prompt or terminal:
- pip install autots
Now here’s how to predict covid-19 deaths with machine learning for the next 30 days:
from autots import AutoTS

model = AutoTS(forecast_length=30, frequency='infer', ensemble='simple')
model = model.fit(data, date_col="Date_YMD", value_col='Daily Deceased', id_col=None)

prediction = model.predict()
forecast = prediction.forecast
print(forecast)
            Daily Deceased
2022-01-19      271.950000
2022-01-20      310.179787
2022-01-21      297.500000
2022-01-22      310.179787
2022-01-23      271.950000
2022-01-24      258.518302
2022-01-25      340.355520
2022-01-26      296.561343
2022-01-27      296.561343
2022-01-28      284.438262
2022-01-29      323.400000
2022-01-30      271.950000
2022-01-31      245.750000
2022-02-01      284.438262
2022-02-02      258.518302
2022-02-03      239.969607
2022-02-04      271.950000
2022-02-05      334.118953
2022-02-06      323.400000
2022-02-07      271.950000
2022-02-08      284.438262
2022-02-09      323.400000
2022-02-10      258.518302
2022-02-11      245.750000
2022-02-12      245.750000
2022-02-13      326.442185
2022-02-14      323.400000
2022-02-15      394.343619
2022-02-16      228.117431
2022-02-17      358.200000
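Before relying on a forecast like this, it can help to hold back the last few days of history and see how far a simple baseline misses them. The sketch below is not part of the AutoTS workflow above. It uses invented numbers and a naive baseline (the mean of the last seven training days), scored with mean absolute error. The names `history`, `holdout_days`, `naive_forecast` and `mae` are assumptions:

```python
import numpy as np
import pandas as pd

# Invented daily death counts for illustration; not values from the real dataset.
history = pd.Series([300, 310, 290, 305, 295, 300, 310, 320, 280, 300], dtype=float)

# Hold back the last three days and pretend they are unknown.
holdout_days = 3
train = history.iloc[:-holdout_days]
test = history.iloc[-holdout_days:]

# Naive baseline: predict the mean of the last seven training days for every held-out day.
naive_forecast = np.full(holdout_days, train.iloc[-7:].mean())

# Mean absolute error: the average size of the miss, in deaths per day.
mae = float(np.mean(np.abs(test.to_numpy() - naive_forecast)))

print(round(mae, 2))
```

A model's error on the same held-out days can then be compared with this baseline. If the model does not beat it, the extra complexity is probably not paying off.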
So this is how we can predict Covid-19 deaths with machine learning using the Python programming language. We can use historical data on Covid-19 cases and deaths to predict the number of deaths in the future. You can apply the same method to the latest dataset to predict Covid-19 deaths and waves. I hope you liked this article on Covid-19 deaths prediction with machine learning. Feel free to ask valuable questions in the comments section below.