Pfizer Vaccine Sentiment Analysis using Python

Twitter is one of the most popular social media apps, where people freely share their opinions on any topic. Many tweets have been posted about the Pfizer vaccine, and they can be used to analyze how people feel about it. So, if you want to learn how to use a Twitter dataset for sentiment analysis, this article is for you. In this article, I will walk you through the task of Pfizer vaccine sentiment analysis using Python.


The dataset I am using for this task was downloaded from Kaggle. It was originally collected from Twitter while people were sharing their opinions about the Pfizer vaccine. Let's start by importing the necessary Python libraries and the dataset:

                    id             user_name  ... favorites is_retweet
0  1340539111971516416            Rachel Roh  ...         0      False
1  1338158543359250433           Albert Fong  ...         1      False
2  1337858199140118533              eli🇱🇹🇪🇺👌  ...         0      False
3  1337855739918835717         Charles Adler  ...      2129      False
4  1337854064604966912  Citizen News Channel  ...         0      False

[5 rows x 16 columns]
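Loading the data is a standard pandas call. A minimal sketch, assuming the Kaggle CSV has been saved locally; the file name `vaccination_tweets.csv` is an assumption, so adjust it to match your download. The two toy rows in the in-memory CSV are made up and only demonstrate the call.

```python
import io

import pandas as pd


def load_tweets(source) -> pd.DataFrame:
    """Read the tweets CSV from a file path or any file-like object."""
    return pd.read_csv(source)


# With the real download (file name is an assumption, adjust as needed):
# data = load_tweets("vaccination_tweets.csv")
# print(data.head())

# Self-contained demo on a toy in-memory CSV with a few of the columns.
sample_csv = io.StringIO(
    "id,user_name,text,favorites,is_retweet\n"
    "1,alice,Got my first dose today,3,False\n"
    "2,bob,Vaccine rollout starts Monday,0,False\n"
)
demo = load_tweets(sample_csv)
print(demo.head())
```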

The dataset has 16 columns. Let's check whether it contains any null values:

data.isnull().sum()
id                     0
user_name              0
user_location       1630
user_description     506
user_created           0
user_followers         0
user_friends           0
user_favourites        0
user_verified          0
date                   0
text                   0
hashtags            1949
source                 1
retweets               0
favorites              0
is_retweet             0
dtype: int64

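The null values are confined to the user_location, user_description, hashtags, and source columns; the text column has none, so they will not affect the sentiment analysis. Still, to keep things simple, I will drop every row that contains a null value, since plenty of rows remain afterwards: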

data = data.dropna()
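Because the text column has no nulls, an alternative is to drop a row only when its text is missing, which keeps every tweet. A small sketch comparing the two pandas calls on a toy frame (the rows are invented for illustration):

```python
import pandas as pd

# Toy frame: nulls appear only in side columns, as in the isnull() output above.
toy = pd.DataFrame(
    {
        "text": ["first dose done", "rollout update", "side effects?"],
        "user_location": ["Berlin", None, "Toronto"],
        "hashtags": [None, "['PfizerBioNTech']", "['Covid19']"],
    }
)

all_columns = toy.dropna()               # drops a row if ANY column is null
text_only = toy.dropna(subset=["text"])  # drops a row only if its text is null

print(len(toy), len(all_columns), len(text_only))
```

Either choice works for this analysis; dropping every incomplete row, as done above, simply discards some tweets whose text was usable.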

Now let’s have a look at the descriptive statistics of this dataset:

print(data.describe())
                 id  user_followers  ...     retweets    favorites
count  4.749000e+03    4.749000e+03  ...  4749.000000  4749.000000
mean   1.355333e+18    5.069683e+04  ...     1.545378     9.385555
std    1.280104e+16    3.545440e+05  ...    13.395572    55.280915
min    1.337728e+18    0.000000e+00  ...     0.000000     0.000000
25%    1.344929e+18    1.740000e+02  ...     0.000000     0.000000
50%    1.352030e+18    6.480000e+02  ...     0.000000     1.000000
75%    1.364940e+18    2.728000e+03  ...     1.000000     5.000000
max    1.384788e+18    1.371493e+07  ...   678.000000  1979.000000

[8 rows x 6 columns]

The text column is the most important feature in this dataset, as it contains the opinions of Twitter users about the Pfizer vaccine. But it contains many special symbols and other noise, so it needs to be cleaned before analysis.
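A minimal sketch of such a cleaning step, assuming a regex-based approach: lowercase each tweet, strip URLs, @mentions, digits, and punctuation (which also removes the "#" while keeping the hashtag word), then drop common filler words. Judging from the cleaned text shown later (words such as "vaccin" and "peopl"), the original also stemmed each word, probably with a stemmer such as NLTK's SnowballStemmer; this sketch skips stemming to stay dependency-free, and its stopword set is a tiny hypothetical stand-in for a full list.

```python
import re
import string

# Hypothetical, deliberately small stopword set; a real run would use a
# complete English stopword list (for example NLTK's).
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on", "for", "it", "this"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
DIGIT_RE = re.compile(r"\d+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase a tweet and remove URLs, mentions, digits, punctuation and stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)      # URLs first, before their punctuation is stripped
    text = MENTION_RE.sub(" ", text)  # @mentions next, for the same reason
    text = DIGIT_RE.sub(" ", text)
    text = text.translate(PUNCT_TABLE)  # removes '#', keeping the hashtag word
    words = [w for w in text.split() if w not in STOPWORDS]
    return " ".join(words)


print(clean_text("The #PfizerBioNTech vaccine is here! 2 doses, see https://example.com @WHO"))
```

On the full dataset this would be applied column-wise, e.g. `data["text"] = data["text"].apply(clean_text)`.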

Now let's look at a word cloud of the text column. A word cloud is a data visualization technique that displays the most frequently used words in a large font and less frequent words in a smaller font. Here is the word cloud of the text column:

[Figure: word cloud of the text column]
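A word cloud is essentially a picture of word frequencies. The sketch below computes those frequencies with `collections.Counter` on already-cleaned text; the `wordcloud` package's `WordCloud.generate_from_frequencies` method can render such a mapping as an image, and `WordCloud.generate` accepts the joined raw text instead. The sample tweets are made up.

```python
from collections import Counter


def word_frequencies(texts, top_n=10):
    """Count words across cleaned tweets and return the top_n most common."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts.most_common(top_n)


cleaned = [
    "pfizerbiontech vaccine first dose",
    "vaccine rollout starts monday",
    "pfizerbiontech vaccine approved",
]
print(word_frequencies(cleaned, top_n=2))
```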

Now let's look at a word cloud of the hashtags column, which shows which hashtags were trending while people were sharing their opinions about the Pfizer vaccine:

[Figure: word cloud of the hashtags column]
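The hashtags column needs one extra parsing step before its tags can be counted. A sketch, assuming each non-null cell stores a Python-style list literal such as "['PfizerBioNTech', 'Covid19']"; check a few rows first, because if the cells hold plain text the `ast.literal_eval` step must change. Tags are lowercased so that variants like #Covid19 and #COVID19 are counted together, and the sample cells are made up.

```python
import ast
from collections import Counter


def hashtag_counts(cells):
    """Parse list-literal hashtag cells and count tags case-insensitively."""
    counts = Counter()
    for cell in cells:
        if not isinstance(cell, str):  # skip missing values (None or NaN)
            continue
        tags = ast.literal_eval(cell)  # "['A', 'B']" -> ['A', 'B']
        counts.update(tag.lower() for tag in tags)
    return counts


cells = ["['PfizerBioNTech', 'Covid19']", "['COVID19']", None]
print(hashtag_counts(cells).most_common())
```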

The “user_verified” column shows whether the account that posted each tweet was verified by Twitter. Verified accounts typically belonged to public figures, celebrities, and organizations. So let's see how many of the tweets about the Pfizer vaccine came from verified accounts:

data["user_verified"].value_counts()
False    4169
True      580
Name: user_verified, dtype: int64
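Raw counts are easier to interpret as shares. The sketch below rebuilds the flags from the counts printed above and uses pandas' `value_counts(normalize=True)`, which returns fractions instead of counts:

```python
import pandas as pd

# Rebuild the verified flags from the counts printed above (4169 False, 580 True).
user_verified = pd.Series([False] * 4169 + [True] * 580, name="user_verified")

counts = user_verified.value_counts()
shares = user_verified.value_counts(normalize=True)  # fractions that sum to 1

print(counts)
print(shares.round(3))
```

So roughly 12% of the tweets (580 of 4,749) came from verified accounts.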

In the output above, False is the number of tweets from unverified accounts and True is the number from verified accounts. Now let's move on to the sentiment analysis itself. I will add three columns to the dataset, Positive, Negative, and Neutral, by calculating sentiment scores for the text column:

                                                 text  ...  Neutral
0   folk said daikon past could treat cytokin stor...  ...    0.748
2   coronavirus sputnikv astrazeneca pfizerbiontec...  ...    1.000
6   bit sad claim fame success vaccin patriot comp...  ...    0.481
9   covidvaccin state start get  monday us say pak...  ...    1.000
10  death close  mark million peopl wait pfizerbio...  ...    0.698

[5 rows x 4 columns]
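Assuming the scores come from NLTK's VADER analyzer (its `polarity_scores` method returns `pos`, `neg`, `neu` and `compound` keys, and the first three fit the Positive, Negative and Neutral columns), this step can be sketched as follows. The scorer is passed in as a function so the column-building logic can be tried with a stub; a real run needs `nltk` plus a one-time `nltk.download("vader_lexicon")`.

```python
import pandas as pd


def vader_scorer():
    """Return NLTK VADER's polarity_scores function (needs nltk + vader_lexicon)."""
    from nltk.sentiment.vader import SentimentIntensityAnalyzer  # lazy import

    return SentimentIntensityAnalyzer().polarity_scores


def add_sentiment_columns(df: pd.DataFrame, score_fn) -> pd.DataFrame:
    """Return a copy of df with Positive, Negative and Neutral scored from df["text"]."""
    scores = [score_fn(text) for text in df["text"]]  # one dict per tweet
    out = df.copy()
    out["Positive"] = [s["pos"] for s in scores]
    out["Negative"] = [s["neg"] for s in scores]
    out["Neutral"] = [s["neu"] for s in scores]
    return out


# Real run (after nltk.download("vader_lexicon")):
# data = add_sentiment_columns(data, vader_scorer())
# print(data[["text", "Positive", "Negative", "Neutral"]].head())
```

Assigning plain lists keeps the columns aligned by position, which matters here because `dropna()` leaves gaps in the index (0, 2, 6, ...).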

Now let's find out which sentiment dominates overall:

Neutral 🙂 
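A plausible sketch of the decision rule, assuming it simply compares the summed Positive, Negative and Neutral scores and reports the largest; ties fall through to Neutral:

```python
def dominant_sentiment(positive: float, negative: float, neutral: float) -> str:
    """Return the label of the largest summed score (ties fall through to Neutral)."""
    if positive > negative and positive > neutral:
        return "Positive"
    if negative > positive and negative > neutral:
        return "Negative"
    return "Neutral"


# Using the totals printed in the next step.
print(dominant_sentiment(417.816, 188.812, 4142.375))
```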

So, overall, the tweets lean neutral. Before drawing any conclusion, let's look at the total of each sentiment score:

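x = sum(data["Positive"])  # total positive score across all tweets
y = sum(data["Negative"])  # total negative score
z = sum(data["Neutral"])   # total neutral score
print("Positive: ", x)
print("Negative: ", y)
print("Neutral: ", z)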
Positive:  417.81600000000003
Negative:  188.81200000000024
Neutral:  4142.3750000000055

The positive and negative totals are far smaller than the neutral total, which suggests that Twitter users were mostly spreading awareness about the Pfizer vaccine rather than discussing its benefits or drawbacks.
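Dividing each total by the number of tweets makes the comparison concrete. If the scores come from VADER (an assumption), each tweet's three proportions sum to about 1, so the three totals add up to roughly the 4,749 rows reported by `describe()`, and each total divided by 4,749 is the average share of that sentiment per tweet:

```python
totals = {"Positive": 417.816, "Negative": 188.812, "Neutral": 4142.375}
n_tweets = 4749  # row count from data.describe() after dropna()

# Average per-tweet share of each sentiment.
shares = {label: total / n_tweets for label, total in totals.items()}

for label, share in shares.items():
    print(f"{label}: {share:.1%}")
```

On average, about 87% of a tweet's scored content is neutral, against roughly 9% positive and 4% negative.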

Summary

So this is how you can analyze the sentiments of Twitter users about the Pfizer vaccine. In conclusion, most of the discussion was neutral, focused on awareness of the Pfizer vaccine rather than on its benefits or drawbacks. I hope you liked this article on Pfizer vaccine sentiment analysis using Python. Feel free to ask your valuable questions in the comments section below.

Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.

