Twitter is one of the most popular social media apps, and people use it to share their opinions on any topic. Many tweets have been posted about the Pfizer vaccine, and they can be used to analyze how people feel about it. So, if you want to learn how to use a Twitter dataset for sentiment analysis, this article is for you. In this article, I will walk you through the task of Pfizer vaccine sentiment analysis using Python.
Pfizer Vaccine Sentiment Analysis using Python
The dataset I am using for Pfizer vaccine sentiment analysis was downloaded from Kaggle. It was originally collected from Twitter while people were sharing their opinions about the Pfizer vaccine. Let’s start by importing the necessary Python libraries and the dataset:
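Here is a minimal sketch of this step. It assumes pandas is installed and the Kaggle CSV has been saved locally. The file name vaccination_tweets.csv is a hypothetical placeholder, so change it to match your download:

```python
import pandas as pd

# Hypothetical file name: point this at your local copy of the Kaggle CSV.
DATA_PATH = "vaccination_tweets.csv"


def load_tweets(path_or_buffer=DATA_PATH) -> pd.DataFrame:
    """Read the tweet dataset (a file path or any file-like object) into a DataFrame."""
    return pd.read_csv(path_or_buffer)
```

With the file in place, data = load_tweets() followed by print(data.head()) displays the first five rows: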
                    id             user_name  ...  favorites  is_retweet
0  1340539111971516416            Rachel Roh  ...          0       False
1  1338158543359250433           Albert Fong  ...          1       False
2  1337858199140118533             eli🇱🇹🇪🇺👌  ...          0       False
3  1337855739918835717         Charles Adler  ...       2129       False
4  1337854064604966912  Citizen News Channel  ...          0       False

[5 rows x 16 columns]
The dataset has 16 columns, so let’s check whether any of them contain null values:
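Output like the one below comes from data.isnull().sum(). The following small sketch wraps the same call and also reports each column's missing values as a percentage of rows. The percentage column is an extra and is not part of the output shown here:

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column count of missing values plus the percentage of rows affected."""
    counts = df.isnull().sum()
    return pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
```

Calling print(missing_report(data)) shows both numbers side by side.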
id                     0
user_name              0
user_location       1630
user_description     506
user_created           0
user_followers         0
user_friends           0
user_favourites        0
user_verified          0
date                   0
text                   0
hashtags            1949
source                 1
retweets               0
favorites              0
is_retweet             0
dtype: int64
These null values do not affect the sentiment analysis itself, because the text column has none. Still, to keep things simple, I will drop every row that contains a null value, since the dataset is large enough to afford it:
data = data.dropna()
Now let’s have a look at the descriptive statistics of this dataset:
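By default, DataFrame.describe() summarizes only numeric columns. This is why the output below shows 6 of the 16 columns (presumably id, user_followers, user_friends, user_favourites, retweets and favorites), while boolean and text columns such as user_verified and user_name are left out. Here is a small sketch that makes this visible, assuming pandas:

```python
import pandas as pd


def numeric_columns(df: pd.DataFrame) -> list:
    """Names of the numeric (non-boolean) columns that describe() covers by default."""
    return list(df.select_dtypes(include="number").columns)


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, std, min, quartiles and max of every numeric column."""
    return df.describe()
```

numeric_columns(data) lists exactly the columns that appear in numeric_summary(data).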
                 id  user_followers  ...     retweets    favorites
count  4.749000e+03    4.749000e+03  ...  4749.000000  4749.000000
mean   1.355333e+18    5.069683e+04  ...     1.545378     9.385555
std    1.280104e+16    3.545440e+05  ...    13.395572    55.280915
min    1.337728e+18    0.000000e+00  ...     0.000000     0.000000
25%    1.344929e+18    1.740000e+02  ...     0.000000     0.000000
50%    1.352030e+18    6.480000e+02  ...     0.000000     1.000000
75%    1.364940e+18    2.728000e+03  ...     1.000000     5.000000
max    1.384788e+18    1.371493e+07  ...   678.000000  1979.000000

[8 rows x 6 columns]
The text column is the most important feature in this dataset because it contains the opinions of Twitter users about the Pfizer vaccine. However, tweets are full of special symbols (links, hashtags, punctuation) and spelling errors, so this column needs to be cleaned before it can be analyzed.
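The cleaned tweets shown later (for example "folk said daikon past could treat cytokin stor...") are lowercased and stemmed ("cytokin", "vaccin", "peopl"), with punctuation and stop words removed. Below is a sketch of a cleaner that works along those lines, using only the standard library. The short stop-word list is a hypothetical stand-in for a full list such as NLTK's stopwords.words("english"). Stemming is optional and can be done by any callable, for example NLTK's SnowballStemmer("english").stem:

```python
import re
import string
from typing import Callable, Collection, Optional

# Hypothetical, deliberately short stop-word list; in practice use a full list
# such as nltk.corpus.stopwords.words("english").
DEFAULT_STOPWORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "i",
    "in", "is", "it", "of", "on", "or", "that", "the", "this", "to", "was",
    "were", "will", "with",
})

URL_RE = re.compile(r"https?://\S+|www\.\S+")
PUNCT_RE = re.compile("[%s]" % re.escape(string.punctuation))  # ASCII punctuation only
DIGIT_WORD_RE = re.compile(r"\w*\d\w*")  # any token that contains a digit


def clean_text(
    text,
    stopwords: Collection[str] = DEFAULT_STOPWORDS,
    stem: Optional[Callable[[str], str]] = None,
) -> str:
    """Lowercase a tweet, strip links, punctuation and digit tokens,
    drop stop words, and optionally stem each remaining word."""
    text = str(text).lower()
    text = URL_RE.sub(" ", text)         # remove links before punctuation splits them
    text = PUNCT_RE.sub("", text)        # e.g. "#covidvaccine" becomes "covidvaccine"
    text = DIGIT_WORD_RE.sub(" ", text)  # e.g. drops "covid19" and "2021"
    words = [w for w in text.split() if w not in stopwords]
    if stem is not None:
        words = [stem(w) for w in words]
    return " ".join(words)
```

To clean the whole column, use data["text"] = data["text"].apply(clean_text).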
Now, let’s have a look at the word cloud of the text column. A word cloud is a data visualisation technique that shows the most frequently used words in a large font and less frequently used words in a smaller font.
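A word cloud is essentially a picture of word frequencies, so the counting step behind it can be sketched with the standard library. Drawing the picture is left to the third-party wordcloud package, for example WordCloud(background_color="white").generate_from_frequencies(freqs). This sketch does not call that package:

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def word_frequencies(texts: Iterable[str], top_n: Optional[int] = None) -> Dict[str, int]:
    """Count whitespace-separated words across all texts, most frequent first."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(str(text).split())
    # most_common(None) returns every word, sorted by descending count.
    return dict(counts.most_common(top_n))
```

Calling freqs = word_frequencies(data["text"], top_n=100) returns the words that would be drawn; the ones with the largest counts appear biggest.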
Now let’s have a look at the word cloud of the hashtags column, which shows which tags were trending when people were sharing their opinions about the Pfizer vaccine.
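Like any word cloud, a hashtag word cloud is sized by frequency. In the Kaggle file, the hashtags column appears to store each tweet's tags as a stringified Python list such as "['PfizerBioNTech']". This is an assumption about the CSV format, so check your copy. The sketch below parses such cells and counts the tags:

```python
import ast
from collections import Counter
from typing import Dict, Iterable, List


def parse_hashtags(cell) -> List[str]:
    """Turn one hashtags cell into a list of lowercase tags without '#'.

    Assumes cells look like "['PfizerBioNTech', 'Vaccine']"; a cell that is
    not a Python literal is split on commas and whitespace instead.
    """
    if cell is None or isinstance(cell, float):  # missing values arrive as NaN floats
        return []
    text = str(cell).strip()
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        value = text.replace(",", " ").split()
    if not isinstance(value, (list, tuple)):
        value = [value]
    tags = (str(tag).strip().lstrip("#").lower() for tag in value)
    return [tag for tag in tags if tag]


def hashtag_frequencies(cells: Iterable) -> Dict[str, int]:
    """Count how often each hashtag appears across all cells, most frequent first."""
    counts: Counter = Counter()
    for cell in cells:
        counts.update(parse_hashtags(cell))
    return dict(counts.most_common())
```

hashtag_frequencies(data["hashtags"]) returns a dictionary that a word-cloud library can draw.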
The “user_verified” column shows whether the user who shared each opinion was verified by Twitter. Verified accounts typically belonged to public figures, celebrities, journalists, or organizations. So let’s see how many of the tweets about the Pfizer vaccine came from verified users:
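Output like the one below comes from data["user_verified"].value_counts(). Here is a sketch that also reports each group's share of the tweets, assuming pandas:

```python
import pandas as pd


def verified_breakdown(df: pd.DataFrame, col: str = "user_verified") -> pd.DataFrame:
    """Count tweets (rows) per verification status, with each status's share."""
    counts = df[col].value_counts()
    shares = df[col].value_counts(normalize=True).round(3)
    return pd.DataFrame({"tweets": counts, "share": shares})
```

Based on the counts below, verified accounts wrote 580 of the 4,749 tweets, or about 12%.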
False    4169
True      580
Name: user_verified, dtype: int64
In the output above, False is the number of tweets posted by unverified accounts and True is the number posted by verified accounts. These counts are per tweet, so a user who tweeted several times is counted several times. Now let’s move on to the sentiment analysis of the Pfizer vaccine. Here I will add three columns to the dataset, Positive, Negative, and Neutral, by calculating the sentiment scores of the text column:
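The Positive, Negative, and Neutral scores below lie between 0 and 1, which matches the pos, neg, and neu proportions returned by NLTK's VADER analyzer. Using VADER here is an assumption, since the article does not name the tool. The sketch takes the scorer as a parameter, so any function returning those three keys can be plugged in. The vader_scorer() function imports NLTK only when it is called, and it needs nltk.download("vader_lexicon") to have been run once:

```python
from typing import Callable, Dict

import pandas as pd

# A scorer maps one text to a dict with at least the keys "pos", "neg", "neu".
Scorer = Callable[[str], Dict[str, float]]


def vader_scorer() -> Scorer:
    """Return NLTK VADER's polarity_scores method.

    Assumes nltk is installed and the lexicon was fetched once with
    nltk.download("vader_lexicon"); the import happens only when called.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    return SentimentIntensityAnalyzer().polarity_scores


def add_sentiment_columns(
    df: pd.DataFrame, scorer: Scorer, text_col: str = "text"
) -> pd.DataFrame:
    """Return a copy of df with Positive, Negative and Neutral score columns."""
    scores = [scorer(str(text)) for text in df[text_col]]
    out = df.copy()
    out["Positive"] = [s["pos"] for s in scores]
    out["Negative"] = [s["neg"] for s in scores]
    out["Neutral"] = [s["neu"] for s in scores]
    return out
```

Calling data = add_sentiment_columns(data, vader_scorer()) adds the three columns. VADER was designed for raw social-media text and uses capitalization and punctuation as cues, so scoring the uncleaned tweets may give more informative results than scoring the stemmed text.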
                                                 text  ...  Neutral
0   folk said daikon past could treat cytokin stor...  ...    0.748
2   coronavirus sputnikv astrazeneca pfizerbiontec...  ...    1.000
6   bit sad claim fame success vaccin patriot comp...  ...    0.481
9    covidvaccin state start get monday us say pak...  ...    1.000
10   death close mark million peopl wait pfizerbio...  ...    0.698

[5 rows x 4 columns]
Now let’s find out how most people felt about the Pfizer vaccine.
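One simple way to decide which sentiment dominates is to sum each score column and pick the largest total. Here is a sketch of that comparison (the function name is hypothetical):

```python
def dominant_sentiment(positive: float, negative: float, neutral: float) -> str:
    """Return the label with the largest summed score; any tie for the top falls to 'Neutral'."""
    if positive > negative and positive > neutral:
        return "Positive"
    if negative > positive and negative > neutral:
        return "Negative"
    return "Neutral"
```

dominant_sentiment(data["Positive"].sum(), data["Negative"].sum(), data["Neutral"].sum()) returns "Neutral" for this dataset, given the totals printed below.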
The dominant sentiment turns out to be Neutral. Before drawing any conclusion, let’s look at the total of each sentiment score:
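x = data["Positive"].sum()  # total positive score
y = data["Negative"].sum()  # total negative score
z = data["Neutral"].sum()   # total neutral score
print("Positive: ", x)
print("Negative: ", y)
print("Neutral: ", z)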
Positive: 417.81600000000003
Negative: 188.81200000000024
Neutral: 4142.3750000000055
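The three totals add up to about 4,749, the number of rows left after dropna(). This is what you would expect if each tweet's three scores sum to about 1, as VADER's pos, neu, and neg proportions do (an assumption about how the scores were computed). Dividing each total by their sum therefore gives the average share of each sentiment per tweet:

```python
from typing import Dict


def sentiment_shares(totals: Dict[str, float]) -> Dict[str, float]:
    """Express each sentiment total as a fraction of the sum of all totals."""
    grand_total = sum(totals.values())
    return {label: value / grand_total for label, value in totals.items()}


# Totals from the output above, rounded to three decimals.
totals = {"Positive": 417.816, "Negative": 188.812, "Neutral": 4142.375}
for label, share in sentiment_shares(totals).items():
    print(f"{label}: {share:.1%}")
```

This prints 8.8% for Positive, 4.0% for Negative, and 87.2% for Neutral.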
The positive and negative totals (about 418 and 189) are far smaller than the neutral total (about 4,142). This suggests that Twitter users were mostly discussing awareness of the Pfizer vaccine rather than sharing its benefits or drawbacks.
So this is how you can analyze the sentiments of Twitter users about the Pfizer vaccine. In conclusion, the tweets in this dataset are overwhelmingly neutral, which points to awareness rather than praise or criticism. I hope you liked this article on Pfizer vaccine sentiment analysis using Python. Feel free to ask your valuable questions in the comments section below.