Omicron Sentiment Analysis using Python

A few days ago, the WHO designated a new coronavirus variant, B.1.1.529, as a variant of concern and named it Omicron. Right after that, Twitter saw a surge of tweets about the Omicron variant. So, if you want to learn how to analyze the sentiments of those tweets, this article is for you. In this article, I will walk you through the task of Omicron Sentiment Analysis using Python.

The dataset that I am using for the task of Omicron sentiment analysis was downloaded from Kaggle. It was originally collected from Twitter while people were sharing their opinions about the Omicron variant. So let’s start the task by importing the necessary Python libraries and the dataset:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

data = pd.read_csv("omicron.csv")
print(data.head())
                    id             user_name  ... favorites is_retweet
0  1465693385088323591                Abaris  ...         0      False
1  1465693062999412746                GFTs   ...         0      False
2  1465690116442279942  Herbie Finkle (Cozy)  ...         1      False
3  1465689607165591552     Electrical Review  ...         0      False
4  1465688203709464578       BingX Academy 🔑  ...         2      False

[5 rows x 16 columns]

This dataset is quite large, so let’s check whether it contains any null values:

print(data.isnull().sum())
id                     0
user_name              0
user_location       4438
user_description    1278
user_created           0
user_followers         0
user_friends           0
user_favourites        0
user_verified          0
date                   0
text                   0
hashtags            4374
source                 0
retweets               0
favorites              0
is_retweet             0
dtype: int64

The dataset contains null values in three columns that contain textual data. I will remove all the rows containing null values:

data = data.dropna()
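Note that dropna() without arguments removes a row if any column is missing, so a tweet without a user_location is dropped even though its text is present. Here is a minimal sketch of a narrower alternative, assuming you only need the text and hashtags columns later on. The toy DataFrame below is made up for illustration and is not part of the Omicron dataset:

```python
import pandas as pd

# Hypothetical toy data with the same kind of gaps as the real dataset
toy = pd.DataFrame({
    "user_location": ["London", None, "Delhi", None],
    "text": ["omicron update", "new variant found", "stay safe", "travel rules change"],
    "hashtags": ["['Omicron']", "['COVID19']", None, None],
})

# Drops every row that has a null in any column
all_columns = toy.dropna()

# Drops only the rows where a column we actually use is null
needed_only = toy.dropna(subset=["text", "hashtags"])

print(len(toy), len(all_columns), len(needed_only))  # 4 1 2
```

Whether keeping those extra rows is worth it depends on what you plan to do with the other columns, so treat this as an option rather than a correction.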

Sentiment Analysis of Omicron Variant

The text column in the dataset contains the tweets people posted to share their opinions about the Omicron variant. Before moving further, we need to clean and prepare this column for the task of sentiment analysis. Here’s how we can do that:

import re
import string

import nltk
from nltk.corpus import stopwords

nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # remove text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # remove URLs
    text = re.sub(r'<.*?>+', '', text)                 # remove HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # remove punctuation
    text = re.sub('\n', '', text)                      # remove newlines
    text = re.sub(r'\w*\d\w*', '', text)               # remove words containing digits
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

data["text"] = data["text"].apply(clean)
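To see what the regular expressions in the cleaning step do, here is a small sketch that applies only those substitutions to one tweet. The function name strip_noise and the sample tweet are my own illustrations, and the stop word removal and stemming are left out so that the example runs without NLTK:

```python
import re
import string

def strip_noise(text):
    """Apply the same regex substitutions as clean(), without stop words or stemming."""
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # URLs
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub('\n', '', text)                      # newlines
    text = re.sub(r'\w*\d\w*', '', text)               # words containing digits
    return text

# Hypothetical tweet, not taken from the dataset
sample = "Omicron cases rising! Read https://example.com #COVID19"
print(strip_noise(sample).split())  # ['omicron', 'cases', 'rising', 'read']
```

Notice that the hashtag #COVID19 disappears completely, because the last substitution removes every word that contains a digit. The removed pieces also leave extra spaces behind, so splitting on a single space can produce empty strings, while split() without an argument skips them.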

Now that we have cleaned the text column, let’s have a look at a word cloud of it to see the words people used most often in their tweets:

text = " ".join(i for i in data.text)
wc_stopwords = set(STOPWORDS)  # separate name so the nltk stopwords import is not overwritten
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(text)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
[Figure: word cloud of the Omicron variant tweets]
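A word cloud sizes each word by how often it appears, so if you want the actual numbers behind the picture you can count the words directly. Here is a minimal sketch using only the standard library. The helper top_words and the sample tweets are hypothetical, and on the real data you would pass data["text"] instead:

```python
from collections import Counter

def top_words(texts, n=3):
    """Return the n most frequent whitespace-separated words across all texts."""
    counts = Counter()
    for t in texts:
        counts.update(t.split())
    return counts.most_common(n)

# Hypothetical cleaned tweets, not taken from the dataset
sample_tweets = ["omicron variant spread", "omicron case rise", "new variant omicron"]
print(top_words(sample_tweets, n=2))  # [('omicron', 3), ('variant', 2)]
```

Unlike the word cloud, this sketch does not filter out the STOPWORDS set, so the two views can differ slightly.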

Now let’s have a look at a word cloud of the hashtags column to see the hashtags people used most often in their tweets:

text = " ".join(i for i in data.hashtags)
wc_stopwords = set(STOPWORDS)  # separate name so the nltk stopwords import is not overwritten
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(text)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
[Figure: word cloud of the Omicron variant hashtags]

Now I will calculate the sentiment scores of the tweets about the Omicron variant. Here I will add three more columns to the dataset, Positive, Negative, and Neutral, by calculating the sentiment scores of the text column:

nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()
# Score each tweet once and reuse the result for all three columns
scores = [sentiments.polarity_scores(i) for i in data["text"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]
data = data[["text", "Positive", "Negative", "Neutral"]]
print(data.head())
                                                text  Positive  Negative  Neutral
0  skynew told id back omicron "odium medicum ins...      0.16     0.000    0.840
1                         someon told octob omicron       0.00     0.000    1.000
3  autom system becom increas complex effort test...      0.00     0.000    1.000
5  digitaldisrupt emerg technolog stay privat inv...      0.00     0.000    1.000
7  fatigu head bodi ach occasion sore throat coug...      0.00     0.172    0.828
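Besides pos, neg, and neu, the dictionary returned by polarity_scores also has a compound key, which is a single score between -1 and 1 for the whole text. Here is a small sketch of how a tweet could be labelled from that value, assuming the thresholds of 0.05 and -0.05 that the VADER authors suggest. The function name label_from_compound is my own, and this labelling is not used elsewhere in this article:

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a sentiment label.

    The default threshold of 0.05 is the commonly suggested cut-off;
    treat it as an assumption you may want to tune.
    """
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# Hypothetical compound scores, not computed from the dataset
print(label_from_compound(0.6))   # Positive
print(label_from_compound(-0.3))  # Negative
print(label_from_compound(0.0))   # Neutral
```

On the real data, you would pass sentiments.polarity_scores(tweet)["compound"] to this function for each tweet.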

Now let’s see how most people reacted to the Omicron variant:

x = sum(data["Positive"])
y = sum(data["Negative"])
z = sum(data["Neutral"])

def sentiment_score(a, b, c):
    if (a>b) and (a>c):
        print("Positive 😊 ")
    elif (b>a) and (b>c):
        print("Negative 😠 ")
    else:
        print("Neutral 🙂 ")
sentiment_score(x, y, z)
Neutral 🙂

So the Neutral score is the highest overall, which suggests that most people were sharing information about the Omicron variant rather than expressing positive or negative opinions.
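If you want to reuse this comparison in other code or check it with a quick test, a version that returns the label instead of printing it can be handy. Here is a minimal sketch of that idea. The name dominant_sentiment is hypothetical, and the numbers in the first call are simply the sums of the five rows printed earlier, not the totals for the full dataset:

```python
def dominant_sentiment(positive, negative, neutral):
    """Return the label of the largest total, falling back to Neutral on ties."""
    if positive > negative and positive > neutral:
        return "Positive"
    if negative > positive and negative > neutral:
        return "Negative"
    return "Neutral"

# Totals of the five rows shown in the printed output above
print(dominant_sentiment(0.16, 0.172, 4.668))  # Neutral
```

The logic is the same as in sentiment_score above, so any tie falls through to Neutral.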

Summary

So this is how you can analyze the sentiments of tweets about the Omicron variant of coronavirus. It’s a new variant that has been designated as a variant of concern by the World Health Organization. I hope you liked this article on Omicron sentiment analysis using Python. Feel free to ask your valuable questions in the comments section below.

Aman Kharwal