A few days ago, the WHO designated a new variant of the coronavirus, B.1.1.529, as a variant of concern, naming it Omicron. Right after that, there was a surge of tweets about the Omicron variant on Twitter. So, if you want to know how to analyze the sentiments of tweets about the Omicron variant, this article is for you. In this article, I will walk you through the task of Omicron sentiment analysis using Python.
Omicron Sentiment Analysis using Python
The dataset I am using for the task of Omicron sentiment analysis is downloaded from Kaggle; it was originally collected from Twitter while people were sharing their opinions about the Omicron variant. So let’s start the task of Omicron sentiment analysis by importing the necessary Python libraries and the dataset:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

data = pd.read_csv("omicron.csv")
print(data.head())
                    id             user_name  ... favorites is_retweet
0  1465693385088323591                Abaris  ...         0      False
1  1465693062999412746                  GFTs  ...         0      False
2  1465690116442279942  Herbie Finkle (Cozy)  ...         1      False
3  1465689607165591552     Electrical Review  ...         0      False
4  1465688203709464578       BingX Academy 🔑  ...         2      False

[5 rows x 16 columns]
This dataset is quite large, so before going further, let’s check whether it contains any null values:
print(data.isnull().sum())
id                     0
user_name              0
user_location       4438
user_description    1278
user_created           0
user_followers         0
user_friends           0
user_favourites        0
user_verified          0
date                   0
text                   0
hashtags            4374
source                 0
retweets               0
favorites              0
is_retweet             0
dtype: int64
The dataset contains null values in three columns that contain textual data. I will remove all the rows containing null values:
data = data.dropna()
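Note that `dropna()` with no arguments drops every row that has a null in *any* column, so rows missing only `hashtags` or `user_location` are discarded along with their perfectly usable tweet text. A minimal sketch of the difference, using a small synthetic frame (the column names mirror the dataset, but the values are invented):

```python
import pandas as pd

# Toy frame mimicking the dataset's nullable columns (values are invented)
df = pd.DataFrame({
    "text": ["tweet one", "tweet two", "tweet three"],
    "hashtags": ["#omicron", None, "#covid"],
    "user_location": ["UK", "India", None],
})

# dropna() removes any row with a null anywhere -> only 1 row survives
strict = df.dropna()

# Restricting the check to the column we actually analyze keeps all 3 rows
lenient = df.dropna(subset=["text"])

print(len(strict), len(lenient))  # prints: 1 3
```

Dropping on `subset=["text"]` would keep more tweets for the sentiment step; the article's blanket `dropna()` is simpler but throws some usable rows away.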
Sentiment Analysis of Omicron Variant
The text column in the dataset contains the tweets people posted to share their opinions about the Omicron variant. To move further, we need to clean and prepare this column for the task of sentiment analysis. Here’s how we can do that:
import nltk
import re
import string
from nltk.corpus import stopwords

nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                  # remove bracketed text
    text = re.sub(r'https?://\S+|www\.\S+', '', text)    # remove URLs
    text = re.sub(r'<.*?>+', '', text)                   # remove HTML tags
    text = re.sub(r'[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)                 # remove words containing digits
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text

data["text"] = data["text"].apply(clean)
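To make the individual cleaning steps concrete, here is a stripped-down version of the same pipeline applied to one sample tweet. It uses only `re` and `string`; the NLTK stopword list is mocked with a tiny hand-written set and the stemming step is omitted, so its output differs slightly from the full `clean()` function above:

```python
import re
import string

# Tiny stand-in stopword list; the article uses NLTK's full English list
STOP = {"is", "the", "a", "in"}

def clean_lite(text: str) -> str:
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", "", text)   # strip URLs
    text = re.sub(r"<.*?>+", "", text)                  # strip HTML tags
    text = re.sub(r"[%s]" % re.escape(string.punctuation), "", text)
    text = re.sub(r"\w*\d\w*", "", text)                # drop tokens with digits
    words = [w for w in text.split() if w not in STOP]
    return " ".join(words)

sample = "Omicron is spreading in the UK! https://t.co/xyz #covid19"
print(clean_lite(sample))  # prints: omicron spreading uk
```

The URL, the punctuation, the digit-bearing hashtag, and the stopwords are all stripped, leaving only the content words the sentiment analyzer will see.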
Now that we have cleaned the text column, let’s have a look at a word cloud of it to see the words people used most often in their tweets:
text = " ".join(i for i in data.text)
stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=stopwords, background_color="white").generate(text)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
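If you prefer exact counts over a visual, the same information can be pulled with `collections.Counter`. A minimal sketch on a few hand-written cleaned tweets (the real input would be `data.text`):

```python
from collections import Counter

# Stand-in for data.text after cleaning; the real tweets would go here
cleaned_tweets = [
    "omicron variant spread fast",
    "new omicron case report",
    "omicron variant mild symptom",
]

# Count every word across all tweets
counts = Counter(word for tweet in cleaned_tweets for word in tweet.split())
print(counts.most_common(2))  # prints: [('omicron', 3), ('variant', 2)]
```

This is handy when you need the actual frequencies, e.g. to plot a bar chart alongside the word cloud.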

Now let’s have a look at a word cloud of the hashtags column to see the hashtags people used most often in their tweets:
text = " ".join(i for i in data.hashtags)
stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=stopwords, background_color="white").generate(text)
plt.figure(figsize=(15, 10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()

Now I will calculate the sentiment scores of the tweets about the Omicron variant. Here I will add three new columns to the dataset, Positive, Negative, and Neutral, computed from the sentiment scores of the text column:
nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()
data["Positive"] = [sentiments.polarity_scores(i)["pos"] for i in data["text"]]
data["Negative"] = [sentiments.polarity_scores(i)["neg"] for i in data["text"]]
data["Neutral"] = [sentiments.polarity_scores(i)["neu"] for i in data["text"]]
data = data[["text", "Positive", "Negative", "Neutral"]]
print(data.head())
                                                text  Positive  Negative  Neutral
0  skynew told id back omicron "odium medicum ins...      0.16     0.000    0.840
1                          someon told octob omicron      0.00     0.000    1.000
3  autom system becom increas complex effort test...      0.00     0.000    1.000
5  digitaldisrupt emerg technolog stay privat inv...      0.00     0.000    1.000
7  fatigu head bodi ach occasion sore throat coug...      0.00     0.172    0.828
Now let’s see how most people reacted to the Omicron variant:
x = sum(data["Positive"])
y = sum(data["Negative"])
z = sum(data["Neutral"])

def sentiment_score(a, b, c):
    if (a > b) and (a > c):
        print("Positive 😊 ")
    elif (b > a) and (b > c):
        print("Negative 😠")
    else:
        print("Neutral 🙂 ")

sentiment_score(x, y, z)
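The print-based helper works for a one-off check, but a variant that returns a label is easier to test and reuse. A small sketch with invented score totals (note that the `else` branch also catches exact ties, which it reports as Neutral, mirroring the version above):

```python
def sentiment_label(pos: float, neg: float, neu: float) -> str:
    # Largest aggregate score wins; ties fall through to "Neutral",
    # just like the else branch in the print-based helper
    if pos > neg and pos > neu:
        return "Positive"
    if neg > pos and neg > neu:
        return "Negative"
    return "Neutral"

# Invented totals for illustration; real values would come from the
# summed Positive/Negative/Neutral columns
print(sentiment_label(120.5, 98.2, 1500.0))  # prints: Neutral
```

Returning a string instead of printing lets you store the label in a column, unit-test the logic, or reuse it per tweet rather than only on the dataset-wide sums.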
Neutral 🙂
So most of the opinions were neutral, which means that people were mostly sharing information about the Omicron variant rather than expressing positive or negative opinions.
Summary
So this is how you can analyze the sentiments of tweets about the Omicron variant of the coronavirus, a new variant that the World Health Organization has designated a variant of concern. I hope you liked this article on Omicron sentiment analysis using Python. Feel free to ask your valuable questions in the comments section below.