Ukraine Russia War Twitter Sentiment Analysis using Python

Today is the 19th day of the war between Russia and Ukraine. Many countries are supporting Ukraine by imposing economic sanctions on Russia. Many people are tweeting about the war: sharing updates from the ground, how they feel about it, and whom they support. If you want to analyze public sentiment about the Ukraine and Russia war, this article is for you. In this article, I will take you through the task of Ukraine and Russia war Twitter sentiment analysis using Python.


The dataset I am using for Twitter sentiment analysis on the Ukraine and Russia war was downloaded from Kaggle. It was originally collected from Twitter and is updated regularly. You can download this dataset from here. Now let’s import the necessary Python libraries and the dataset to get started with this task:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import nltk
import re
from nltk.corpus import stopwords
import string

data = pd.read_csv("filename.csv")
print(data.head())
             id  conversation_id               created_at       date     time  \
0  1.502530e+18     1.502260e+18  2022-03-12 06:03:14 UTC  3/12/2022  6:03:14   
1  1.502530e+18     1.502530e+18  2022-03-12 06:03:14 UTC  3/12/2022  6:03:14   
2  1.502530e+18     1.502530e+18  2022-03-12 06:03:13 UTC  3/12/2022  6:03:13   
3  1.502530e+18     1.502210e+18  2022-03-12 06:03:12 UTC  3/12/2022  6:03:12   
4  1.502530e+18     1.500440e+18  2022-03-12 06:03:12 UTC  3/12/2022  6:03:12   

   timezone       user_id         username  \
0         0  2.019880e+07         redcelia   
1         0  2.275356e+08          eee_eff   
2         0  8.431317e+07      mistify_007   
3         0  9.898620e+17  reallivinghuman   
4         0  1.164940e+18           rpcsas   

                                      name place  ... geo source user_rt_id  \
0    Johnson Out🇺🇦 🇪🇺🇮🇹🇦🇫💙😷 #NeverVoteTory   NaN  ... NaN    NaN        NaN   
1  Wearing Masks still saves lives 🇺🇦🇲🇨🏥🌹🌹   NaN  ... NaN    NaN        NaN   
2                                Brian🤸‍♀️   NaN  ... NaN    NaN        NaN   
3                                    Basha   NaN  ... NaN    NaN        NaN   
4                                   RonJon   NaN  ... NaN    NaN        NaN   

  user_rt retweet_id                                           reply_to  \
0     NaN        NaN  [{'screen_name': 'RussianEmbassy', 'name': 'Ru...   
1     NaN        NaN                                                 []   
2     NaN        NaN                                                 []   
3     NaN        NaN  [{'screen_name': 'RussianEmbassy', 'name': 'Ru...   
4     NaN        NaN  [{'screen_name': 'IsraeliPM', 'name': 'Prime M...   

   retweet_date  translate trans_src trans_dest  
0           NaN        NaN       NaN        NaN  
1           NaN        NaN       NaN        NaN  
2           NaN        NaN       NaN        NaN  
3           NaN        NaN       NaN        NaN  
4           NaN        NaN       NaN        NaN  

[5 rows x 36 columns]

Let’s have a quick look at all the column names of the dataset:

print(data.columns)
Index(['id', 'conversation_id', 'created_at', 'date', 'time', 'timezone',
       'user_id', 'username', 'name', 'place', 'tweet', 'language', 'mentions',
       'urls', 'photos', 'replies_count', 'retweets_count', 'likes_count',
       'hashtags', 'cashtags', 'link', 'retweet', 'quote_url', 'video',
       'thumbnail', 'near', 'geo', 'source', 'user_rt_id', 'user_rt',
       'retweet_id', 'reply_to', 'retweet_date', 'translate', 'trans_src',
       'trans_dest'],
      dtype='object')

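We only need three columns for this task (username, tweet, and language), so I will select just these columns and move forward:

# .copy() gives an independent DataFrame, so later column assignments do not trigger SettingWithCopyWarning
data = data[["username", "tweet", "language"]].copy()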

Let’s check whether any of these columns contain null values:

data.isnull().sum()
username    0
tweet       0
language    0
dtype: int64

None of the columns has null values. Now let’s have a quick look at how many tweets were posted in each language:

data["language"].value_counts()
en     8812
pt      251
und     198
it      155
in      122
ru       85
hi       55
ja       52
es       40
ta       23
tr       19
ca       18
fr       16
et       16
tl       15
nl       14
de       13
pl       13
fi        9
ar        9
zh        9
sv        6
uk        6
te        6
mr        5
cs        4
el        4
gu        4
no        3
th        3
kn        3
ro        3
ur        2
or        2
eu        2
ko        2
ht        2
sl        2
bn        1
cy        1
ne        1
Name: language, dtype: int64
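
The counts above mix many languages, while VADER, the analyzer used later in this article, relies on an English lexicon. The article scores all tweets regardless of language; one optional variation is to keep only the rows tagged en before scoring. Here is a minimal sketch of that filter, assuming a toy DataFrame whose rows are made up for illustration (the names toy and english_only are not part of the original code):

```python
import pandas as pd

# Toy stand-in for the real dataset; these rows are invented for illustration.
toy = pd.DataFrame({
    "username": ["user_a", "user_b", "user_c", "user_d"],
    "tweet": ["stand with ukraine", "paz para todos", "stop the war", "???"],
    "language": ["en", "pt", "en", "und"],
})

# Keep only tweets tagged as English and renumber the rows.
english_only = toy[toy["language"] == "en"].reset_index(drop=True)

print(len(toy), "rows before,", len(english_only), "rows after filtering")
print(english_only["tweet"].tolist())
```

On the real data, the same boolean mask would be applied to data instead of toy. Whether to filter is a judgment call: it drops a share of the tweets but avoids scoring text the lexicon was not built for.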

Most of the tweets are in English. Let’s prepare this data for the task of sentiment analysis. Here I will remove links, punctuation, symbols, and other noise from the tweets, then drop stopwords and stem the remaining words:

nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    # raw strings (r'...') keep the regex backslashes from being read as string escapes
    text = re.sub(r'\[.*?\]', '', text)                # text inside square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # links
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                     # line breaks
    text = re.sub(r'\w*\d\w*', '', text)               # words containing digits
    text = [word for word in text.split(' ') if word not in stopword]  # drop stopwords
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]  # stem each word
    text = " ".join(text)
    return text

data["tweet"] = data["tweet"].apply(clean)
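
To see what the regex steps do to a single tweet, here is a cut-down sketch of the cleaning function that runs without NLTK. It is not the function used in this article: the helper name clean_demo, the sample tweet, and the tiny stopword set are all assumptions made for illustration, and the stemming step is left out.

```python
import re
import string

# Tiny hand-picked stopword set (an assumption for this demo;
# the article uses NLTK's full English stopword list).
DEMO_STOPWORDS = {"the", "is", "a", "to", "in", "and", "of"}

def clean_demo(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # text inside square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # links
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                     # line breaks
    text = re.sub(r'\w*\d\w*', '', text)               # words containing digits
    words = [w for w in text.split() if w not in DEMO_STOPWORDS]
    return " ".join(words)

# Made-up tweet, not a row from the dataset.
sample = "The situation in Kyiv is tense!! Read more: https://example.com/news123 #Ukraine2022"
print(clean_demo(sample))  # situation kyiv tense read more
```

Note that the hashtag disappears entirely: once the # is stripped, ukraine2022 contains digits and is removed by the last regex. The same happens to any word with a number in it.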

Now let’s have a look at the wordcloud of the tweets, which will show the most frequently used words in the tweets by people sharing their feelings and updates about the Ukraine and Russia war:

text = " ".join(i for i in data.tweet)
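# use a new name so the nltk.corpus "stopwords" import used in the cleaning step is not overwritten
wc_stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(text)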
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
[Figure: word cloud of the most frequent words in the tweets]
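
A word cloud sizes words by how often they occur, so roughly the same information can be read as plain numbers with a frequency count. Here is a minimal sketch using collections.Counter, assuming a few made-up cleaned tweets in place of the real column (WordCloud also applies its own stopword and phrase handling, so its ranking will not match exactly):

```python
from collections import Counter

# Stand-in for a few cleaned tweets (made-up examples, not rows from the dataset).
cleaned_tweets = [
    "russia ukrain war news",
    "ukrain need peac",
    "stop war ukrain",
]

# Same joining step as the article, then count whitespace-separated tokens.
text = " ".join(cleaned_tweets)
word_counts = Counter(text.split())

# Print the two most frequent words with their counts.
for word, count in word_counts.most_common(2):
    print(word, count)
```

With the real data, cleaned_tweets would be replaced by data["tweet"].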

Now I will add three more columns to this dataset, Positive, Negative, and Neutral, by calculating the sentiment scores of the tweets:

nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()
data["Positive"] = [sentiments.polarity_scores(i)["pos"] for i in data["tweet"]]
data["Negative"] = [sentiments.polarity_scores(i)["neg"] for i in data["tweet"]]
data["Neutral"] = [sentiments.polarity_scores(i)["neu"] for i in data["tweet"]]
data = data[["tweet", "Positive", "Negative", "Neutral"]]
print(data.head())
                                               tweet  Positive  Negative  \
0  russianembassi ft mfarussia jeffdsach csdcolum...     0.077     0.284   
1  kidnap without charg access lawyer putin russi...     0.000     0.000   
2  much western civil everyon feel compel find cr...     0.144     0.259   
3  russianembassi love place ill visit sure next ...     0.291     0.126   
4  israelipm iaeaorg didnt know state israel advi...     0.000     0.000   

   Neutral  
0    0.639  
1    1.000  
2    0.596  
3    0.583  
4    1.000 
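
The three scores can be collapsed into a single label per tweet by comparing the positive and negative values, which is the same comparison the word clouds in the rest of this article use to split the tweets. Here is a minimal sketch, assuming a hypothetical helper named label_from_scores and using the five (Positive, Negative) pairs printed above:

```python
from collections import Counter

def label_from_scores(pos, neg):
    """Return a coarse label by comparing the positive and negative scores."""
    if pos > neg:
        return "Positive"
    if neg > pos:
        return "Negative"
    return "Neutral"  # ties, including tweets where both scores are 0

# (Positive, Negative) pairs copied from the five rows printed above.
rows = [(0.077, 0.284), (0.000, 0.000), (0.144, 0.259), (0.291, 0.126), (0.000, 0.000)]

labels = [label_from_scores(pos, neg) for pos, neg in rows]
print(labels)
print(Counter(labels))
```

In these five rows, two tweets lean negative, one leans positive, and two have no positive or negative words that the lexicon recognizes. Treating ties as Neutral is a design choice of this sketch, not something the article defines.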

Now let’s have a look at the most frequent words in tweets whose positive score is higher than their negative score:

positive =' '.join([i for i in data['tweet'][data['Positive'] > data["Negative"]]])
# use a new name so the nltk.corpus "stopwords" import is not overwritten
wc_stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(positive)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
[Figure: word cloud of tweets with positive sentiment]

Now let’s have a look at the most frequent words in tweets whose negative score is higher than their positive score:

negative =' '.join([i for i in data['tweet'][data['Negative'] > data["Positive"]]])
# use a new name so the nltk.corpus "stopwords" import is not overwritten
wc_stopwords = set(STOPWORDS)
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(negative)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
[Figure: word cloud of tweets with negative sentiment]

So this is how you can analyze the sentiments of people about the Ukraine and Russia war. I hope this war ends soon and things get back to normal.

Summary

Many people are tweeting about the Ukraine and Russia war: sharing updates from the ground, how they feel about it, and whom they support. I used those tweets for the task of Twitter sentiment analysis on the Ukraine and Russia war. I hope you liked this article. Feel free to ask questions in the comments section below.

Aman Kharwal
