Ukraine Russia War Twitter Sentiment Analysis using Python

Today is the 19th day of the war between Russia and Ukraine. Many countries are supporting Ukraine by imposing economic sanctions on Russia. There are a lot of tweets about the war in which people post updates about what is happening on the ground, how they feel about it, and whom they support. So if you want to analyze people's sentiments about the Ukraine and Russia war, this article is for you. In this article, I will take you through the task of Ukraine and Russia war Twitter sentiment analysis using Python.


The dataset I am using for this Twitter sentiment analysis of the Ukraine and Russia war was downloaded from Kaggle. It was originally collected from Twitter and is updated regularly. You can download this dataset from here. Now let's import the necessary Python libraries and the dataset to get started with this task:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator
import nltk
import re
from nltk.corpus import stopwords
import string

data = pd.read_csv("filename.csv")
print(data.head())
             id  conversation_id               created_at       date     time  \
0  1.502530e+18     1.502260e+18  2022-03-12 06:03:14 UTC  3/12/2022  6:03:14   
1  1.502530e+18     1.502530e+18  2022-03-12 06:03:14 UTC  3/12/2022  6:03:14   
2  1.502530e+18     1.502530e+18  2022-03-12 06:03:13 UTC  3/12/2022  6:03:13   
3  1.502530e+18     1.502210e+18  2022-03-12 06:03:12 UTC  3/12/2022  6:03:12   
4  1.502530e+18     1.500440e+18  2022-03-12 06:03:12 UTC  3/12/2022  6:03:12   

   timezone       user_id         username  \
0         0  2.019880e+07         redcelia   
1         0  2.275356e+08          eee_eff   
2         0  8.431317e+07      mistify_007   
3         0  9.898620e+17  reallivinghuman   
4         0  1.164940e+18           rpcsas   

                                      name place  ... geo source user_rt_id  \
0    Johnson Out🇺🇦 🇪🇺🇮🇹🇦🇫💙😷 #NeverVoteTory   NaN  ... NaN    NaN        NaN   
1  Wearing Masks still saves lives 🇺🇦🇲🇨🏥🌹🌹   NaN  ... NaN    NaN        NaN   
2                                Brian🤸‍♀️   NaN  ... NaN    NaN        NaN   
3                                    Basha   NaN  ... NaN    NaN        NaN   
4                                   RonJon   NaN  ... NaN    NaN        NaN   

  user_rt retweet_id                                           reply_to  \
0     NaN        NaN  [{'screen_name': 'RussianEmbassy', 'name': 'Ru...   
1     NaN        NaN                                                 []   
2     NaN        NaN                                                 []   
3     NaN        NaN  [{'screen_name': 'RussianEmbassy', 'name': 'Ru...   
4     NaN        NaN  [{'screen_name': 'IsraeliPM', 'name': 'Prime M...   

   retweet_date  translate trans_src trans_dest  
0           NaN        NaN       NaN        NaN  
1           NaN        NaN       NaN        NaN  
2           NaN        NaN       NaN        NaN  
3           NaN        NaN       NaN        NaN  
4           NaN        NaN       NaN        NaN  

[5 rows x 36 columns]

Let’s have a quick look at all the column names of the dataset:

print(data.columns)
Index(['id', 'conversation_id', 'created_at', 'date', 'time', 'timezone',
       'user_id', 'username', 'name', 'place', 'tweet', 'language', 'mentions',
       'urls', 'photos', 'replies_count', 'retweets_count', 'likes_count',
       'hashtags', 'cashtags', 'link', 'retweet', 'quote_url', 'video',
       'thumbnail', 'near', 'geo', 'source', 'user_rt_id', 'user_rt',
       'retweet_id', 'reply_to', 'retweet_date', 'translate', 'trans_src',
       'trans_dest'],
      dtype='object')

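We only need three columns for this task (username, tweet, and language), so I will select just these columns and move forward:

# .copy() makes an independent DataFrame, so later column assignments don't trigger pandas' SettingWithCopyWarning
data = data[["username", "tweet", "language"]].copy()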

Let's check whether any of these columns contain null values:

data.isnull().sum()
username    0
tweet       0
language    0
dtype: int64

None of the columns has null values. Now let's have a quick look at how many tweets were posted in each language:

data["language"].value_counts()
en     8812
pt      251
und     198
it      155
in      122
ru       85
hi       55
ja       52
es       40
ta       23
tr       19
ca       18
fr       16
et       16
tl       15
nl       14
de       13
pl       13
fi        9
ar        9
zh        9
sv        6
uk        6
te        6
mr        5
cs        4
el        4
gu        4
no        3
th        3
kn        3
ro        3
ur        2
or        2
eu        2
ko        2
ht        2
sl        2
bn        1
cy        1
ne        1
Name: language, dtype: int64
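Both the NLTK stopword list and the VADER lexicon used later in this article are English, so one option is to keep only the tweets tagged `en` before scoring them. A minimal pandas sketch, assuming a DataFrame with the `language` column shown above (the sample rows are made up):

```python
import pandas as pd

# Made-up sample with the same columns as the selected dataset
data = pd.DataFrame({
    "username": ["a", "b", "c", "d"],
    "tweet": ["peace now", "paz agora", "stop the war", "мир"],
    "language": ["en", "pt", "en", "ru"],
})

# Share of each language (normalize=True returns fractions instead of counts)
shares = data["language"].value_counts(normalize=True)
print(shares["en"])  # 0.5

# Keep only English tweets; .copy() gives an independent DataFrame
english = data[data["language"] == "en"].copy()
print(len(english))  # 2
```

Filtering is a judgment call: it discards the non-English tweets instead of scoring them with an English-only lexicon.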

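So most of the tweets (8,812 of 10,006, about 88%) are in English. Let's prepare this data for sentiment analysis. The clean function below lowercases each tweet; removes text in square brackets, links, HTML tags, punctuation, line breaks, and words containing digits; drops English stopwords; and reduces the remaining words to their stems: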

nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    # Raw strings (r'...') pass backslashes such as \[ and \S through to the regex engine unchanged
    text = re.sub(r'\[.*?\]', '', text)                # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # links
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', ' ', text)                    # line breaks -> spaces, so words stay separate
    text = re.sub(r'\w*\d\w*', '', text)               # words containing digits
    words = [word for word in text.split() if word not in stopword]  # drop English stopwords
    words = [stemmer.stem(word) for word in words]     # reduce each word to its stem
    return " ".join(words)

data["tweet"] = data["tweet"].apply(clean)

Now let's look at a word cloud of the tweets, which shows the words used most often by people sharing their feelings and updates about the Ukraine and Russia war:

text = " ".join(i for i in data.tweet)
wc_stopwords = set(STOPWORDS)  # a separate name, so the nltk `stopwords` module imported above is not overwritten
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(text)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
[Figure: word cloud of the most frequent words in all tweets]
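A word cloud is essentially a picture of word frequencies. If you want the same information as numbers, a `collections.Counter` over the cleaned text gives the most common words. This is a minimal sketch: the helper `top_words`, the made-up tweets, and the small `ignore` set (standing in for WordCloud's `STOPWORDS`) are assumptions for illustration:

```python
from collections import Counter

def top_words(texts, n=5, ignore=frozenset()):
    """Count whitespace-separated words across texts and return the n most common."""
    counts = Counter(
        word
        for text in texts
        for word in text.split()
        if word not in ignore
    )
    return counts.most_common(n)

# Made-up cleaned tweets; in the article these would come from data["tweet"]
tweets = [
    "russia ukrain war",
    "support ukrain peac",
    "ukrain russia sanction",
]
print(top_words(tweets, n=2, ignore={"the", "a"}))
# [('ukrain', 3), ('russia', 2)]
```

`WordCloud.generate` does its own tokenization and, by default, also counts two-word collocations, so word sizes in the image will not match these counts exactly.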

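Now I will add three more columns to this dataset (Positive, Negative, and Neutral) by calculating the sentiment scores of the tweets with NLTK's VADER analyzer. Each value is the proportion of the tweet's text that VADER rates as positive, negative, or neutral, so the three values for a tweet add up to about 1. Keep in mind that VADER was built for raw social-media text: its lexicon contains whole words rather than stems, and it uses capitalization and exclamation marks as intensity cues, so scoring the cleaned and stemmed tweets, as done here, can miss some sentiment: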

nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()
# Score each tweet once; polarity_scores returns a dict with "neg", "neu", "pos" and "compound" keys
scores = [sentiments.polarity_scores(i) for i in data["tweet"]]
data["Positive"] = [s["pos"] for s in scores]
data["Negative"] = [s["neg"] for s in scores]
data["Neutral"] = [s["neu"] for s in scores]
data = data[["tweet", "Positive", "Negative", "Neutral"]]
print(data.head())
                                               tweet  Positive  Negative  \
0  russianembassi ft mfarussia jeffdsach csdcolum...     0.077     0.284   
1  kidnap without charg access lawyer putin russi...     0.000     0.000   
2  much western civil everyon feel compel find cr...     0.144     0.259   
3  russianembassi love place ill visit sure next ...     0.291     0.126   
4  israelipm iaeaorg didnt know state israel advi...     0.000     0.000   

   Neutral  
0    0.639  
1    1.000  
2    0.596  
3    0.583  
4    1.000 
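This article keeps only the pos, neg, and neu proportions. `polarity_scores` also returns a `compound` score, a normalized value between -1 and 1, and the VADER documentation suggests labeling text positive when compound >= 0.05, negative when compound <= -0.05, and neutral otherwise. A small sketch of that rule; the function name `label_from_compound` is hypothetical, and the score dictionaries are made up rather than real VADER output:

```python
def label_from_compound(scores, threshold=0.05):
    """Map a VADER-style score dict to a label using its compound score."""
    compound = scores["compound"]
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Made-up score dicts with the same keys that polarity_scores returns
examples = [
    {"neg": 0.0, "neu": 0.4, "pos": 0.6, "compound": 0.72},
    {"neg": 0.5, "neu": 0.5, "pos": 0.0, "compound": -0.55},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]
for s in examples:
    print(s["compound"], label_from_compound(s))
```

In the pipeline above, this would mean also storing `s["compound"]` from each score dict as a fourth column.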

Now let's look at the most frequent words in tweets whose positive score is higher than their negative score:

positive =' '.join([i for i in data['tweet'][data['Positive'] > data["Negative"]]])
wc_stopwords = set(STOPWORDS)  # a separate name, so the nltk `stopwords` module imported above is not overwritten
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(positive)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
[Figure: word cloud of positive sentiments on Twitter]

Now let's look at the most frequent words in tweets whose negative score is higher than their positive score:

negative =' '.join([i for i in data['tweet'][data['Negative'] > data["Positive"]]])
wc_stopwords = set(STOPWORDS)  # a separate name, so the nltk `stopwords` module imported above is not overwritten
wordcloud = WordCloud(stopwords=wc_stopwords, background_color="white").generate(negative)
plt.figure( figsize=(15,10))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
[Figure: word cloud of negative sentiments on Twitter]
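Both word clouds use strict comparisons, so tweets whose positive and negative scores are equal appear in neither of them. In the data.head() output above, rows 1 and 4 are like this (both scores are 0 and Neutral is 1.000). A stdlib sketch that makes the three-way split explicit, using the Positive and Negative values printed earlier for the first five tweets; the helper name `split_by_polarity` is hypothetical:

```python
def split_by_polarity(rows):
    """Partition (row_index, positive, negative) tuples the way the word clouds do."""
    groups = {"positive": [], "negative": [], "tie": []}
    for row_index, pos, neg in rows:
        if pos > neg:
            groups["positive"].append(row_index)
        elif neg > pos:
            groups["negative"].append(row_index)
        else:
            groups["tie"].append(row_index)  # in neither word cloud
    return groups

# (row index, Positive, Negative) taken from the data.head() output above
rows = [(0, 0.077, 0.284), (1, 0.0, 0.0), (2, 0.144, 0.259), (3, 0.291, 0.126), (4, 0.0, 0.0)]
print(split_by_polarity(rows))
# {'positive': [3], 'negative': [0, 2], 'tie': [1, 4]}
```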

So this is how you can analyze people's sentiments about the Ukraine and Russia war. I hope this war ends soon and things get back to normal.

Summary

There are a lot of tweets about the Ukraine and Russia war in which people post updates about what is happening on the ground, how they feel about it, and whom they support. I used those tweets for Twitter sentiment analysis on the Ukraine and Russia war. I hope you liked this article. Feel free to ask valuable questions in the comments section below.

Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.


Comments

  1. This is amazing. Please, I’m a beginner learning Python.

    How do you remember all these modules you import, as well as the symbols and so on? If someone checks a book to use them while working on a particular project, can they call themselves a data analyst, or do they need to know everything offhand?

    If it's the latter, how does one get to know it all offhand?

    • Practice! Work on different types of problems. It’s okay to take help from anywhere you want. People working at FAANG also take help from Google, research papers, books, and publications.

  2. Great post! What if I want to extract my own data instead of using data from Kaggle? I want to do a different sentiment analysis, not one on Ukraine and Russia.

    What would you suggest is the best way to extract it?
