Text Emotions Classification using Python

Text emotions classification is the problem of assigning an emotion to a text by understanding its context and tone. One real-world example is the iPhone keyboard, which recommends the most relevant emoji by understanding the text you type. So, if you want to learn how to classify the emotions of a text, this article is for you. In this article, I will take you through the task of text emotions classification with Machine Learning using Python.

Text Emotions Classification

Text emotions classification is a natural language processing and text classification problem. Here, we need to train a text classification model to recognize the emotion expressed in a piece of text.

To solve this problem, we need labelled data: texts paired with the emotions they express. I found an ideal dataset for this task on Kaggle. You can download the dataset from here.
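
Each line of train.txt holds one sample: the text and its emotion label, separated by a semicolon (which is why we will read it with sep=';' below). For example, two of the rows we will see shortly look like this:

im grabbing a minute to post i feel greedy wrong;anger
i am feeling grouchy;anger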

In the section below, I’ll take you through how to train a text classification model for the task of Text Emotions Classification using Machine Learning and the Python programming language.

Text Emotions Classification using Python

I’ll start by importing the necessary Python libraries and the dataset:

import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Flatten, Dense


data = pd.read_csv("train.txt", sep=';', header=None)
data.columns = ["Text", "Emotions"]
print(data.head())
                                                Text Emotions
0  i can go from feeling so hopeless to so damned...  sadness
1   im grabbing a minute to post i feel greedy wrong    anger
2  i am ever feeling nostalgic about the fireplac...     love
3                               i am feeling grouchy    anger
4  ive been feeling a little burdened lately wasn...  sadness
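
Before going further, it's worth checking how the emotion labels are distributed, since imbalanced classes affect how we should read the accuracy later. A quick pandas check:

# Check how many samples each emotion has
print(data["Emotions"].value_counts())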

As this is a problem of natural language processing, I’ll start by tokenizing the data:

texts = data["Text"].tolist()
labels = data["Emotions"].tolist()

# Tokenize the text data
tokenizer = Tokenizer()
tokenizer.fit_on_texts(texts)
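
At this point, the tokenizer has built a vocabulary that maps every unique word to an integer id. A quick way to inspect what it learned (an optional check, not part of the original walkthrough):

# Inspect the learned vocabulary
print(len(tokenizer.word_index))  # number of unique words seen
print(tokenizer.texts_to_sequences(["i am feeling grouchy"]))  # words mapped to integer ids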

Next, we convert each text into its sequence of integer ids and pad all the sequences to the same length so they can be fed into a neural network:

sequences = tokenizer.texts_to_sequences(texts)
max_length = max([len(seq) for seq in sequences])
padded_sequences = pad_sequences(sequences, maxlen=max_length)
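
Note that pad_sequences pads shorter sequences with zeros at the front by default ("pre" padding). A quick sanity check on the result:

# Sanity check: every text is now a fixed-length integer vector
print(padded_sequences.shape)  # (number of texts, max_length)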

Now I'll use scikit-learn's LabelEncoder to convert the emotion classes from strings to a numerical representation:

# Encode the string labels to integers
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(labels)
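
The fitted encoder stores the class names (sorted alphabetically) in its classes_ attribute, which is also what we will use later to turn predictions back into emotion names:

# The integer i corresponds to label_encoder.classes_[i]
print(label_encoder.classes_)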

We are now going to one-hot encode the labels. One-hot encoding transforms each categorical label into a binary vector that is all zeros except for a single 1 at the position of the class. This is necessary because the categorical cross-entropy loss we will train with expects the labels in this format. So here is how we can one-hot encode the labels:

# One-hot encode the labels
one_hot_labels = tf.keras.utils.to_categorical(labels)

Text Emotions Classification Model

Now we will split the data into training and test sets:

# Split the data into training and testing sets
xtrain, xtest, ytrain, ytest = train_test_split(padded_sequences, 
                                                one_hot_labels, 
                                                test_size=0.2)
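
Since the emotion classes are not equally frequent, a stratified split is a reasonable variation that preserves the label proportions in both sets. This is an optional alternative to the split above, not what the results below use:

# Optional variation: stratified, reproducible split
xtrain, xtest, ytrain, ytest = train_test_split(padded_sequences,
                                                one_hot_labels,
                                                test_size=0.2,
                                                stratify=labels,
                                                random_state=42)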

Now let’s define a neural network architecture for our classification problem and use it to train a model to classify emotions:

# Define the model
model = Sequential()
model.add(Embedding(input_dim=len(tokenizer.word_index) + 1, 
                    output_dim=128, input_length=max_length))
model.add(Flatten())
model.add(Dense(units=128, activation="relu"))
model.add(Dense(units=len(one_hot_labels[0]), activation="softmax"))

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(xtrain, ytrain, epochs=10, batch_size=32, validation_data=(xtest, ytest))
Epoch 1/10
400/400 [==============================] - 12s 28ms/step - loss: 1.3766 - accuracy: 0.4693 - val_loss: 0.8994 - val_accuracy: 0.7028
Epoch 2/10
400/400 [==============================] - 11s 28ms/step - loss: 0.3783 - accuracy: 0.8862 - val_loss: 0.5440 - val_accuracy: 0.8338
Epoch 3/10
400/400 [==============================] - 11s 28ms/step - loss: 0.0681 - accuracy: 0.9831 - val_loss: 0.5799 - val_accuracy: 0.8281
Epoch 4/10
400/400 [==============================] - 11s 27ms/step - loss: 0.0278 - accuracy: 0.9941 - val_loss: 0.6063 - val_accuracy: 0.8272
Epoch 5/10
400/400 [==============================] - 11s 28ms/step - loss: 0.0173 - accuracy: 0.9962 - val_loss: 0.6683 - val_accuracy: 0.8281
Epoch 6/10
400/400 [==============================] - 11s 28ms/step - loss: 0.0164 - accuracy: 0.9968 - val_loss: 0.7021 - val_accuracy: 0.8250
Epoch 7/10
400/400 [==============================] - 13s 31ms/step - loss: 0.0135 - accuracy: 0.9972 - val_loss: 0.7059 - val_accuracy: 0.8238
Epoch 8/10
400/400 [==============================] - 12s 31ms/step - loss: 0.0127 - accuracy: 0.9977 - val_loss: 0.7705 - val_accuracy: 0.8163
Epoch 9/10
400/400 [==============================] - 11s 28ms/step - loss: 0.0127 - accuracy: 0.9971 - val_loss: 0.7710 - val_accuracy: 0.8181
Epoch 10/10
400/400 [==============================] - 11s 28ms/step - loss: 0.0110 - accuracy: 0.9975 - val_loss: 0.8234 - val_accuracy: 0.8206
<keras.callbacks.History at 0x7fa6a85354f0>
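
After training, we can report the performance on the test set explicitly:

# Evaluate the trained model on the held-out test set
loss, accuracy = model.evaluate(xtest, ytest)
print(f"Test accuracy: {accuracy:.4f}")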

Now let’s take a sentence as an input text and see how the model performs:

input_text = "She didn't come today because she lost her dog yestertay!"

# Preprocess the input text
input_sequence = tokenizer.texts_to_sequences([input_text])
padded_input_sequence = pad_sequences(input_sequence, maxlen=max_length)
prediction = model.predict(padded_input_sequence)
predicted_label = label_encoder.inverse_transform([np.argmax(prediction[0])])
print(predicted_label)
1/1 [==============================] - 0s 145ms/step
['sadness']
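
The model actually outputs a probability for every emotion, so instead of only the top class, we can also inspect the full distribution:

# Print the predicted probability for each emotion
for emotion, prob in zip(label_encoder.classes_, prediction[0]):
    print(f"{emotion}: {prob:.3f}")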

So this is how you can use Machine Learning for the task of text emotion classification using the Python programming language.
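
To reuse the model later without retraining, you can save it together with the fitted tokenizer and label encoder. Here is a minimal sketch (the file names are my own choice):

import pickle

# Save the model and the preprocessing objects needed at inference time
model.save("emotions_model.h5")
with open("preprocessing.pkl", "wb") as f:
    pickle.dump({"tokenizer": tokenizer,
                 "label_encoder": label_encoder,
                 "max_length": max_length}, f)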

Summary

Text emotion classification is the problem of assigning an emotion to a text by understanding its context and tone. One real-world example is the iPhone keyboard, which recommends the most relevant emoji by understanding the text you type. I hope you liked this article on Text Emotion Classification with Machine Learning using Python. Feel free to ask valuable questions in the comments section below.
