Emotion Detection Model

Emotion detection involves recognizing a person's emotional state – for example, anger, confusion, or deception – from vocal and non-vocal cues. The most common technique analyzes characteristics of the speech signal, using the spoken words as additional input when available. In this article, I'll walk you through how to build an emotion detection model with machine learning.

Emotion Detection Model with Machine Learning

The dataset I will use for this task can be easily downloaded from here. Now let's start by importing the necessary libraries. I will use the PyTorch package to train the emotion detection model, and the os module to read the datasets from their respective folders:

```python
import os
import torch
import torchvision
import torch.nn as nn
import numpy as np
import torch.nn.functional as F
from torchvision.datasets import ImageFolder
from torch.utils.data import DataLoader
import torchvision.transforms as tt
from torchvision.utils import make_grid
import matplotlib.pyplot as plt
%matplotlib inline

data_dir = './dataset'
print(os.listdir(data_dir))
classes_train = os.listdir(data_dir + "/train")
classes_valid = os.listdir(data_dir + "/validation")
```

Now, let me explain the preprocessing. First, we convert every image to grayscale; most of the images already are, but this guarantees there are no exceptions. Second, we configure a transform that flips an image horizontally with 50% probability. Third, we rotate each image by up to 30 degrees in a random direction (left or right).

Finally, we convert the PIL images to tensors, since PyTorch layers only work with tensors. This randomization of the training data helps prevent the model from over-fitting.

Preparing the data:

```python
# Data transforms (grayscaling & data augmentation)
train_tfms = tt.Compose([tt.Grayscale(num_output_channels=1),
                         tt.RandomHorizontalFlip(),
                         tt.RandomRotation(30),
                         tt.ToTensor()])
valid_tfms = tt.Compose([tt.Grayscale(num_output_channels=1),
                         tt.ToTensor()])

# Emotion detection datasets
train_ds = ImageFolder(data_dir + '/train', train_tfms)
valid_ds = ImageFolder(data_dir + '/validation', valid_tfms)

batch_size = 200

# PyTorch data loaders
train_dl = DataLoader(train_ds, batch_size, shuffle=True,
                      num_workers=3, pin_memory=True)
valid_dl = DataLoader(valid_ds, batch_size * 2,
                      num_workers=3, pin_memory=True)
```

Before moving forward, let’s have a quick look at the data:

```python
def show_batch(dl):
    for images, labels in dl:
        fig, ax = plt.subplots(figsize=(12, 12))
        ax.set_xticks([]); ax.set_yticks([])
        print(images[0].shape)
        ax.imshow(make_grid(images[:64], nrow=8).permute(1, 2, 0))
        break

show_batch(train_dl)
```

Transferring the emotion detection model to GPU memory

In PyTorch, to use the GPU to train your model, you must first move your data and model to GPU memory. By default, both are loaded into CPU memory. For this task, I will create two helper functions and one class:

```python
def get_default_device():
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

def to_device(data, device):
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        return len(self.dl)
```

If you want to check whether your environment has access to a CUDA-enabled GPU, just run the get_default_device() function; the output should look like this:

```python
device = get_default_device()
device
```

All that remains is to move the data to GPU memory using the DeviceDataLoader class:

```python
train_dl = DeviceDataLoader(train_dl, device)
valid_dl = DeviceDataLoader(valid_dl, device)
```

Building the Model

Out of the box, PyTorch doesn't provide a ready-made training loop that wires together loss calculation, gradient updates, and optimizer bookkeeping. Therefore, I implemented these steps myself in a class that inherits from PyTorch's nn.Module to build the emotion detection model:
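The model below inherits from an ImageClassificationBase class that is never shown in the article. Here is a minimal sketch of what it typically looks like in this style of PyTorch training loop: the method names (training_step, validation_step, validation_epoch_end, epoch_end) are exactly those the training code calls later, while the cross-entropy loss and accuracy metric are my assumptions about the author's choices:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def accuracy(outputs, labels):
    # Fraction of predictions that match the labels
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                    # forward pass
        return F.cross_entropy(out, labels)   # training loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        acc = accuracy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        # Average per-batch losses and accuracies over the epoch
        batch_losses = [x['val_loss'] for x in outputs]
        batch_accs = [x['val_acc'] for x in outputs]
        return {'val_loss': torch.stack(batch_losses).mean().item(),
                'val_acc': torch.stack(batch_accs).mean().item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], last_lr: {:.5f}, train_loss: {:.4f}, "
              "val_loss: {:.4f}, val_acc: {:.4f}".format(
                  epoch, result['lrs'][-1], result['train_loss'],
                  result['val_loss'], result['val_acc']))
```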

```python
def conv_block(in_channels, out_channels, pool=False):
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ELU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class ResNet(ImageClassificationBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.conv1 = conv_block(in_channels, 128)
        self.conv2 = conv_block(128, 128, pool=True)
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.drop1 = nn.Dropout(0.5)
        self.conv3 = conv_block(128, 256)
        self.conv4 = conv_block(256, 256, pool=True)
        self.res2 = nn.Sequential(conv_block(256, 256), conv_block(256, 256))
        self.drop2 = nn.Dropout(0.5)
        self.conv5 = conv_block(256, 512)
        self.conv6 = conv_block(512, 512, pool=True)
        self.res3 = nn.Sequential(conv_block(512, 512), conv_block(512, 512))
        self.drop3 = nn.Dropout(0.5)
        self.classifier = nn.Sequential(nn.MaxPool2d(6),
                                        nn.Flatten(),
                                        nn.Linear(512, num_classes))

    def forward(self, xb):
        out = self.conv1(xb)
        out = self.conv2(out)
        out = self.res1(out) + out
        out = self.drop1(out)
        out = self.conv3(out)
        out = self.conv4(out)
        out = self.res2(out) + out
        out = self.drop2(out)
        out = self.conv5(out)
        out = self.conv6(out)
        out = self.res3(out) + out
        out = self.drop3(out)
        out = self.classifier(out)
        return out
```
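One detail worth noting: the final MaxPool2d(6) in the classifier implies the network expects 48×48 inputs, since three 2× poolings reduce 48 to 6, and the 6×6 pool then collapses each channel to a single value before the Linear layer. A small shape-check sketch (with tiny channel counts purely for speed; the real model uses 128–512 channels):

```python
import torch
import torch.nn as nn

def conv_block(in_channels, out_channels, pool=False):
    # Same block structure as in the model: conv -> batchnorm -> ELU (-> pool)
    layers = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
              nn.BatchNorm2d(out_channels),
              nn.ELU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# A 48x48 grayscale input halves three times: 48 -> 24 -> 12 -> 6,
# leaving the 6x6 map that MaxPool2d(6) collapses to 1x1.
x = torch.zeros(1, 1, 48, 48)
for block in [conv_block(1, 8, pool=True),
              conv_block(8, 8, pool=True),
              conv_block(8, 8, pool=True)]:
    x = block(x)
print(x.shape)  # torch.Size([1, 8, 6, 6])
```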

Training the Emotion Detection Model:

```python
@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def get_lr(optimizer):
    for param_group in optimizer.param_groups:
        return param_group['lr']

def fit_one_cycle(epochs, max_lr, model, train_loader, val_loader,
                  weight_decay=0, grad_clip=None, opt_func=torch.optim.SGD):
    torch.cuda.empty_cache()
    history = []

    # Set up custom optimizer with weight decay
    optimizer = opt_func(model.parameters(), max_lr, weight_decay=weight_decay)
    # Set up one-cycle learning rate scheduler
    sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, epochs=epochs,
                                                steps_per_epoch=len(train_loader))

    for epoch in range(epochs):
        # Training phase
        model.train()
        train_losses = []
        lrs = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()

            # Gradient clipping
            if grad_clip:
                nn.utils.clip_grad_value_(model.parameters(), grad_clip)

            optimizer.step()
            optimizer.zero_grad()

            # Record & update learning rate
            lrs.append(get_lr(optimizer))
            sched.step()

        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        result['lrs'] = lrs
        model.epoch_end(epoch, result)
        history.append(result)
    return history
```

Evaluating the Model

I have tried a bunch of different hyperparameters, and the values below gave me the best result in the shortest time. I trained the model for 24 epochs with a maximum learning rate of 0.008, meaning the one-cycle learning rate schedule is allowed to peak at 0.008.
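To see what "the highest learning rate can reach 0.008" means in practice, the one-cycle schedule can be traced on a tiny stand-in model (a sketch; the 10 steps per epoch are an arbitrary assumption, the real value comes from len(train_dl)):

```python
import torch

# Tiny stand-in model and optimizer, used only to drive the scheduler
model = torch.nn.Linear(2, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.008)
sched = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.008,
                                            epochs=24, steps_per_epoch=10)

lrs = []
for _ in range(24 * 10):                      # one sched.step() per batch
    optimizer.step()
    sched.step()
    lrs.append(optimizer.param_groups[0]['lr'])

print(round(max(lrs), 4))                     # the peak equals max_lr: 0.008
```

The learning rate warms up from a small fraction of max_lr, peaks at 0.008 partway through training, then anneals to nearly zero by the final batch.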

```python
epochs = 24
max_lr = 0.008
grad_clip = 0.1
weight_decay = 1e-4
opt_func = torch.optim.Adam

%%time
# 'model' and 'history' are assumed to have been initialized beforehand
history += fit_one_cycle(epochs, max_lr, model, train_dl, valid_dl,
                         grad_clip=grad_clip, weight_decay=weight_decay,
                         opt_func=opt_func)
```

Plotting Performance Graphs

The graph below shows how the accuracy fluctuated while the learning rate was high, then climbed steadily as the schedule lowered the learning rate:

```python
def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')

plot_accuracies(history)
```
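A companion helper can visualize the one-cycle schedule itself (a sketch, assuming history entries carry the per-batch 'lrs' lists recorded by fit_one_cycle; returning the flattened list is my addition for easy inspection):

```python
import matplotlib.pyplot as plt

def plot_lrs(history):
    # Flatten the per-epoch learning-rate lists recorded during training
    lrs = [lr for epoch in history for lr in epoch.get('lrs', [])]
    plt.plot(lrs)
    plt.xlabel('batch number')
    plt.ylabel('learning rate')
    plt.title('Learning Rate vs. Batch No.')
    return lrs
```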


The model looks good enough to consider for production use. I hope you liked this article on emotion detection with machine learning. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.
