Face Landmarks Detection

Have you ever wondered how Snapchat manages to apply filters that fit your face so well? It is programmed to detect a set of marker points on your face and to project the filter according to those points. In Machine Learning, these points are known as Face Landmarks. In this article, I will walk you through how to detect face landmarks with Machine Learning.

I will start by importing all the libraries we need for this task. I will use PyTorch in this article to detect face landmarks with Deep Learning. Let’s import all the libraries:

import time
import cv2
import os
import random
import numpy as np

import matplotlib.pyplot as plt
from PIL import Image
import imutils
import matplotlib.image as mpimg
from collections import OrderedDict
from skimage import io, transform
from math import *
import xml.etree.ElementTree as ET

import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from torchvision import datasets, models, transforms
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

Download the DLIB Dataset

The dataset I will use here to detect Face Landmarks is the official DLIB dataset, which consists of 6,666 training images of different dimensions. The code below will download the dataset and extract it for further exploration:

%%capture
if not os.path.exists('/content/ibug_300W_large_face_landmark_dataset'):
    !wget http://dlib.net/files/data/ibug_300W_large_face_landmark_dataset.tar.gz
    !tar -xvzf 'ibug_300W_large_face_landmark_dataset.tar.gz'
    !rm -r 'ibug_300W_large_face_landmark_dataset.tar.gz'

Visualize the Dataset

Now, let’s have a look at what we are working with, to see what data cleaning and preprocessing we need to do. Here is an example of an image from the dataset, with its landmarks plotted on top:

# Skip the three header lines and the final closing line of the .pts file;
# every line in between holds one "x y" pair
with open('ibug_300W_large_face_landmark_dataset/helen/trainset/100032540_1.pts') as file:
    points = file.readlines()[3:-1]

landmarks = []
for point in points:
    x, y = point.split(' ')
    # y[:-1] drops the trailing newline
    landmarks.append([floor(float(x)), floor(float(y[:-1]))])
landmarks = np.array(landmarks)

plt.figure(figsize=(10,10))
plt.imshow(mpimg.imread('ibug_300W_large_face_landmark_dataset/helen/trainset/100032540_1.jpg'))
plt.scatter(landmarks[:,0], landmarks[:,1], s = 5, c = 'g')
plt.show()

face landmarks

You can see that the face covers only a small portion of the image. If we feed the full image to the neural network, it will also have to process the background. So, just as we prepare text data before modelling, we will prepare this image dataset by cropping each image to the face.
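To make the effect of cropping concrete, here is a minimal sketch of how landmark coordinates change once an image is cropped to a bounding box: each point is shifted by the box origin and then divided by the box size, so it ends up in the range 0 to 1. The function name and the numbers are hypothetical and chosen only for illustration.

```python
def crop_and_normalize(landmarks, left, top, width, height):
    """Shift landmarks into the crop's frame, then scale them to [0, 1]."""
    return [((x - left) / width, (y - top) / height) for x, y in landmarks]


# Hypothetical bounding box and landmark positions (in pixels of the full image)
box = {"left": 100, "top": 50, "width": 200, "height": 400}
points = [(100, 50), (200, 250), (300, 450)]

normalized = crop_and_normalize(points, **box)
print(normalized)  # [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]
```

The top-left corner of the box maps to (0, 0) and the bottom-right corner to (1, 1), regardless of how large the original image was.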

Creating Dataset Classes

Now let’s dig deeper into the classes and labels in the dataset. The labels_ibug_300W_train.xml file lists the input images together with their landmarks and the bounding box used to crop the face. I will store all these values in lists so that we can easily access them during training.

class Transforms():
    def __init__(self):
        pass

    def rotate(self, image, landmarks, angle):
        # Pick a random angle in [-angle, +angle] degrees
        angle = random.uniform(-angle, +angle)

        transformation_matrix = torch.tensor([
            [+cos(radians(angle)), -sin(radians(angle))],
            [+sin(radians(angle)), +cos(radians(angle))]
        ])

        image = imutils.rotate(np.array(image), angle)

        # Rotate the normalized landmarks about the image centre (0.5, 0.5)
        landmarks = landmarks - 0.5
        new_landmarks = np.matmul(landmarks, transformation_matrix)
        new_landmarks = new_landmarks + 0.5
        return Image.fromarray(image), new_landmarks

    def resize(self, image, landmarks, img_size):
        # Landmarks are already normalized, so resizing leaves them unchanged
        image = TF.resize(image, img_size)
        return image, landmarks

    def color_jitter(self, image, landmarks):
        color_jitter = transforms.ColorJitter(brightness=0.3,
                                              contrast=0.3,
                                              saturation=0.3,
                                              hue=0.1)
        image = color_jitter(image)
        return image, landmarks

    def crop_face(self, image, landmarks, crops):
        left = int(crops['left'])
        top = int(crops['top'])
        width = int(crops['width'])
        height = int(crops['height'])

        image = TF.crop(image, top, left, height, width)

        # Shift landmarks into the crop's frame, then scale them to [0, 1]
        img_shape = np.array(image).shape
        landmarks = torch.tensor(landmarks) - torch.tensor([[left, top]])
        landmarks = landmarks / torch.tensor([img_shape[1], img_shape[0]])
        return image, landmarks

    def __call__(self, image, landmarks, crops):
        image = Image.fromarray(image)
        image, landmarks = self.crop_face(image, landmarks, crops)
        image, landmarks = self.resize(image, landmarks, (224, 224))
        image, landmarks = self.color_jitter(image, landmarks)
        image, landmarks = self.rotate(image, landmarks, angle=10)

        image = TF.to_tensor(image)
        image = TF.normalize(image, [0.5], [0.5])
        return image, landmarks
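The rotate method above moves the landmarks together with the image by multiplying each centred point with a 2x2 matrix. The sketch below applies the same product to a single point in plain Python so the arithmetic is easy to follow. The helper name rotate_point is hypothetical, and the 90 degree angle is used only because the result is easy to check by eye; the transform itself draws a random angle of at most 10 degrees.

```python
from math import cos, sin, radians


def rotate_point(x, y, angle_degrees):
    """Rotate a normalized (x, y) landmark about the image centre (0.5, 0.5)."""
    c, s = cos(radians(angle_degrees)), sin(radians(angle_degrees))
    dx, dy = x - 0.5, y - 0.5
    # Same product as the row vector [dx, dy] times [[c, -s], [s, c]]
    return dx * c + dy * s + 0.5, -dx * s + dy * c + 0.5


# A point on the middle of the right edge, rotated by 90 degrees
new_x, new_y = rotate_point(1.0, 0.5, 90)
print(round(new_x, 6), round(new_y, 6))  # 0.5 0.0
```

The point moves from the right edge to the top edge, which is a counter-clockwise turn on screen because the y axis of an image points downwards. This matches the direction in which imutils.rotate turns the image for a positive angle.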
class FaceLandmarksDataset(Dataset):

    def __init__(self, transform=None):
        tree = ET.parse('ibug_300W_large_face_landmark_dataset/labels_ibug_300W_train.xml')
        root = tree.getroot()

        self.image_filenames = []
        self.landmarks = []
        self.crops = []
        self.transform = transform
        self.root_dir = 'ibug_300W_large_face_landmark_dataset'

        # root[2] holds one entry per image; its first child is the face bounding box
        for filename in root[2]:
            self.image_filenames.append(os.path.join(self.root_dir, filename.attrib['file']))

            self.crops.append(filename[0].attrib)

            # The bounding box element contains the 68 landmark points
            landmark = []
            for num in range(68):
                x_coordinate = int(filename[0][num].attrib['x'])
                y_coordinate = int(filename[0][num].attrib['y'])
                landmark.append([x_coordinate, y_coordinate])
            self.landmarks.append(landmark)

        self.landmarks = np.array(self.landmarks).astype('float32')

        assert len(self.image_filenames) == len(self.landmarks)

    def __len__(self):
        return len(self.image_filenames)

    def __getitem__(self, index):
        # Flag 0 loads the image as grayscale
        image = cv2.imread(self.image_filenames[index], 0)
        landmarks = self.landmarks[index]

        if self.transform:
            image, landmarks = self.transform(image, landmarks, self.crops[index])

        # Centre the normalized landmarks around zero: [0, 1] -> [-0.5, 0.5]
        landmarks = landmarks - 0.5

        return image, landmarks

dataset = FaceLandmarksDataset(Transforms())
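The loader above reaches into the XML by position (root[2], filename[0]), which can be hard to picture without seeing the file. The sketch below parses a tiny hand-written document that mimics the layout implied by that code. The sample content, file name, and two-point landmark list are hypothetical; the real file has 68 points per face, and I am assuming the element names (images, image, box, part) rather than quoting them from the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged stand-in for labels_ibug_300W_train.xml
SAMPLE_XML = """
<dataset>
  <name>sample</name>
  <comment>hypothetical two-landmark example</comment>
  <images>
    <image file='helen/trainset/example.jpg'>
      <box top='10' left='20' width='100' height='120'>
        <part name='00' x='25' y='40'/>
        <part name='01' x='60' y='90'/>
      </box>
    </image>
  </images>
</dataset>
"""

root = ET.fromstring(SAMPLE_XML.strip())

records = []
for image in root.find('images'):  # assumed to be the same element as root[2]
    box = image.find('box')        # assumed to be the same element as image[0]
    crop = {key: int(box.attrib[key]) for key in ('left', 'top', 'width', 'height')}
    points = [(int(part.attrib['x']), int(part.attrib['y'])) for part in box.findall('part')]
    records.append((image.attrib['file'], crop, points))

print(records[0])
```

Looking elements up by name is a little more verbose than indexing, but it keeps working if the order of the children ever changes.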

Visualize Train Transforms:

Now let’s have a quick look at what we have done so far. I will visualize one sample after applying the transformations defined in the classes above:

image, landmarks = dataset[0]
landmarks = (landmarks + 0.5) * 224
plt.figure(figsize=(10, 10))
plt.imshow(image.numpy().squeeze(), cmap='gray');
plt.scatter(landmarks[:,0], landmarks[:,1], s=8);
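The expression (landmarks + 0.5) * 224 in the plotting code undoes the normalization applied by the dataset, which stores each coordinate in the range -0.5 to 0.5. Here is a small sketch of that mapping and its inverse; the helper names to_pixels and to_normalized are hypothetical.

```python
IMG_SIZE = 224  # side length of the resized face crop


def to_pixels(point):
    """Map a landmark from [-0.5, 0.5] back to pixel coordinates."""
    return tuple((value + 0.5) * IMG_SIZE for value in point)


def to_normalized(pixel):
    """Map a pixel coordinate to the [-0.5, 0.5] range used for training."""
    return tuple(value / IMG_SIZE - 0.5 for value in pixel)


print(to_pixels((-0.5, -0.5)))      # (0.0, 0.0)      top-left corner
print(to_pixels((0.0, 0.0)))        # (112.0, 112.0)  image centre
print(to_pixels((0.5, 0.5)))        # (224.0, 224.0)  bottom-right corner
print(to_normalized((112, 112)))    # (0.0, 0.0)
```

Centring the targets around zero means a landmark in the middle of the face crop corresponds to an output of zero.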

Split the Dataset for Training and Prediction of Face Landmarks

Now, to move further, I will split the dataset into a training set and a validation set:

# split the dataset into training and validation sets
len_valid_set = int(0.1*len(dataset))
len_train_set = len(dataset) - len_valid_set

print("The length of Train set is {}".format(len_train_set))
print("The length of Valid set is {}".format(len_valid_set))

train_dataset, valid_dataset = torch.utils.data.random_split(dataset, [len_train_set, len_valid_set])

# shuffle and batch the datasets
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=8, shuffle=True, num_workers=4)

The length of Train set is 6000
The length of Valid set is 666
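The two lengths printed above follow directly from the 10% split, and together with the batch sizes they determine how many steps each epoch takes. The sketch below reproduces that arithmetic in plain Python; the helper names are hypothetical, and the step count assumes the DataLoader keeps the last, smaller batch, which is its default behaviour.

```python
from math import ceil


def split_sizes(total, valid_fraction=0.1):
    """Return (train, valid) sizes, rounding the validation share down."""
    n_valid = int(valid_fraction * total)
    return total - n_valid, n_valid


def steps_per_epoch(n_samples, batch_size):
    """Number of batches when the last partial batch is kept."""
    return ceil(n_samples / batch_size)


n_train, n_valid = split_sizes(6666)
train_steps = steps_per_epoch(n_train, 64)
valid_steps = steps_per_epoch(n_valid, 8)
print(n_train, n_valid, train_steps, valid_steps)  # 6000 666 94 84
```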

Testing the shape of input data:

images, landmarks = next(iter(train_loader))

print(images.shape)
print(landmarks.shape)

torch.Size([64, 1, 224, 224])
torch.Size([64, 68, 2])

Define the Face Landmarks Detection Model

Now I will use ResNet18 as the base architecture. I will modify the first layer so that it accepts single-channel (grayscale) images, and the last layer so that it outputs 136 values, the x and y coordinates of the 68 landmarks:

class Network(nn.Module):
    def __init__(self, num_classes=136):
        super().__init__()
        self.model_name = 'resnet18'
        self.model = models.resnet18()
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)

    def forward(self, x):
        x = self.model(x)
        return x
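The default of num_classes=136 comes from 68 landmarks with two coordinates each. The sketch below shows, in plain Python, how a flat vector of 136 outputs is read back as 68 (x, y) pairs, which is the same row-major layout that view(-1, 68, 2) produces later in the article. The helper name to_pairs and the stand-in values are hypothetical.

```python
NUM_LANDMARKS = 68


def to_pairs(flat_output):
    """Group a flat [x0, y0, x1, y1, ...] vector into (x, y) pairs."""
    if len(flat_output) != 2 * NUM_LANDMARKS:
        raise ValueError("expected {} values".format(2 * NUM_LANDMARKS))
    return [(flat_output[i], flat_output[i + 1]) for i in range(0, len(flat_output), 2)]


# Stand-in for one row of network output: the integers 0..135
flat_output = list(range(136))
pairs = to_pairs(flat_output)

print(len(pairs))   # 68
print(pairs[0])     # (0, 1)
print(pairs[-1])    # (134, 135)
```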

Helper Functions:

import sys

def print_overwrite(step, total_step, loss, operation):
    sys.stdout.write('\r')
    if operation == 'train':
        sys.stdout.write("Train Steps: %d/%d Loss: %.4f " % (step, total_step, loss))
    else:
        sys.stdout.write("Valid Steps: %d/%d Loss: %.4f " % (step, total_step, loss))

    sys.stdout.flush()

Training the Neural Network for Face Landmarks Detection

I will now train the network, using the Mean Squared Error between the true and predicted face landmarks as the loss:

torch.autograd.set_detect_anomaly(True)
network = Network()
network.cuda()

criterion = nn.MSELoss()
optimizer = optim.Adam(network.parameters(), lr=0.0001)

loss_min = np.inf
num_epochs = 10

start_time = time.time()
for epoch in range(1, num_epochs+1):

    loss_train = 0
    loss_valid = 0
    running_loss = 0

    network.train()
    # Iterate over the loader directly so that every batch is visited once per epoch
    for step, (images, landmarks) in enumerate(train_loader, 1):

        images = images.cuda()
        landmarks = landmarks.view(landmarks.size(0), -1).cuda()

        predictions = network(images)

        # clear all the gradients before calculating them
        optimizer.zero_grad()

        # find the loss for the current step
        loss_train_step = criterion(predictions, landmarks)

        # calculate the gradients
        loss_train_step.backward()

        # update the parameters
        optimizer.step()

        loss_train += loss_train_step.item()
        running_loss = loss_train/step

        print_overwrite(step, len(train_loader), running_loss, 'train')

    network.eval()
    with torch.no_grad():

        for step, (images, landmarks) in enumerate(valid_loader, 1):

            images = images.cuda()
            landmarks = landmarks.view(landmarks.size(0), -1).cuda()

            predictions = network(images)

            # find the loss for the current step
            loss_valid_step = criterion(predictions, landmarks)

            loss_valid += loss_valid_step.item()
            running_loss = loss_valid/step

            print_overwrite(step, len(valid_loader), running_loss, 'valid')

    loss_train /= len(train_loader)
    loss_valid /= len(valid_loader)

    print('\n--------------------------------------------------')
    print('Epoch: {} Train Loss: {:.4f} Valid Loss: {:.4f}'.format(epoch, loss_train, loss_valid))
    print('--------------------------------------------------')

    # keep the weights with the lowest validation loss seen so far
    if loss_valid < loss_min:
        loss_min = loss_valid
        torch.save(network.state_dict(), '/content/face_landmarks.pth')
        print("\nMinimum Validation Loss of {:.4f} at epoch {}/{}".format(loss_min, epoch, num_epochs))
        print('Model Saved\n')

print('Training Complete')
print("Total Elapsed Time : {} s".format(time.time()-start_time))
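The criterion used in the loop above, nn.MSELoss with its default settings, averages the squared differences over every predicted coordinate. The sketch below computes the same quantity by hand on a few made-up values so the number is easy to verify; the function name mse and the sample coordinates are hypothetical.

```python
def mse(predictions, targets):
    """Mean of the squared differences between two equal-length sequences."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)


# Two landmarks flattened as [x0, y0, x1, y1], in normalized coordinates
targets = [0.0, 0.5, -0.5, 0.25]
predictions = [0.0, 0.0, -0.5, -0.25]

loss = mse(predictions, targets)
print(loss)  # 0.125, i.e. (0 + 0.25 + 0 + 0.25) / 4
```

Because the landmarks are normalized to the range -0.5 to 0.5, the loss values are small even when predictions are several pixels off, so it helps to convert errors back to pixels when judging the model.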

Face Landmarks Prediction

Now let’s use the model that we trained above on images from the validation set, which the network was not trained on:

start_time = time.time()

with torch.no_grad():

    best_network = Network()
    best_network.cuda()
    best_network.load_state_dict(torch.load('/content/face_landmarks.pth'))
    best_network.eval()

    images, landmarks = next(iter(valid_loader))

    images = images.cuda()
    landmarks = (landmarks + 0.5) * 224

    predictions = (best_network(images).cpu() + 0.5) * 224
    predictions = predictions.view(-1, 68, 2)

    plt.figure(figsize=(10,40))

    for img_num in range(8):
        plt.subplot(8, 1, img_num+1)
        plt.imshow(images[img_num].cpu().numpy().transpose(1,2,0).squeeze(), cmap='gray')
        plt.scatter(predictions[img_num,:,0], predictions[img_num,:,1], c = 'r', s = 5)
        plt.scatter(landmarks[img_num,:,0], landmarks[img_num,:,1], c = 'g', s = 5)

print('Total number of test images: {}'.format(len(valid_dataset)))

end_time = time.time()
print("Elapsed Time : {}".format(end_time - start_time))
Image landmarks

