Face Landmarks Detection

Have you ever wondered how Snapchat manages to apply amazing filters according to your face? It has been programmed to detect certain points on your face and project a filter according to those points. In machine learning, these points are known as face landmarks. In this article, I will guide you through detecting face landmarks with machine learning.

Now, I will simply start by importing all the libraries we need for this task. I will use PyTorch in this article for face landmarks detection with deep learning. Let's import all the libraries:

import time
import cv2
import os
import random
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import imutils
import matplotlib.image as mpimg
from collections import OrderedDict
from skimage import io, transform
from math import *
import xml.etree.ElementTree as ET 

import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from torchvision import datasets, models, transforms
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

Download the DLIB Dataset

The dataset I will use here to detect face landmarks is an official DLIB dataset, which consists of 6,666 images of different dimensions. The code below will download the dataset and unzip it for further exploration:

if not os.path.exists('/content/ibug_300W_large_face_landmark_dataset'):
    !wget http://dlib.net/files/data/ibug_300W_large_face_landmark_dataset.tar.gz
    !tar -xvzf 'ibug_300W_large_face_landmark_dataset.tar.gz'    
    !rm -r 'ibug_300W_large_face_landmark_dataset.tar.gz'

Visualize the Dataset

Now, let’s have a look at what we are working with, to see all the data cleaning and preprocessing opportunities that we need to go through. Here is an example of an image from the dataset we have taken for this task.

with open('ibug_300W_large_face_landmark_dataset/helen/trainset/100032540_1.pts') as file:
    # skip the 3-line header and the trailing "}"
    points = file.readlines()[3:-1]

landmarks = []

for point in points:
    x, y = point.split()
    landmarks.append([floor(float(x)), floor(float(y))])

landmarks = np.array(landmarks)

plt.scatter(landmarks[:, 0], landmarks[:, 1], s=5, c='g')
plt.show()
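For reference, a .pts file is a short text header followed by one `x y` pair per line and a closing brace. Here is a minimal sketch of the same parsing logic on a made-up three-point file (the coordinates are invented for illustration):

```python
from math import floor

# A made-up miniature .pts file: 3-line header, "x y" pairs, closing brace
pts_text = """version: 1
n_points: 3
{
101.25 200.75
150.00 210.50
199.10 205.00
}
"""

# Same logic as above: skip the header lines and the trailing "}"
lines = pts_text.splitlines()[3:-1]
landmarks = [[floor(float(x)), floor(float(y))]
             for x, y in (line.split() for line in lines)]
print(landmarks)  # [[101, 200], [150, 210], [199, 205]]
```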


You can see that the face covers only a small portion of the image. If we feed this image to the neural network, it will also take in the background. So, just as we prepare text data, we will prepare this image dataset for further processing.

Creating Dataset Classes

Now let's dig deeper into the classes and labels in the dataset. The labels_ibug_300W_train.xml file contains, for each input image, the landmarks and a bounding box to crop the face. I will store all these values in lists so that we can easily access them during the training process.

class Transforms():
    def __init__(self):
        pass

    def rotate(self, image, landmarks, angle):
        angle = random.uniform(-angle, +angle)

        transformation_matrix = torch.tensor([
            [+cos(radians(angle)), -sin(radians(angle))],
            [+sin(radians(angle)), +cos(radians(angle))]
        ])

        image = imutils.rotate(np.array(image), angle)

        landmarks = landmarks - 0.5
        new_landmarks = np.matmul(landmarks, transformation_matrix)
        new_landmarks = new_landmarks + 0.5
        return Image.fromarray(image), new_landmarks

    def resize(self, image, landmarks, img_size):
        image = TF.resize(image, img_size)
        return image, landmarks

    def color_jitter(self, image, landmarks):
        color_jitter = transforms.ColorJitter(brightness=0.3,
                                              contrast=0.3,
                                              saturation=0.3,
                                              hue=0.1)
        image = color_jitter(image)
        return image, landmarks

    def crop_face(self, image, landmarks, crops):
        left = int(crops['left'])
        top = int(crops['top'])
        width = int(crops['width'])
        height = int(crops['height'])

        image = TF.crop(image, top, left, height, width)

        img_shape = np.array(image).shape
        landmarks = torch.tensor(landmarks) - torch.tensor([[left, top]])
        landmarks = landmarks / torch.tensor([img_shape[1], img_shape[0]])
        return image, landmarks

    def __call__(self, image, landmarks, crops):
        image = Image.fromarray(image)
        image, landmarks = self.crop_face(image, landmarks, crops)
        image, landmarks = self.resize(image, landmarks, (224, 224))
        image, landmarks = self.color_jitter(image, landmarks)
        image, landmarks = self.rotate(image, landmarks, angle=10)
        image = TF.to_tensor(image)
        image = TF.normalize(image, [0.5], [0.5])
        return image, landmarks
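The rotate() method shifts the normalized landmarks so that the image centre sits at the origin, applies a 2×2 rotation matrix, and shifts back. A quick numpy check with made-up points confirms that the centre stays fixed:

```python
import numpy as np
from math import cos, sin, radians

# Landmarks normalized to [0, 1]; the image centre is (0.5, 0.5)
landmarks = np.array([[0.5, 0.5], [0.7, 0.5]])

angle = 90.0
rot = np.array([
    [+cos(radians(angle)), -sin(radians(angle))],
    [+sin(radians(angle)), +cos(radians(angle))],
])

# Shift to the origin, rotate, shift back -- exactly what rotate() does
rotated = np.matmul(landmarks - 0.5, rot) + 0.5

# The centre point is a fixed point of the rotation
assert np.allclose(rotated[0], [0.5, 0.5])
print(np.round(rotated[1], 3))  # [0.5 0.3]
```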
class FaceLandmarksDataset(Dataset):

    def __init__(self, transform=None):

        tree = ET.parse('ibug_300W_large_face_landmark_dataset/labels_ibug_300W_train.xml')
        root = tree.getroot()

        self.image_filenames = []
        self.landmarks = []
        self.crops = []
        self.transform = transform
        self.root_dir = 'ibug_300W_large_face_landmark_dataset'
        for filename in root[2]:
            self.image_filenames.append(os.path.join(self.root_dir, filename.attrib['file']))

            self.crops.append(filename[0].attrib)

            landmark = []
            for num in range(68):
                x_coordinate = int(filename[0][num].attrib['x'])
                y_coordinate = int(filename[0][num].attrib['y'])
                landmark.append([x_coordinate, y_coordinate])
            self.landmarks.append(landmark)

        self.landmarks = np.array(self.landmarks).astype('float32')

        assert len(self.image_filenames) == len(self.landmarks)

    def __len__(self):
        return len(self.image_filenames)

    def __getitem__(self, index):
        image = cv2.imread(self.image_filenames[index], 0)
        landmarks = self.landmarks[index]
        if self.transform:
            image, landmarks = self.transform(image, landmarks, self.crops[index])

        landmarks = landmarks - 0.5

        return image, landmarks

dataset = FaceLandmarksDataset(Transforms())

Visualize Train Transforms:

Now let's have a quick look at what we have done so far. I will visualize the dataset by applying the transformations that the above classes provide:

image, landmarks = dataset[0]
landmarks = (landmarks + 0.5) * 224
plt.figure(figsize=(10, 10))
plt.imshow(image.numpy().squeeze(), cmap='gray');
plt.scatter(landmarks[:,0], landmarks[:,1], s=8);
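To see how the coordinates flow end to end, here is a small numpy sketch with a made-up crop box: crop_face() normalizes the landmarks into [0, 1], __getitem__ centres them around zero, and the visualization above undoes both for the 224×224 resized image:

```python
import numpy as np

# Hypothetical crop box and one landmark in original-image pixels
left, top, width, height = 100, 50, 200, 200
landmark = np.array([200.0, 150.0])  # the centre of the crop box

# crop_face(): shift by the crop origin, divide by the cropped size
normalized = (landmark - [left, top]) / [width, height]

# __getitem__() then centres the target around zero
centred = normalized - 0.5

# The visualization undoes this for the 224x224 resized image
pixels = (centred + 0.5) * 224
print(normalized, centred, pixels)  # [0.5 0.5] [0. 0.] [112. 112.]
```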

Split the Dataset for Training and Prediction of Face Landmarks

Now, to move further, I will split the dataset into a train and a valid dataset:

# split the dataset into training and validation sets
len_valid_set = int(0.1*len(dataset))
len_train_set = len(dataset) - len_valid_set

print("The length of Train set is {}".format(len_train_set))
print("The length of Valid set is {}".format(len_valid_set))

train_dataset, valid_dataset = torch.utils.data.random_split(dataset, [len_train_set, len_valid_set])

# shuffle and batch the datasets
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=8, shuffle=True, num_workers=4)

The length of Train set is 6000
The length of Valid set is 666

Testing the shape of input data:

images, landmarks = next(iter(train_loader))

print(images.shape)
print(landmarks.shape)

torch.Size([64, 1, 224, 224])
torch.Size([64, 68, 2])
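The second shape arises because the 68 (x, y) pairs per image are flattened into a single 136-dimensional target for the regression head; a numpy sketch of the same reshaping:

```python
import numpy as np

# A dummy batch: 4 samples, 68 landmarks, (x, y) each
landmarks = np.zeros((4, 68, 2), dtype=np.float32)

# landmarks.view(landmarks.size(0), -1) in PyTorch == reshape here
flat = landmarks.reshape(landmarks.shape[0], -1)
print(flat.shape)  # (4, 136) -- matches num_classes=136 in the network

# Predictions are reshaped back to (batch, 68, 2) for plotting
restored = flat.reshape(-1, 68, 2)
print(restored.shape)  # (4, 68, 2)
```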

Define the Face Landmarks Detection Model

Now I will use ResNet18 as the backbone of our model. I will modify its first and last layers so that the network fits our purpose:

class Network(nn.Module):
    def __init__(self, num_classes=136):
        super().__init__()
        self.model = models.resnet18()
        # accept single-channel (grayscale) input instead of RGB
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # regress 68 landmarks x 2 coordinates = 136 outputs
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)

    def forward(self, x):
        return self.model(x)

Helper Functions:

import sys

def print_overwrite(step, total_step, loss, operation):
    # '\r' returns the cursor to the line start so each step overwrites the last
    sys.stdout.write('\r')
    if operation == 'train':
        sys.stdout.write("Train Steps: %d/%d  Loss: %.4f " % (step, total_step, loss))
    else:
        sys.stdout.write("Valid Steps: %d/%d  Loss: %.4f " % (step, total_step, loss))
    sys.stdout.flush()

Training the Neural Network for Face Landmarks Detection

I will now use the Mean Squared Error between the true and predicted face landmarks as the loss function:

network = Network()
network.cuda()

criterion = nn.MSELoss()
optimizer = optim.Adam(network.parameters(), lr=0.0001)

loss_min = np.inf
num_epochs = 10

start_time = time.time()
for epoch in range(1, num_epochs+1):
    loss_train = 0
    loss_valid = 0
    running_loss = 0

    network.train()
    for step, (images, landmarks) in enumerate(train_loader, start=1):
        images = images.cuda()
        landmarks = landmarks.view(landmarks.size(0), -1).cuda()

        predictions = network(images)

        # clear all the gradients before calculating them
        optimizer.zero_grad()

        # find the loss for the current step
        loss_train_step = criterion(predictions, landmarks)

        # calculate the gradients
        loss_train_step.backward()

        # update the parameters
        optimizer.step()

        loss_train += loss_train_step.item()
        running_loss = loss_train/step

        print_overwrite(step, len(train_loader), running_loss, 'train')

    network.eval()
    with torch.no_grad():
        for step, (images, landmarks) in enumerate(valid_loader, start=1):
            images = images.cuda()
            landmarks = landmarks.view(landmarks.size(0), -1).cuda()

            predictions = network(images)

            # find the loss for the current step
            loss_valid_step = criterion(predictions, landmarks)

            loss_valid += loss_valid_step.item()
            running_loss = loss_valid/step

            print_overwrite(step, len(valid_loader), running_loss, 'valid')

    loss_train /= len(train_loader)
    loss_valid /= len(valid_loader)

    print('\nEpoch: {}  Train Loss: {:.4f}  Valid Loss: {:.4f}'.format(epoch, loss_train, loss_valid))

    if loss_valid < loss_min:
        loss_min = loss_valid
        torch.save(network.state_dict(), '/content/face_landmarks.pth')
        print("\nMinimum Validation Loss of {:.4f} at epoch {}/{}".format(loss_min, epoch, num_epochs))
        print('Model Saved\n')

print('Training Complete')
print("Total Elapsed Time : {} s".format(time.time()-start_time))
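Note that the running_loss printed during training is simply the cumulative mean of the step losses seen so far within the epoch, as a plain-Python sketch with made-up loss values shows:

```python
# Made-up per-step losses for one short epoch
step_losses = [0.9, 0.7, 0.5, 0.3]

loss_total = 0.0
for step, loss in enumerate(step_losses, start=1):
    loss_total += loss
    running_loss = loss_total / step  # mean of all steps so far

print(round(running_loss, 3))  # 0.6 after the last step
```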

Face Landmarks Prediction

Now let’s use the model that we trained above on the unseen images in the dataset:


start_time = time.time()

with torch.no_grad():

    best_network = Network()
    best_network.cuda()
    best_network.load_state_dict(torch.load('/content/face_landmarks.pth'))
    best_network.eval()

    images, landmarks = next(iter(valid_loader))

    images = images.cuda()
    landmarks = (landmarks + 0.5) * 224

    predictions = (best_network(images).cpu() + 0.5) * 224
    predictions = predictions.view(-1, 68, 2)

    plt.figure(figsize=(10, 40))

    for img_num in range(8):
        plt.subplot(8, 1, img_num+1)
        plt.imshow(images[img_num].cpu().numpy().transpose(1, 2, 0).squeeze(), cmap='gray')
        plt.scatter(predictions[img_num, :, 0], predictions[img_num, :, 1], c='r', s=5)
        plt.scatter(landmarks[img_num, :, 0], landmarks[img_num, :, 1], c='g', s=5)

    plt.show()

print('Total number of test images: {}'.format(len(valid_dataset)))

end_time = time.time()
print("Elapsed Time : {}".format(end_time - start_time))


I hope you liked this article. Feel free to ask your valuable questions in the comments section below. Don't forget to subscribe to my daily newsletter if you like my work.
