Have you ever wondered how Snapchat manages to apply those amazing filters to your face? It has been programmed to detect certain points on your face and project a filter according to them. In machine learning, these points are known as face landmarks. In this article, I will walk you through how to detect face landmarks with machine learning.
Now, I will simply start by importing all the libraries we need for this task. I will use PyTorch in this article for face landmarks detection with deep learning. Let’s import all the libraries:
import time
import cv2
import os
import random
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import imutils
import matplotlib.image as mpimg
from collections import OrderedDict
from skimage import io, transform
from math import floor, radians, cos, sin
import xml.etree.ElementTree as ET
import torch
import torchvision
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from torchvision import datasets, models, transforms
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
Download the DLIB Dataset
The dataset I will use here to detect face landmarks is the official DLIB dataset (iBUG 300-W), which consists of 6,666 images of varying dimensions. The code below will download the dataset and unzip it for further exploration:
%%capture
if not os.path.exists('/content/ibug_300W_large_face_landmark_dataset'):
    !wget http://dlib.net/files/data/ibug_300W_large_face_landmark_dataset.tar.gz
    !tar -xvzf 'ibug_300W_large_face_landmark_dataset.tar.gz'
    !rm -r 'ibug_300W_large_face_landmark_dataset.tar.gz'
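To make sure the download and extraction worked, you can count the image files in the extracted folder (a quick sanity-check sketch; the exact count may differ slightly depending on the dataset version):

# Walk the extracted dataset folder and count the image files
image_count = 0
for dirpath, dirnames, filenames in os.walk('ibug_300W_large_face_landmark_dataset'):
    image_count += sum(1 for f in filenames if f.lower().endswith(('.jpg', '.png')))
print('Found {} images'.format(image_count))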
Visualize the Dataset
Now, let’s have a look at what we are working with, to see what data cleaning and preprocessing steps we need to go through. Here is an example of an image from the dataset, plotted with its annotated landmarks:
# Read the landmark coordinates from the .pts annotation file
# (the first 3 lines and the last line are header/footer, not points)
file = open('ibug_300W_large_face_landmark_dataset/helen/trainset/100032540_1.pts')
points = file.readlines()[3:-1]

landmarks = []
for point in points:
    x, y = point.split(' ')
    landmarks.append([floor(float(x)), floor(float(y[:-1]))])
landmarks = np.array(landmarks)

plt.figure(figsize=(10, 10))
plt.imshow(mpimg.imread('ibug_300W_large_face_landmark_dataset/helen/trainset/100032540_1.jpg'))
plt.scatter(landmarks[:, 0], landmarks[:, 1], s=5, c='g')
plt.show()

You can see that the face covers only a small part of the image. If we feed the whole image to the neural network, it will process the irrelevant background as well. So, just as we preprocess text data, we need to prepare this image dataset before training. The annotations also include a face bounding box we can use to crop away the background, as the quick sketch below illustrates.
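To see this concretely, we can draw the bounding box that the dataset’s XML annotations provide for this image (a minimal sketch; it assumes the sample image is listed in labels_ibug_300W_train.xml under the path used above):

# Locate the annotation entry for the sample image and draw its face bounding box
tree = ET.parse('ibug_300W_large_face_landmark_dataset/labels_ibug_300W_train.xml')
images_element = tree.getroot()[2]  # the <images> element holds one entry per image

for entry in images_element:
    if entry.attrib['file'] == 'helen/trainset/100032540_1.jpg':
        box = entry[0].attrib  # the <box> element carries top/left/width/height
        break

img = mpimg.imread('ibug_300W_large_face_landmark_dataset/helen/trainset/100032540_1.jpg')
plt.figure(figsize=(10, 10))
plt.imshow(img)
plt.gca().add_patch(plt.Rectangle(
    (int(box['left']), int(box['top'])), int(box['width']), int(box['height']),
    edgecolor='r', facecolor='none', linewidth=2))
plt.show()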
Creating Dataset Classes
Now let’s dig deeper into the dataset. The labels_ibug_300W_train.xml file contains, for each input image, the 68 landmarks and a bounding box for cropping the face. I will store all these values in lists so that we can easily access them during training.
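Before writing these classes, it helps to peek at the structure of the XML file so the indexing used below (root[2], filename[0]) makes sense (a quick exploratory sketch; the exact attribute values printed will depend on the first entry in the file):

# Inspect the annotation file: in this file, root[2] is the <images> element,
# each child is an <image> whose first child is the face <box>
tree = ET.parse('ibug_300W_large_face_landmark_dataset/labels_ibug_300W_train.xml')
root = tree.getroot()

first_image = root[2][0]
print(first_image.attrib)        # the image file path and dimensions
print(first_image[0].attrib)     # the bounding box: top, left, width, height
print(first_image[0][0].attrib)  # the first landmark: name, x, y

With that structure in mind, the Transforms class below handles the data augmentation, and the dataset class after it parses the XML into images, landmarks and crop boxes.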
class Transforms():
    def __init__(self):
        pass

    def rotate(self, image, landmarks, angle):
        # Rotate the image by a random angle and apply the same rotation
        # to the landmarks (shifted so the image centre is the origin)
        angle = random.uniform(-angle, +angle)
        transformation_matrix = torch.tensor([
            [+cos(radians(angle)), -sin(radians(angle))],
            [+sin(radians(angle)), +cos(radians(angle))]
        ])

        image = imutils.rotate(np.array(image), angle)

        landmarks = landmarks - 0.5
        new_landmarks = np.matmul(landmarks, transformation_matrix)
        new_landmarks = new_landmarks + 0.5
        return Image.fromarray(image), new_landmarks

    def resize(self, image, landmarks, img_size):
        image = TF.resize(image, img_size)
        return image, landmarks

    def color_jitter(self, image, landmarks):
        # Randomly perturb brightness, contrast, saturation and hue
        color_jitter = transforms.ColorJitter(brightness=0.3,
                                              contrast=0.3,
                                              saturation=0.3,
                                              hue=0.1)
        image = color_jitter(image)
        return image, landmarks

    def crop_face(self, image, landmarks, crops):
        # Crop to the annotated face box and normalize landmarks to [0, 1]
        left = int(crops['left'])
        top = int(crops['top'])
        width = int(crops['width'])
        height = int(crops['height'])

        image = TF.crop(image, top, left, height, width)

        img_shape = np.array(image).shape
        landmarks = torch.tensor(landmarks) - torch.tensor([[left, top]])
        landmarks = landmarks / torch.tensor([img_shape[1], img_shape[0]])
        return image, landmarks

    def __call__(self, image, landmarks, crops):
        image = Image.fromarray(image)
        image, landmarks = self.crop_face(image, landmarks, crops)
        image, landmarks = self.resize(image, landmarks, (224, 224))
        image, landmarks = self.color_jitter(image, landmarks)
        image, landmarks = self.rotate(image, landmarks, angle=10)

        image = TF.to_tensor(image)
        image = TF.normalize(image, [0.5], [0.5])
        return image, landmarks
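Note that the rotate method shifts the normalized landmarks by 0.5 so the rotation happens around the image centre rather than the top-left corner. A tiny check with illustrative values makes the idea clear (the centre point must stay fixed under any rotation):

# Rotating the centre point (0.5, 0.5) should leave it unchanged
angle = radians(30)
rotation = np.array([[cos(angle), -sin(angle)],
                     [sin(angle),  cos(angle)]])
centre = np.array([[0.5, 0.5]])
print(np.matmul(centre - 0.5, rotation) + 0.5)  # -> [[0.5, 0.5]]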
class FaceLandmarksDataset(Dataset):

    def __init__(self, transform=None):
        tree = ET.parse('ibug_300W_large_face_landmark_dataset/labels_ibug_300W_train.xml')
        root = tree.getroot()

        self.image_filenames = []
        self.landmarks = []
        self.crops = []
        self.transform = transform
        self.root_dir = 'ibug_300W_large_face_landmark_dataset'

        # root[2] is the <images> element; each child holds one image,
        # its face bounding box and its 68 landmark coordinates
        for filename in root[2]:
            self.image_filenames.append(os.path.join(self.root_dir, filename.attrib['file']))
            self.crops.append(filename[0].attrib)

            landmark = []
            for num in range(68):
                x_coordinate = int(filename[0][num].attrib['x'])
                y_coordinate = int(filename[0][num].attrib['y'])
                landmark.append([x_coordinate, y_coordinate])
            self.landmarks.append(landmark)

        self.landmarks = np.array(self.landmarks).astype('float32')
        assert len(self.image_filenames) == len(self.landmarks)

    def __len__(self):
        return len(self.image_filenames)

    def __getitem__(self, index):
        # Read the image as grayscale
        image = cv2.imread(self.image_filenames[index], 0)
        landmarks = self.landmarks[index]

        if self.transform:
            image, landmarks = self.transform(image, landmarks, self.crops[index])

        # Centre the normalized landmarks around zero
        landmarks = landmarks - 0.5
        return image, landmarks

dataset = FaceLandmarksDataset(Transforms())
Visualize Train Transforms:
Now let’s have a quick look at what we have done so far by visualizing one sample from the dataset after applying the transformations defined above:
image, landmarks = dataset[0]
# undo the centring so the landmarks are in 224x224 pixel coordinates
landmarks = (landmarks + 0.5) * 224
plt.figure(figsize=(10, 10))
plt.imshow(image.numpy().squeeze(), cmap='gray')
plt.scatter(landmarks[:, 0], landmarks[:, 1], s=8)

Split the Dataset for Training and Prediction of Face Landmarks
Now, to move further, I will split the dataset into training and validation sets:
# split the dataset into training and validation sets
len_valid_set = int(0.1*len(dataset))
len_train_set = len(dataset) - len_valid_set
print("The length of Train set is {}".format(len_train_set))
print("The length of Valid set is {}".format(len_valid_set))
train_dataset, valid_dataset = torch.utils.data.random_split(dataset, [len_train_set, len_valid_set])
# shuffle and batch the datasets
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True, num_workers=4)
valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=8, shuffle=True, num_workers=4)
The length of Train set is 6000
The length of Valid set is 666
Testing the shape of input data:
images, landmarks = next(iter(train_loader))
print(images.shape)
print(landmarks.shape)
torch.Size([64, 1, 224, 224])
torch.Size([64, 68, 2])
Define the Face Landmarks Detection Model
Now I will use ResNet-18 as the backbone network. I will modify the first convolutional layer to accept single-channel (grayscale) input and the final fully connected layer to output 136 values, i.e. the x and y coordinates of the 68 landmarks:
class Network(nn.Module):
    def __init__(self, num_classes=136):
        super().__init__()
        self.model_name = 'resnet18'
        self.model = models.resnet18()
        # 1 input channel (grayscale) instead of the default 3 (RGB)
        self.model.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # 136 outputs: x and y coordinates for each of the 68 landmarks
        self.model.fc = nn.Linear(self.model.fc.in_features, num_classes)

    def forward(self, x):
        x = self.model(x)
        return x
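A quick shape check confirms that the modified network maps a single-channel 224×224 image to 136 outputs (68 landmark pairs):

# Pass a dummy grayscale batch through the network and check the output shape
with torch.no_grad():
    out = Network()(torch.randn(1, 1, 224, 224))
print(out.shape)  # torch.Size([1, 136])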
Helper Functions:
import sys

def print_overwrite(step, total_step, loss, operation):
    # Overwrite the current console line with the latest step and loss
    sys.stdout.write('\r')
    if operation == 'train':
        sys.stdout.write("Train Steps: %d/%d  Loss: %.4f " % (step, total_step, loss))
    else:
        sys.stdout.write("Valid Steps: %d/%d  Loss: %.4f " % (step, total_step, loss))
    sys.stdout.flush()
Training the Neural Network for Face Landmarks Detection
I will now train the network, using the Mean Squared Error between the true and predicted face landmarks as the loss function:
torch.autograd.set_detect_anomaly(True)
network = Network()
network.cuda()

criterion = nn.MSELoss()
optimizer = optim.Adam(network.parameters(), lr=0.0001)

loss_min = np.inf
num_epochs = 10

start_time = time.time()
for epoch in range(1, num_epochs + 1):

    loss_train = 0
    loss_valid = 0
    running_loss = 0

    network.train()
    # iterate over the DataLoader so every batch is visited once per epoch
    for step, (images, landmarks) in enumerate(train_loader, 1):
        images = images.cuda()
        landmarks = landmarks.view(landmarks.size(0), -1).cuda()

        predictions = network(images)

        # clear all the gradients before calculating them
        optimizer.zero_grad()

        # find the loss for the current step
        loss_train_step = criterion(predictions, landmarks)

        # calculate the gradients
        loss_train_step.backward()

        # update the parameters
        optimizer.step()

        loss_train += loss_train_step.item()
        running_loss = loss_train / step

        print_overwrite(step, len(train_loader), running_loss, 'train')

    network.eval()
    with torch.no_grad():
        for step, (images, landmarks) in enumerate(valid_loader, 1):
            images = images.cuda()
            landmarks = landmarks.view(landmarks.size(0), -1).cuda()

            predictions = network(images)

            # find the loss for the current step
            loss_valid_step = criterion(predictions, landmarks)

            loss_valid += loss_valid_step.item()
            running_loss = loss_valid / step

            print_overwrite(step, len(valid_loader), running_loss, 'valid')

    loss_train /= len(train_loader)
    loss_valid /= len(valid_loader)

    print('\n--------------------------------------------------')
    print('Epoch: {}  Train Loss: {:.4f}  Valid Loss: {:.4f}'.format(epoch, loss_train, loss_valid))
    print('--------------------------------------------------')

    if loss_valid < loss_min:
        loss_min = loss_valid
        torch.save(network.state_dict(), '/content/face_landmarks.pth')
        print("\nMinimum Validation Loss of {:.4f} at epoch {}/{}".format(loss_min, epoch, num_epochs))
        print('Model Saved\n')

print('Training Complete')
print("Total Elapsed Time : {} s".format(time.time() - start_time))
Face Landmarks Prediction
Now let’s use the model we trained above on images from the validation set, which the network never saw during training:
start_time = time.time()

with torch.no_grad():
    best_network = Network()
    best_network.cuda()
    best_network.load_state_dict(torch.load('/content/face_landmarks.pth'))
    best_network.eval()

    images, landmarks = next(iter(valid_loader))
    images = images.cuda()

    # undo the centring so both sets of landmarks are in pixel coordinates
    landmarks = (landmarks + 0.5) * 224
    predictions = (best_network(images).cpu() + 0.5) * 224
    predictions = predictions.view(-1, 68, 2)

    # plot predictions (red) against the ground-truth landmarks (green)
    plt.figure(figsize=(10, 40))
    for img_num in range(8):
        plt.subplot(8, 1, img_num + 1)
        plt.imshow(images[img_num].cpu().numpy().transpose(1, 2, 0).squeeze(), cmap='gray')
        plt.scatter(predictions[img_num, :, 0], predictions[img_num, :, 1], c='r', s=5)
        plt.scatter(landmarks[img_num, :, 0], landmarks[img_num, :, 1], c='g', s=5)

print('Total number of test images: {}'.format(len(valid_dataset)))

end_time = time.time()
print("Elapsed Time : {}".format(end_time - start_time))
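Since crop_face normalized the landmarks relative to the face crop, you can also map a prediction back onto the original, uncropped image by inverting that transform. Here is a minimal sketch of such a helper (hypothetical, not part of the pipeline above; prediction is the 136-value output for one image and crops is the matching entry of dataset.crops):

# Hypothetical helper: invert the normalization used in crop_face.
# Network output + 0.5 is in [0, 1] relative to the face crop, so we
# scale by the crop size and shift by the crop origin.
def to_original_coords(prediction, crops):
    landmarks = prediction.view(68, 2).cpu() + 0.5
    landmarks = landmarks * torch.tensor([int(crops['width']), int(crops['height'])])
    landmarks = landmarks + torch.tensor([int(crops['left']), int(crops['top'])])
    return landmarks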

I hope you liked this article. Feel free to ask your valuable questions in the comments section below. Don’t forget to subscribe to my daily newsletter if you like my work.