Image Processing with Machine Learning and Python

Using HOG (Histogram of Oriented Gradients) features, we can build a simple face detection algorithm with any classification estimator; here we will use a linear support vector machine. The steps are as follows:

  1. Obtain a set of image thumbnails of faces to constitute “positive” training samples.
  2. Obtain a set of image thumbnails of nonfaces to constitute “negative” training samples.
  3. Extract HOG features from these training samples.
  4. Train a linear SVM classifier on these samples.
  5. For an “unknown” image, pass a sliding window across the image, using the model to evaluate whether that window contains a face or not.
  6. If detections overlap, combine them into a single window.
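
To get a feel for what step 3 computes, here is a rough sketch of the idea behind HOG: each small cell of the image is summarized by a histogram of its gradient directions, weighted by gradient strength. This is only an illustration. The helper name `cell_orientation_histogram` and the toy 8×8 cell are made up for this example, and the real `feature.hog` function does more (for instance, it normalizes over blocks of cells).

```python
import numpy as np

def cell_orientation_histogram(cell, n_bins=9):
    """Histogram of unsigned gradient orientations for one image cell,
    weighted by gradient magnitude (hypothetical helper, illustration only)."""
    # np.gradient returns the derivative along rows (y) first, then columns (x)
    gy, gx = np.gradient(cell.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180
    hist, _ = np.histogram(orientation, bins=n_bins, range=(0, 180),
                           weights=magnitude)
    return hist

# Toy 8x8 cell with a vertical edge: intensity changes along x only
cell = np.zeros((8, 8))
cell[:, 4:] = 1.0
hist = cell_orientation_histogram(cell)
print(hist)
```

All of the gradient weight lands in the first orientation bin, because the only intensity change in this cell is horizontal.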

Let’s go through these steps and try it out:


Obtain a set of positive training samples:

Let’s start by finding some positive training samples that show a variety of faces. We have one easy set of data to work with, the Labeled Faces in the Wild dataset, which can be downloaded with Scikit-Learn:

import numpy as np
from sklearn.datasets import fetch_lfw_people
from skimage import data, color, transform, feature
faces = fetch_lfw_people()
positive_patches = faces.images
print(positive_patches.shape)

#Output- (3185, 62, 47)

This gives us a sample of more than 3,000 face images (3,185, per the output above) to use for training.

Obtain a set of negative training samples:

Next we need a set of similarly sized thumbnails that do not have a face in them. One way to do this is to take any corpus of input images, and extract thumbnails from them at a variety of scales. Here we can use some of the images shipped with Scikit-Image, along with Scikit-Learn’s PatchExtractor:

from skimage import data, transform
from sklearn.feature_extraction.image import PatchExtractor

imgs_to_use = ['camera', 'text', 'coins', 'moon',
               'page', 'clock', 'immunohistochemistry',
               'chelsea', 'coffee', 'hubble_deep_field']
# Convert color images to grayscale. Images that are already 2-D are kept
# as-is, since some scikit-image versions raise an error for 2-D input.
raw_images = [getattr(data, name)() for name in imgs_to_use]
images = [im if im.ndim == 2 else color.rgb2gray(im) for im in raw_images]

def extract_patches(img, N, scale=1.0,
                    patch_size=positive_patches[0].shape):
    # Size of the region to cut out before resizing back to patch_size
    extracted_patch_size = \
        tuple((scale * np.array(patch_size)).astype(int))
    extractor = PatchExtractor(patch_size=extracted_patch_size,
                               max_patches=N, random_state=0)
    patches = extractor.transform(img[np.newaxis])
    if scale != 1:
        patches = np.array([transform.resize(patch, patch_size)
                            for patch in patches])
    return patches

# 1,000 patches per image at each of three scales
negative_patches = np.vstack([extract_patches(im, 1000, scale)
                              for im in images for scale in [0.5, 1.0, 2.0]])
print(negative_patches.shape)

#Output- (30000, 62, 47)

We now have 30,000 suitable image patches that do not contain faces. Let’s take a look at a few of them to get an idea of what they look like:

import matplotlib.pyplot as plt
fig, ax = plt.subplots(6, 10)
for i, axi in enumerate(ax.flat):
    axi.imshow(negative_patches[500 * i], cmap='gray')
    axi.axis('off')

My hope is that these would sufficiently cover the space of “nonfaces” that our algorithm is likely to see.

Combine sets and extract HOG features for Image Processing:

Now that we have these positive samples and negative samples, we can combine them and compute HOG features. This step takes a little while, because the HOG features involve a nontrivial computation for each image:

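from itertools import chain
# HOG features for every positive patch, followed by every negative patch
X_train = np.array([feature.hog(im)
                    for im in chain(positive_patches,
                                    negative_patches)])
# Labels: 1 for faces (the first block of rows), 0 for nonfaces
y_train = np.zeros(X_train.shape[0])
y_train[:positive_patches.shape[0]] = 1
print(X_train.shape)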

#Output- (33185, 1215)

We are left with 33,185 training samples in 1,215 dimensions, and we now have our data in a form that we can feed into Scikit-Learn.
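
The figure of 1,215 features per patch can be checked with a little arithmetic. The sketch below assumes `feature.hog` is running with its default settings (9 orientations, 8×8 pixels per cell, 3×3 cells per block, blocks sliding one cell at a time); the helper name `hog_feature_length` is made up for this example.

```python
def hog_feature_length(image_shape, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(3, 3)):
    """Length of a HOG vector under the assumed default parameters."""
    # Whole cells that fit along each axis: 62 // 8 = 7 and 47 // 8 = 5
    cells = [s // p for s, p in zip(image_shape, pixels_per_cell)]
    # Block positions when a block slides one cell at a time: 5 and 3
    blocks = [c - b + 1 for c, b in zip(cells, cells_per_block)]
    values_per_block = cells_per_block[0] * cells_per_block[1] * orientations
    return blocks[0] * blocks[1] * values_per_block

n_features = hog_feature_length((62, 47))
print(n_features)  # 15 blocks * 81 values per block = 1215
```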

Train a support vector machine for Image Processing:

Next we use these tools to create a classifier of thumbnail patches. For such a high-dimensional binary classification task, a linear support vector machine is a good choice. We will use Scikit-Learn’s LinearSVC, because compared with SVC it often scales better to a large number of samples.

First, though, let’s use a simple Gaussian naive Bayes to get a quick baseline:

from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
cross_val_score(GaussianNB(), X_train, y_train)

#Output- array([0.96112702, 0.986741 , 0.98900105, 0.99261715, 0.98885038])

We see that on our training data, even a simple naive Bayes algorithm gets us upward of 90% accuracy. Let’s try the support vector machine, with a grid search over a few choices of the C parameter:

from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV
grid = GridSearchCV(LinearSVC(), {'C': [1.0, 2.0, 4.0, 8.0]})
grid.fit(X_train, y_train)
grid.best_score_

#Output- 0.9934910351062227

grid.best_params_

#Output- {'C': 1.0}

Let’s take the best estimator and retrain it on the full dataset:

model = grid.best_estimator_
model.fit(X_train, y_train)

Find faces in a new image:

Now that we have this model in place, let’s grab a new image and see how the model does. We will use one portion of the astronaut image for simplicity and run a sliding window over it and evaluate each patch.

import skimage
test_image = skimage.data.astronaut()
test_image = skimage.color.rgb2gray(test_image)
test_image = skimage.transform.rescale(test_image, 0.5)
test_image = test_image[:160, 40:180]
plt.imshow(test_image, cmap='gray')
plt.axis('off')

Next, let’s create a window that iterates over patches of this image, and compute HOG features for each patch:

def sliding_window(img, patch_size=positive_patches[0].shape,
                   istep=2, jstep=2, scale=1.0):
    Ni, Nj = (int(scale * s) for s in patch_size)
    for i in range(0, img.shape[0] - Ni, istep):
        # Note: the column bound uses Ni rather than Nj, so the rightmost
        # columns are not scanned; the outputs shown here reflect this.
        for j in range(0, img.shape[1] - Ni, jstep):
            patch = img[i:i + Ni, j:j + Nj]
            if scale != 1:
                patch = transform.resize(patch, patch_size)
            yield (i, j), patch

indices, patches = zip(*sliding_window(test_image))
patches_hog = np.array([feature.hog(patch) for patch in patches])
patches_hog.shape

#Output- (1911, 1215)
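
The patch count can be checked by hand. This is a minimal sketch, assuming the cropped test image is 160×140 pixels (the 512×512 astronaut image halved, then sliced with `[:160, 40:180]`) and the patch size is 62×47, as in the training data:

```python
height, width = 160, 140   # assumed shape of the cropped test image
Ni, Nj = 62, 47            # patch height and width
istep = jstep = 2          # step sizes used by sliding_window

# Number of window positions along each axis, mirroring the loop bounds
# in sliding_window (both bounds subtract Ni, as in the code above)
n_rows = len(range(0, height - Ni, istep))
n_cols = len(range(0, width - Ni, jstep))
n_patches = n_rows * n_cols
print(n_rows, n_cols, n_patches)  # 49 39 1911
```

If the column bound subtracted `Nj` instead of `Ni`, there would be 47 column positions and 49 × 47 = 2,303 patches, so the 1,911 reported above corresponds to the loop exactly as written.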

Finally, we can take these HOG-featured patches and use our model to evaluate whether each patch contains a face:

labels = model.predict(patches_hog)
labels.sum()

#Output- 36.0

We see that out of nearly 2,000 patches, we have found 36 detections. Let’s use the information we have about these patches to show where they lie on our test image, drawing them as rectangles:

fig, ax = plt.subplots()
ax.imshow(test_image, cmap='gray')
Ni, Nj = positive_patches[0].shape
indices = np.array(indices)
for i, j in indices[labels == 1]:
    # Rectangle takes (x, y), i.e. (column, row), then width and height.
    # An explicit edgecolor keeps the outline visible with facecolor='none'.
    ax.add_patch(plt.Rectangle((j, i), Nj, Ni,
                               edgecolor='red',
                               alpha=0.4, lw=3,
                               facecolor='none'))
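
Step 6 of our outline, combining overlapping detections into a single window, is not implemented in the code above. One common way to do it is greedy non-maximum suppression: keep the highest-scoring box and drop any box that overlaps it too much, then repeat. The sketch below is one possible approach, not part of the original pipeline. The function name, the threshold, and the example boxes and scores are all made up; in practice the scores might come from something like `model.decision_function(patches_hog)`.

```python
import numpy as np

def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS (illustrative). boxes: (n, 4) array of
    (top, left, bottom, right). Returns indices of the boxes kept."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]   # highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(int(best))
        # Intersection of the best box with each remaining box
        top = np.maximum(boxes[best, 0], boxes[rest, 0])
        left = np.maximum(boxes[best, 1], boxes[rest, 1])
        bottom = np.minimum(boxes[best, 2], boxes[rest, 2])
        right = np.minimum(boxes[best, 3], boxes[rest, 3])
        inter = np.clip(bottom - top, 0, None) * np.clip(right - left, 0, None)
        iou = inter / (areas[best] + areas[rest] - inter)
        # Keep only boxes that do not overlap the best box too much
        order = rest[iou <= iou_threshold]
    return keep

# Hypothetical detections: three overlapping windows and one far away
Ni, Nj = 62, 47
corners = np.array([[20, 30], [22, 32], [24, 30], [90, 80]])  # (row, col)
boxes = np.column_stack([corners, corners + [Ni, Nj]])
scores = np.array([0.9, 1.5, 0.7, 0.4])   # made-up confidence scores
kept = non_max_suppression(boxes, scores)
print(kept)  # [1, 3]
```

The three overlapping windows collapse to the one with the highest score, while the distant window survives on its own.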

All of the detected patches overlap and cover the face in the image! Not bad for a few lines of Python. I hope you liked this article on image processing. Feel free to ask your questions in the comments section below.


Aman Kharwal