Humans distinguish a dog, a cat, or a flying saucer without effort. This process is quite difficult for a computer to emulate, though: it only looks easy because our brains are remarkably good at recognizing images. One common application of image recognition with machine learning is optical character recognition. In this article, I will take you through building an image recognition model with machine learning using PyTorch.
What is PyTorch?
Before diving into this task, let’s first understand what PyTorch is. PyTorch is a library for Python programs that makes it easy to create deep learning models. Just as Python does for programming, PyTorch provides a gentle introduction to deep learning. At the same time, PyTorch has proven to be fully qualified for professional, real-world work.
Image Recognition with Machine Learning
For the image recognition task in this article, I will use the TorchVision package, which contains some of the best-performing neural network architectures for computer vision, such as AlexNet. It also provides easy access to datasets like ImageNet, along with other utilities for learning about computer vision applications in PyTorch.
The predefined models can be found in torchvision.models:
from torchvision import models

dir(models)
['AlexNet', 'DenseNet', 'GoogLeNet', 'GoogLeNetOutputs', 'Inception3', 'InceptionOutputs', 'MNASNet', 'MobileNetV2', 'ResNet', 'ShuffleNetV2', 'SqueezeNet', 'VGG', '_GoogLeNetOutputs', '_InceptionOutputs', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__path__', '__spec__', '_utils', 'alexnet', 'densenet', 'densenet121', 'densenet161', 'densenet169', 'densenet201', 'detection', 'googlenet', 'inception', 'inception_v3', 'mnasnet', 'mnasnet0_5', 'mnasnet0_75', 'mnasnet1_0', 'mnasnet1_3', 'mobilenet', 'mobilenet_v2', 'quantization', 'resnet', 'resnet101', 'resnet152', 'resnet18', 'resnet34', 'resnet50', 'resnext101_32x8d', 'resnext50_32x4d', 'segmentation', 'shufflenet_v2_x0_5', 'shufflenet_v2_x1_0', 'shufflenet_v2_x1_5', 'shufflenet_v2_x2_0', 'shufflenetv2', 'squeezenet', 'squeezenet1_0', 'squeezenet1_1', 'utils', 'vgg', 'vgg11', 'vgg11_bn', 'vgg13', 'vgg13_bn', 'vgg16', 'vgg16_bn', 'vgg19', 'vgg19_bn', 'video', 'wide_resnet101_2', 'wide_resnet50_2']
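The capitalized names refer to Python classes that implement several popular models. The lowercase names are mostly convenience functions that return models instantiated from these classes, sometimes with different sets of parameters. A few lowercase names, such as detection and segmentation, are submodules rather than functions.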
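As a rough illustration of this naming convention, the sketch below splits a handful of names copied from the listing above by their first letter. The `sample_names` list and the `split_by_case` helper are hypothetical and exist only for this example. The first-letter test is a heuristic, not something TorchVision itself provides.

```python
# A few names copied from the dir(models) listing above (hypothetical sample).
sample_names = ["AlexNet", "ResNet", "VGG", "alexnet", "resnet101", "vgg16", "__doc__", "_utils"]


def split_by_case(names):
    """Separate public capitalized names from public lowercase names."""
    public = [n for n in names if not n.startswith("_")]
    capitalized = [n for n in public if n[0].isupper()]
    lowercase = [n for n in public if n[0].islower()]
    return capitalized, lowercase


classes, factories = split_by_case(sample_names)
print("Classes:  ", classes)
print("Functions:", factories)
```

With TorchVision installed, passing `dir(models)` instead of `sample_names` should give the full split, keeping in mind that some lowercase entries are submodules.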
To run the AlexNet architecture on an input image, we can create an instance of the AlexNet class. Here’s how to do it:
alexnet = models.AlexNet()
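At this stage, alexnet is an object that can run the AlexNet architecture. Because we did not load any trained weights, its parameters are randomly initialized, so its predictions would not be meaningful yet. It is not essential for us to understand the details of this architecture at this time. For the moment, alexnet is just an opaque object that can be called like a function.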
By providing alexnet with correctly sized input data, we run a forward pass through the network. In other words, the input goes through the first set of neurons, whose outputs are fed to the next set of neurons, and so on until the final output.
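To make the idea of a callable object and a forward pass concrete, here is a minimal sketch in plain Python. It is not how AlexNet is implemented: the `TinyNetwork` class, its two small layers, and the weights are all made up for illustration. PyTorch modules behave in a similar spirit, in that calling the object ends up running its `forward` method.

```python
class TinyNetwork:
    """A toy stand-in for a network: a list of (weights, bias) layers."""

    def __init__(self, layers):
        self.layers = layers

    @staticmethod
    def _linear(weights, bias, inputs):
        # One output per row of weights: dot(row, inputs) + bias
        return [sum(w * x for w, x in zip(row, inputs)) + b
                for row, b in zip(weights, bias)]

    def forward(self, inputs):
        values = inputs
        for i, (weights, bias) in enumerate(self.layers):
            values = self._linear(weights, bias, values)
            if i < len(self.layers) - 1:
                # ReLU on every layer except the last one
                values = [max(0.0, v) for v in values]
        return values

    def __call__(self, inputs):
        # Calling the object like a function runs the forward pass
        return self.forward(inputs)


net = TinyNetwork([
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),  # 2 inputs -> 2 hidden values
    ([[1.0, 2.0]], [0.5]),                     # 2 hidden values -> 1 output
])
output = net([2.0, 1.0])
print(output)  # [4.5]
```

The first layer turns `[2.0, 1.0]` into `[1.0, 1.5]`, and the second layer combines those into `1.0 + 3.0 + 0.5 = 4.5`.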
Using the resnet101 function, we can instantiate a 101-layer convolutional neural network. We will pass an argument that asks the function to download the resnet101 weights trained on the ImageNet dataset, which has 1.2 million images and 1,000 categories:
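# pretrained=True downloads the ImageNet weights; newer torchvision releases
# (0.13+) prefer weights=models.ResNet101_Weights.IMAGENET1K_V1 instead.
resnet = models.resnet101(pretrained=True)

resnet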
Now the resnet variable can be called like a function. Before we can do that, however, we need to preprocess the input images so that they have the correct size and their values (colours) sit roughly in the same numeric range. To do this, we use the transforms module of torchvision, which lets us quickly define pipelines of basic preprocessing functions:
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    )])
In this case, we have defined a preprocessing function that will resize the input image so that its shorter side is 256 pixels (keeping the aspect ratio), crop it to 224 × 224 around the centre, turn it into a tensor, and normalize its RGB (red, green, blue) components so that they have the defined means and standard deviations.
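The arithmetic behind these steps can be sketched in plain Python. The three helper functions below are hypothetical and only approximate what the transforms do; in particular, the exact rounding used by TorchVision may differ by a pixel. The mean and standard deviation values are the ones from the pipeline above.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def resize_shorter_side(width, height, target=256):
    """Scale so the shorter side equals target, keeping the aspect ratio."""
    if width <= height:
        return target, int(target * height / width)
    return int(target * width / height), target


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centred crop x crop square."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


def normalize_pixel(rgb):
    """Map 0-255 channel values to 0-1, then subtract the mean and divide by the std."""
    return [(value / 255 - m) / s for value, m, s in zip(rgb, MEAN, STD)]


new_size = resize_shorter_side(640, 480)
print(new_size)                    # (341, 256)
print(center_crop_box(*new_size))  # (58, 16, 282, 240)
print(normalize_pixel([255, 0, 128]))
```

For a 640 × 480 photo, the shorter side (480) becomes 256 and the longer side scales to 341, after which the central 224 × 224 square is kept. After normalization, a fully saturated channel ends up a little above 2 and an empty channel a little below -2.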
Now we can try the model on an image. I took a picture of a dog. We can start by loading the image from the local filesystem using Pillow, an image manipulation module for Python (the upload step is only needed when running in Google Colab):
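# Only needed in Google Colab: upload dog.png from your machine
from google.colab import files
uploaded = files.upload()

from PIL import Image

# Convert to RGB so the image has the three channels the pipeline expects
img = Image.open("dog.png").convert("RGB")
img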
Next, we need to pass the image through our preprocessing pipeline for image recognition:
img_t = preprocess(img)
The network expects a batch of images, so we add a batch dimension with unsqueeze, put the network into evaluation mode, and run it on the batch:
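import torch

# Add a leading batch dimension: [3, 224, 224] -> [1, 3, 224, 224]
batch_t = torch.unsqueeze(img_t, 0)

# Evaluation mode: batch normalization uses its stored running statistics
resnet.eval()
out = resnet(batch_t)
out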
Run The Image Recognition Model
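The process of running a trained model on new data is called inference in deep learning circles. To run inference, the network must be in evaluation mode, which is why we called resnet.eval() before running the model. The output is a vector of 1,000 scores, one per ImageNet class. Now let’s load the file containing the 1,000 labels for the ImageNet dataset classes, find the index of the highest score, and convert the scores to percentages with softmax: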
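with open('imagenet_classes.txt') as f:
    labels = [line.strip() for line in f.readlines()]

# Index of the highest score in the single row of outputs
_, index = torch.max(out, 1)

# Take row 0 of the batch so that percentage is a 1-D tensor of 1,000 values
percentage = torch.nn.functional.softmax(out, dim=1)[0] * 100
labels[index[0]], percentage[index[0]].item()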
('golden retriever', 96.29334259033203)
This gives us something that roughly resembles the confidence the model has in its prediction. In this case, the model is about 96% certain that what it is looking at is a golden retriever.
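To see where such a percentage comes from, here is a small sketch of the softmax calculation in plain Python, followed by a ranking of the most likely classes. The labels and scores are made up for illustration and are not real model output; `softmax_percent` and `top_k` are hypothetical helpers. On a real output tensor, `torch.topk` could play a similar role to the ranking step.

```python
import math


def softmax_percent(scores):
    """Convert raw scores into percentages that sum to 100."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [100 * e / total for e in exps]


def top_k(labels, percentages, k=3):
    """Return the k (label, percentage) pairs with the highest percentage."""
    ranked = sorted(zip(labels, percentages), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical scores for four made-up classes, not real model output
toy_labels = ["golden retriever", "Labrador retriever", "tennis ball", "tabby cat"]
toy_scores = [9.0, 5.5, 3.0, 0.5]

percentages = softmax_percent(toy_scores)
for label, pct in top_k(toy_labels, percentages):
    print(f"{label}: {pct:.2f}%")
```

Because softmax exponentiates the scores, a modest lead in the raw score turns into a large lead in percentage, which is one reason these values should be read as rough confidence rather than calibrated probabilities.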
I hope you liked this article on Image Recognition with Machine Learning using PyTorch. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.