Landmark Detection with Machine Learning

Have you ever looked through your vacation photos and wondered: what is the name of this temple I visited in India? Who created this monument that I saw in California? Landmark Detection can help us detect the names of these places. But how does the detection of landmarks work? In this article, I will introduce you to a machine learning project on landmark detection with Python.

What is Landmark Detection?

Landmark Detection is the task of detecting popular man-made sculptures, structures, and monuments within an image. A very famous application of this task is Google Landmark Detection, which is used by Google Maps.


By the end of this article, you will understand how Google Landmark Detection works, as I will take you through a machine learning project based on its functionality. I will use the Python programming language to build a neural network that detects landmarks within images.

Now let’s get started with the task of detecting landmarks within an image. The most challenging part of this project is finding a dataset of labelled images that we can use to train our neural network.

Fortunately, after a lot of research, I came across a dataset provided by Google in a Kaggle competition. You can download the dataset that I will use to detect landmarks with machine learning from here.

Google Landmark Detection with Machine Learning

Now, to get started with this task, I will import all the necessary Python libraries that we need to create a machine learning model for landmark detection:
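The original import cell is not shown, so the following is a typical set of imports for this kind of project (the exact list is an assumption):

```python
import numpy as np                      # array handling
import pandas as pd                     # reading and exploring train.csv
import matplotlib.pyplot as plt         # histograms and bar charts
import tensorflow as tf                 # deep learning framework
from tensorflow.keras import Sequential
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import BatchNormalization, Dense, Flatten

print("TensorFlow:", tf.__version__)
```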

So after importing the above libraries, the next step is to load the dataset that I will use for detecting landmarks within images:
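A sketch of the loading step: the Kaggle dataset ships a `train.csv` with columns `id` and `landmark_id` (the local path `"train.csv"` is an assumption). The fallback branch builds a tiny stand-in with the same schema so the snippet runs anywhere:

```python
import os
import pandas as pd

# Assumed local path to the Kaggle training file.
if os.path.exists("train.csv"):
    train = pd.read_csv("train.csv")
else:
    # Tiny stand-in with the same schema, so the snippet is self-contained.
    train = pd.DataFrame({"id": ["img_0.jpg", "img_1.jpg", "img_2.jpg"],
                          "landmark_id": [1924, 27, 1924]})

print(train.head())
```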

Now let’s have a look at the size of the training data and the number of unique classes it contains:
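This can be done with `shape` and `nunique()`; a small stand-in DataFrame replaces the real `train.csv` here so the snippet runs on its own (with the real data it prints the output below):

```python
import pandas as pd

# Stand-in for the real train.csv (columns: "id", "landmark_id").
train = pd.DataFrame({"id": [f"img_{i}.jpg" for i in range(6)],
                      "landmark_id": [1924, 27, 1924, 454, 27, 1924]})

print("Size of training data:", train.shape)
print("Number of unique classes:", train["landmark_id"].nunique())
```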

Size of training data: (20001, 2)
Number of unique classes: 1020

There are 20,001 training samples belonging to 1,020 classes, an average of about 19.6 images per class. However, the images are unlikely to be spread evenly, so let’s look at the distribution of samples by class:
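The per-class counts can be obtained with `value_counts()`; again a stand-in DataFrame is used so the snippet runs anywhere (the real output follows below):

```python
import pandas as pd

# Stand-in for the real train.csv (columns: "id", "landmark_id").
train = pd.DataFrame({"id": [f"img_{i}.jpg" for i in range(6)],
                      "landmark_id": [1924, 27, 1924, 454, 27, 1924]})

# How many images each landmark_id has, most frequent first.
counts = train["landmark_id"].value_counts().reset_index()
counts.columns = ["landmark_id", "count"]

print(counts.head(10))  # ten most frequent classes
print(counts.tail(10))  # ten rarest classes
```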

Ten most frequent classes:

     landmark_id  count
0           1924    944
1             27    504
2            454    254
3           1346    244
4           1127    201
5            870    193
6           2185    177
7           1101    162
8            389    140
9            219    139

Ten rarest classes:

      landmark_id  count
1010          499      2
1011         1942      2
1012          875      2
1013         2297      2
1014          611      2
1015         1449      2
1016         1838      2
1017          604      2
1018          374      2
1019          991      2

As we can see, the 10 most frequent landmarks have between 139 and 944 data points each, while the 10 rarest have only 2 each.
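The summary statistics and the histogram can be produced like this (a stand-in DataFrame keeps the snippet self-contained; with the real data, `describe()` prints the summary shown below):

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs headless
import matplotlib.pyplot as plt

# Stand-in for the real train.csv (columns: "id", "landmark_id").
train = pd.DataFrame({"id": [f"img_{i}.jpg" for i in range(6)],
                      "landmark_id": [1924, 27, 1924, 454, 27, 1924]})

counts = train["landmark_id"].value_counts()
print(counts.describe())

# Histogram of how many images each class has.
plt.hist(counts.values, bins=20)
plt.xlabel("Amount of images")
plt.ylabel("Occurrences")
plt.savefig("images_per_class_hist.png")
```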

count    1020.000000
mean       19.608824
std        41.653684
min         2.000000
25%         5.000000
50%         9.000000
75%        21.000000
max       944.000000
Name: count, dtype: float64
[Histogram: number of images per landmark class (y-axis: occurrences)]

As we can see in the histogram above, the vast majority of classes are associated with only a handful of images.
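Counting the sparsely populated classes is a couple of boolean filters on the value counts (a stand-in DataFrame again; the real figures follow below):

```python
import pandas as pd

# Stand-in for the real train.csv (columns: "id", "landmark_id").
train = pd.DataFrame({"id": [f"img_{i}.jpg" for i in range(6)],
                      "landmark_id": [1924, 27, 1924, 454, 27, 1924]})

counts = train["landmark_id"].value_counts()
print("Classes with five or fewer data points:", int((counts <= 5).sum()))
print("Classes with between five and ten data points:",
      int(((counts > 5) & (counts <= 10)).sum()))
```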

Classes with five or fewer data points: 322
Classes with between five and ten data points: 342

[Bar chart: number of images per landmark_id]

The chart above shows that over 50% of the 1,020 classes have fewer than 10 images each, which can make training a classifier difficult.

There are also some "outliers" with far more images than the rest, which may bias the model towards those classes: guessing one of the largest classes already has a relatively high chance of being correct.

Training the Model

Now, I will train the machine learning model for landmark detection using the Python programming language; it works along the same lines as the Google landmark detection model.

[A random sample of landmark images from the training set]
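The summary below points to a VGG19-style architecture: a BatchNormalization layer on the 224x224x3 input, the VGG19 convolutional base, a Flatten, two 4096-unit dense layers (fc1, fc2), and a 1,020-way softmax. The original code is not shown, so this is a sketch reconstructed from the printed summary; layer names and sizes are read off that summary, not the author's exact code:

```python
from tensorflow.keras import Input, Sequential
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import BatchNormalization, Dense, Flatten

num_classes = 1020  # number of landmark classes in the training data

# VGG19 convolutional base, randomly initialised (weights=None avoids a
# download); include_top=False drops VGG19's own classifier head.
base = VGG19(weights=None, include_top=False, input_shape=(224, 224, 3))

model = Sequential([
    Input(shape=(224, 224, 3)),
    BatchNormalization(),   # 12 params: 6 trainable, 6 non-trainable
    base,
    Flatten(),              # 7*7*512 = 25088 features
    Dense(4096, activation="relu", name="fc1"),
    Dense(4096, activation="relu", name="fc2"),
    Dense(num_classes, activation="softmax"),
])
model.summary()
```

The parameter count of this sketch (143,749,192) matches the total in the summary below.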
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
batch_normalization (BatchNo (None, 224, 224, 3) 12
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv4 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv4 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv4 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
fc1 (Dense) (None, 4096) 102764544
_________________________________________________________________
fc2 (Dense) (None, 4096) 16781312
_________________________________________________________________
dense (Dense) (None, 1020) 4178940
=================================================================
Total params: 143,749,192
Trainable params: 143,749,186
Non-trainable params: 6
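The compile-and-fit step is also not shown in the original; it follows the standard Keras pattern. Below is a runnable toy version with a tiny model and random data, purely to demonstrate the pattern (the real run used 16,000 training samples, 4,001 validation samples, and 15 epochs on the 143M-parameter network above):

```python
import numpy as np
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense, Flatten

# Toy stand-in: tiny "images" and a tiny model, only to show compile/fit.
num_classes = 4
model = Sequential([
    Input(shape=(8, 8, 3)),
    Flatten(),
    Dense(16, activation="relu"),
    Dense(num_classes, activation="softmax"),
])
model.compile(optimizer="rmsprop",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

rng = np.random.default_rng(0)
X = rng.random((32, 8, 8, 3)).astype("float32")      # random images
y = rng.integers(0, num_classes, size=32)            # random class labels

# validation_split carves a quarter of the data off for validation.
history = model.fit(X, y, validation_split=0.25, epochs=15, verbose=0)
print("epochs trained:", len(history.history["loss"]))
```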
Training on: 16000 samples
Validation on: 4001 samples
Epoch: 1/15
Epoch: 2/15
Epoch: 3/15
Epoch: 4/15
Epoch: 5/15
Epoch: 6/15
Epoch: 7/15
Epoch: 8/15
Epoch: 9/15
Epoch: 10/15
Epoch: 11/15
Epoch: 12/15
Epoch: 13/15
Epoch: 14/15
Epoch: 15/15

Now we have trained the model successfully. The next step is to test it; let’s see how our landmark detection model performs:
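The testing step boils down to `predict()` plus an `argmax` over the class probabilities. A toy stand-in model is used here so the snippet runs on its own; with the real network you would load each test image, resize it to 224x224, and pass the batch in the same way:

```python
import numpy as np
from tensorflow.keras import Input, Sequential
from tensorflow.keras.layers import Dense, Flatten

# Toy stand-in for the trained network.
num_classes = 4
model = Sequential([
    Input(shape=(8, 8, 3)),
    Flatten(),
    Dense(num_classes, activation="softmax"),
])

batch = np.random.rand(2, 8, 8, 3).astype("float32")  # two "test images"
probs = model.predict(batch, verbose=0)   # one probability row per image
pred = np.argmax(probs, axis=1)           # most likely class index per image
print("predicted class indices:", pred)
```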

[Test images shown with their predicted landmark classes]

As you can see in the output above, the test images are classified according to their labels and classes. I hope you liked this article on a machine learning project for Google landmark detection with the Python programming language. Feel free to ask your valuable questions in the comments section below.
