AlexNet Architecture using Python

AlexNet is a popular convolutional neural network architecture that won the ImageNet 2012 challenge by a large margin. It was developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton. It is similar to the LeNet-5 architecture, but larger and deeper. If you want to learn more about the AlexNet CNN architecture, this article is for you. In this article, I'll take you through an introduction to the AlexNet architecture and its implementation using Python.

AlexNet

AlexNet is one of the most popular Convolutional Neural Network architectures. It was one of the first CNN architectures to stack convolutional layers directly on top of one another, instead of placing a pooling layer after every convolutional layer. The table below describes the complete AlexNet architecture:

| Layer | Type            | Maps    | Size    | Kernel size | Stride | Padding | Activation |
|-------|-----------------|---------|---------|-------------|--------|---------|------------|
| Out   | Fully connected | -       | 1000    | -           | -      | -       | Softmax    |
| F10   | Fully connected | -       | 4096    | -           | -      | -       | ReLU       |
| F9    | Fully connected | -       | 4096    | -           | -      | -       | ReLU       |
| S8    | Max pooling     | 256     | 6×6     | 3×3         | 2      | valid   | -          |
| C7    | Convolution     | 256     | 13×13   | 3×3         | 1      | same    | ReLU       |
| C6    | Convolution     | 384     | 13×13   | 3×3         | 1      | same    | ReLU       |
| C5    | Convolution     | 384     | 13×13   | 3×3         | 1      | same    | ReLU       |
| S4    | Max pooling     | 256     | 13×13   | 3×3         | 2      | valid   | -          |
| C3    | Convolution     | 256     | 27×27   | 5×5         | 1      | same    | ReLU       |
| S2    | Max pooling     | 96      | 27×27   | 3×3         | 2      | valid   | -          |
| C1    | Convolution     | 96      | 55×55   | 11×11       | 4      | valid   | ReLU       |
| In    | Input           | 3 (RGB) | 227×227 | -           | -      | -       | -          |

This Convolutional Neural Network contains eight layers with weights: the first five are convolutional, and the remaining three are fully connected.

AlexNet Architecture

The first convolutional layer filters the 227×227×3 input image with 96 kernels of size 11×11×3 and a stride of 4 pixels. The kernels of the second, fourth, and fifth convolutional layers are connected only to the kernel maps of the previous layer that reside on the same GPU. The kernels of the third convolutional layer are connected to all the kernel maps of the second convolutional layer, and the neurons in the fully-connected layers are connected to all the neurons in the previous layer.
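
To make these numbers concrete, here is a quick sanity check in Python (a sketch, using only the values from the table above). It computes the output size and the parameter count of the first convolutional layer, and both values match the model summary shown later in this article:

# First convolutional layer (C1), using the values described above
input_size, kernel_size, stride = 227, 11, 4
in_channels, filters = 3, 96

# "valid" convolution: output = floor((input - kernel) / stride) + 1
output_size = (input_size - kernel_size) // stride + 1
print(output_size)   # 55, i.e. 96 feature maps of size 55x55

# Parameters: the 11x11x3 weights of each kernel, plus one bias per filter
parameters = filters * (kernel_size * kernel_size * in_channels + 1)
print(parameters)    # 34944, matching the first Conv2D layer in model.summary()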

So this is the overall architecture of AlexNet. You can learn more about it from the original research paper, ImageNet Classification with Deep Convolutional Neural Networks. Now, in the section below, I will take you through the implementation of the AlexNet architecture using Python.

AlexNet Architecture using Python

I hope you have now understood the complete architecture of AlexNet. To implement it in Python, we can use the TensorFlow and Keras libraries. I will also use the visualkeras library to visualize the architecture of AlexNet. Below is how to implement the AlexNet architecture using Python:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Build AlexNet layer by layer with the Keras Sequential API
model = keras.Sequential()
# C1: 96 filters of 11x11 with stride 4 on the 227x227x3 input
model.add(layers.Conv2D(filters=96, kernel_size=(11, 11), 
                        strides=(4, 4), activation="relu", 
                        input_shape=(227, 227, 3)))
# Batch normalization is used here in place of the paper's local response normalization
model.add(layers.BatchNormalization())
model.add(layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2)))
# C3: 256 filters of 5x5 with "same" padding
model.add(layers.Conv2D(filters=256, kernel_size=(5, 5), 
                        strides=(1, 1), activation="relu", 
                        padding="same"))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2)))
# C5, C6, C7: three 3x3 convolutions stacked directly on top of one another
model.add(layers.Conv2D(filters=384, kernel_size=(3, 3), 
                        strides=(1, 1), activation="relu", 
                        padding="same"))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(filters=384, kernel_size=(3, 3), 
                        strides=(1, 1), activation="relu", 
                        padding="same"))
model.add(layers.BatchNormalization())
model.add(layers.Conv2D(filters=256, kernel_size=(3, 3), 
                        strides=(1, 1), activation="relu", 
                        padding="same"))
model.add(layers.BatchNormalization())
model.add(layers.MaxPool2D(pool_size=(3, 3), strides=(2, 2)))
model.add(layers.Flatten())
# Fully connected head (simplified here: a single 4096-unit layer and a
# 10-class softmax output instead of the paper's 4096-4096-1000 head)
model.add(layers.Dense(4096, activation="relu"))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation="softmax"))
model.compile(loss='sparse_categorical_crossentropy', 
              optimizer=tf.keras.optimizers.SGD(learning_rate=0.001), 
              metrics=['accuracy'])
model.summary()
Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 conv2d_23 (Conv2D)          (None, 55, 55, 96)        34944     
                                                                 
 batch_normalization_20 (Bat  (None, 55, 55, 96)       384       
 chNormalization)                                                
                                                                 
 max_pooling2d_12 (MaxPoolin  (None, 27, 27, 96)       0         
 g2D)                                                            
                                                                 
 conv2d_24 (Conv2D)          (None, 27, 27, 256)       614656    
                                                                 
 batch_normalization_21 (Bat  (None, 27, 27, 256)      1024      
 chNormalization)                                                
                                                                 
 max_pooling2d_13 (MaxPoolin  (None, 13, 13, 256)      0         
 g2D)                                                            
                                                                 
 conv2d_25 (Conv2D)          (None, 13, 13, 384)       885120    
                                                                 
 batch_normalization_22 (Bat  (None, 13, 13, 384)      1536      
 chNormalization)                                                
                                                                 
 conv2d_26 (Conv2D)          (None, 13, 13, 384)       1327488   
                                                                 
 batch_normalization_23 (Bat  (None, 13, 13, 384)      1536      
 chNormalization)                                                
                                                                 
 conv2d_27 (Conv2D)          (None, 13, 13, 256)       884992    
                                                                 
 batch_normalization_24 (Bat  (None, 13, 13, 256)      1024      
 chNormalization)                                                
                                                                 
 max_pooling2d_14 (MaxPoolin  (None, 6, 6, 256)        0         
 g2D)                                                            
                                                                 
 flatten_4 (Flatten)         (None, 9216)              0         
                                                                 
 dense_8 (Dense)             (None, 4096)              37752832  
                                                                 
 dropout_4 (Dropout)         (None, 4096)              0         
                                                                 
 dense_9 (Dense)             (None, 10)                40970     
                                                                 
=================================================================
Total params: 41,546,506
Trainable params: 41,543,754
Non-trainable params: 2,752
_________________________________________________________________
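
Before moving on, here is a minimal training sketch. I'm assuming CIFAR-10 as a stand-in 10-class dataset (which is why the output layer above has 10 units), with its 32×32 images resized to 227×227 to match the model's input shape:

import tensorflow as tf
from tensorflow import keras

# Load CIFAR-10 as an example 10-class dataset (assumption, not part of the original AlexNet setup)
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()

def preprocess(image, label):
    # Scale pixels to [0, 1] and resize to the 227x227 input expected by the model
    image = tf.image.resize(tf.cast(image, tf.float32) / 255.0, (227, 227))
    return image, label

train_ds = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
            .map(preprocess)
            .batch(32)
            .prefetch(tf.data.AUTOTUNE))

model.fit(train_ds, epochs=5)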

Now here is how you can visualize the architecture of this neural network:

import visualkeras
visualkeras.layered_view(model)
[Figure: layered view of the AlexNet model generated by visualkeras]
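
If you also want to save this visualization as an image file, visualkeras accepts a couple of optional arguments (a quick sketch; the exact options may vary slightly between versions of the library):

import visualkeras

# legend=True adds layer-type labels; to_file writes the drawing to a PNG
visualkeras.layered_view(model, legend=True, to_file="alexnet.png")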

Summary

AlexNet is one of the most popular Convolutional Neural Network architectures, and it was one of the first to stack convolutional layers directly on top of one another. I hope you liked this article on an introduction to the AlexNet CNN architecture and its implementation using Python. Feel free to ask your valuable questions in the comments section below.
