# Gradient Descent Algorithm in Machine Learning

Gradient Descent is an optimization algorithm used to train machine learning models by minimizing a cost function. It is well suited to problems with a large number of features and more training samples than fit comfortably in memory at once. In this article, I will introduce you to the Gradient Descent algorithm in Machine Learning and its implementation using Python.

In Machine Learning, Gradient Descent is an optimization algorithm capable of finding favourable solutions to a wide range of problems. It works by iteratively tuning the model's parameters to minimize the cost function.


The most important parameter of this algorithm is the size of the steps, which is determined by the learning rate. If the learning rate is too low, the algorithm will need many iterations to converge, which can take a very long time.

But if the learning rate is too high, the algorithm may overshoot the minimum and jump to ever larger parameter values, possibly diverging and never finding a good solution. When using a gradient descent algorithm, you should also make sure that all the features have a similar scale; otherwise, it will take a long time to converge.
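A common way to put features on a similar scale is to standardize them to zero mean and unit variance before running gradient descent. Here is a minimal sketch using NumPy; the array values are illustrative, not from the article:

```python
import numpy as np

# Two features on very different scales (illustrative values).
X = np.array([[1.0, 2000.0],
              [2.0, 3000.0],
              [3.0, 1000.0]])

# Standardize each feature: subtract its mean, divide by its std.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.mean(axis=0))  # each feature now has mean ~0
print(X_scaled.std(axis=0))   # and standard deviation ~1
```

With both features on the same scale, gradient descent takes a much more direct path to the minimum instead of zig-zagging along the steep dimension.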

## Types Of Gradient Descent Algorithm

To implement the gradient descent algorithm, we need to calculate the gradient of the cost function. This leads to three variants of gradient descent, which differ in how much data is used to compute the gradient at each step.

The first variant, batch gradient descent, uses the whole training set at every step of the iteration; as a result, it is slow on large training data. It is preferable when you want stable, exact gradient steps computed over all the instances in the training set; its main limitation is that it becomes very slow as the training set grows. Let’s see how to implement it using Python:
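Here is a minimal sketch of batch gradient descent applied to linear regression on synthetic data; the data, learning rate, and iteration count are illustrative assumptions, not from the article:

```python
import numpy as np

np.random.seed(42)
m = 100                                # number of training instances
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)  # y = 4 + 3x + noise
X_b = np.c_[np.ones((m, 1)), X]        # add bias feature x0 = 1

eta = 0.1                              # learning rate
n_iterations = 1000
theta = np.random.randn(2, 1)          # random initialization

for _ in range(n_iterations):
    # Gradient of the MSE cost computed over the WHOLE training set
    gradients = 2 / m * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients

print(theta)  # converges close to the true parameters [4, 3]
```

Note that every iteration touches all `m` instances, which is exactly why this variant scales poorly with training-set size.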

So, as mentioned above, batch gradient descent is slow on large datasets because it uses all the training data to calculate the gradients at each step of the iteration. This limitation is overcome by the stochastic gradient descent algorithm.

SGD picks a random instance from the training set at each step and calculates the gradients based on that single instance, which makes each step much faster. Now let’s see how to implement it using Python:
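A minimal sketch of stochastic gradient descent on the same linear-regression setup follows; the decaying learning schedule and its hyperparameters are illustrative assumptions:

```python
import numpy as np

np.random.seed(42)
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)  # y = 4 + 3x + noise
X_b = np.c_[np.ones((m, 1)), X]

n_epochs = 50
t0, t1 = 5, 50  # learning-schedule hyperparameters (assumed)

def learning_schedule(t):
    # Gradually shrink the learning rate so the steps settle down
    return t0 / (t + t1)

theta = np.random.randn(2, 1)
for epoch in range(n_epochs):
    for i in range(m):
        idx = np.random.randint(m)      # pick ONE random instance
        xi = X_b[idx:idx + 1]
        yi = y[idx:idx + 1]
        gradients = 2 * xi.T @ (xi @ theta - yi)
        eta = learning_schedule(epoch * m + i)
        theta = theta - eta * gradients

print(theta)  # ends up close to the true parameters [4, 3]
```

Because each step uses a single instance, the cost bounces around rather than decreasing smoothly; the decaying learning rate is what lets the parameters settle near the minimum.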

The final type of gradient descent algorithm is mini-batch gradient descent. It computes gradients over small random sets of instances, known as mini-batches, instead of over the full training set or a single random instance. Now let’s see how to implement it using Python:
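Here is a minimal sketch of mini-batch gradient descent on the same synthetic problem; the batch size, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

np.random.seed(42)
m = 100
X = 2 * np.random.rand(m, 1)
y = 4 + 3 * X + np.random.randn(m, 1)  # y = 4 + 3x + noise
X_b = np.c_[np.ones((m, 1)), X]

n_epochs = 100
batch_size = 20
eta = 0.1
theta = np.random.randn(2, 1)

for epoch in range(n_epochs):
    shuffled = np.random.permutation(m)  # reshuffle instances each epoch
    for start in range(0, m, batch_size):
        idx = shuffled[start:start + batch_size]
        xb, yb = X_b[idx], y[idx]
        # Gradient of the MSE cost over just this mini-batch
        gradients = 2 / len(idx) * xb.T @ (xb @ theta - yb)
        theta = theta - eta * gradients

print(theta)  # converges close to the true parameters [4, 3]
```

Mini-batches sit between the two extremes: the updates are less noisy than SGD's single-instance steps, yet each one is far cheaper than a full pass over the training set.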

## Summary

Gradient Descent is therefore a generic optimization method capable of finding favourable solutions to a wide range of problems. For example, suppose you are lost in the mountains on a dense, foggy day, and all you can do is feel the slope of the ground beneath your feet. A good strategy for reaching the valley floor is to walk downhill in the direction of the steepest slope. This is, in essence, how Gradient Descent algorithms work.