Feature Scaling in Machine Learning

Feature scaling means rescaling features so that no single feature dominates the others. In machine learning, we use feature scaling to make sure that all the features used to train a model are on a similar scale. In this article, I will introduce you to the concept of feature scaling in machine learning and its implementation using Python.

Feature Scaling

Rescaling the features is one of the most important transformations we need to apply before training a machine learning model, because many machine learning algorithms do not work well when the input features are not on a similar scale.

In machine learning, there are two common ways to rescale the features:

  1. Normalization
  2. Standardization

During normalization, the values are shifted and rescaled so that they end up between 0 and 1. Standardization first subtracts the mean value and then divides by the standard deviation, so that the resulting distribution of each feature has a mean of 0 and a standard deviation of 1.
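As a rough illustration of these two formulas, here is a small NumPy sketch (the sample values are made up just for demonstration):

```python
import numpy as np

# A made-up feature column for illustration
x = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Normalization (min-max scaling): shift and rescale values into [0, 1]
x_norm = (x - x.min()) / (x.max() - x.min())
print(x_norm)  # [0.   0.25 0.5  0.75 1.  ]

# Standardization: subtract the mean, then divide by the standard deviation
x_std = (x - x.mean()) / x.std()
print(x_std.mean().round(10), x_std.std())  # ~0.0 and 1.0
```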

One thing you should always remember when using a feature scaling method is that the scaler should be fitted only on the training data, not on the entire dataset, as shown in the sketch below.
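For example, here is a minimal sketch of this rule using scikit-learn's StandardScaler (the data and the X_train/X_test names are assumed placeholders for your own split):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Made-up data just to have something to split
X = np.random.rand(100, 3)
X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit the scaler on the training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics on the test data
```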

Feature Scaling using Python

So there are two common methods of scaling features in machine learning: MinMaxScaler for normalization and StandardScaler for standardization. The difference between the two is that normalization rescales the data so that the values end up between 0 and 1, while standardization rescales the data so that each feature has a mean of 0 and a standard deviation of 1.

Now let’s see how to rescale the features using Python. The Scikit-learn library in Python provides both of these methods, which you apply to the training dataset after splitting the data into training and test sets. Assuming your training set is ready, here’s how to implement feature scaling using Python.
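Below is a minimal sketch, assuming X_train holds your training features (the values here are made up purely for demonstration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up training features for illustration; replace with your own training set
X_train = np.array([[1.0, 200.0],
                    [2.0, 300.0],
                    [3.0, 400.0],
                    [4.0, 500.0]])

# Normalization with MinMaxScaler: rescales each feature to the [0, 1] range
min_max_scaler = MinMaxScaler()
X_train_normalized = min_max_scaler.fit_transform(X_train)
print(X_train_normalized)

# Standardization with StandardScaler: each feature ends up with mean 0 and standard deviation 1
standard_scaler = StandardScaler()
X_train_standardized = standard_scaler.fit_transform(X_train)
print(X_train_standardized)
```

Remember that whichever scaler you choose, you would then call transform (not fit_transform) on the test set so that it is rescaled with the statistics learned from the training data.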

Summary

You don’t need to use both standardization and normalization at the same time for rescaling the data. Normalization is generally preferred when the dataset does not follow a normal distribution, while standardization is preferred when it does. I hope you liked this article on feature scaling in machine learning and its implementation using Python. Feel free to ask your valuable questions in the comments section below.
