A dataset is unbalanced if the classes are not represented approximately equally. In this article, I will introduce you to SMOTE in Machine Learning to deal with class imbalance using the Python programming language.
Introduction to Class Imbalance
There have been attempts to deal with imbalanced data sets in areas such as fraudulent phone calls, telecom management, text classification, and oil spill detection in satellite imagery.
The performance of machine learning algorithms is typically evaluated using predictive accuracy. However, this is not appropriate when the data is unbalanced or the costs of different errors vary widely.
The machine learning community has approached the issue of class imbalance in two ways. The first is to assign separate costs to the training examples. The other is to resample the original dataset, either by oversampling the minority class or by subsampling the majority class.
SMOTE for Class Imbalance
Using SMOTE, we propose an oversampling approach in which the minority class is oversampled by creating “synthetic” examples rather than by oversampling with replacement.
Using SMOTE, the minority class is oversampled by taking each minority class sample and introducing synthetic examples with the line segments. For example, if the amount of oversampling needed is 200%, only two neighbours of the five nearest neighbours are chosen and a sample is generated in the direction of each.
SMOTE using Python
Using SMOTE, synthetic samples are generated as follows: Take the difference between the feature vector considered and its nearest neighbour. Multiply this difference by a random number between 0 and 1 and add it to the considered feature vector.
This causes the selection of a random point along the line segment between two specific entities. This approach effectively forces the decision-making region of the minority class to become more general.
Now let’s see how to use SMOTE for dealing with class imbalance with Python programming language. I will start by importing the necessary Python libraries:
Now let’s classify the data before using SMOTE:
Now let’s see what we get after using SMOTE:
I hope you liked this article on SMOTE for dealing with class imbalance with Python programming language. Feel free to ask your valuable questions in the comments section below.