AdaBoost Algorithm

The AdaBoost Algorithm is a boosting method that combines weak learners into a strong learner. One way for a new prediction model to correct its predecessor is to pay more attention to the training samples that the predecessor misclassified.

The result is a sequence of predictors that focus more and more on the hard instances; this is exactly the technique AdaBoost uses. In this article, I will take you through the AdaBoost Algorithm in Machine Learning.
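
To make this concrete, here is a minimal sketch of the weight update at the heart of AdaBoost. The helper name adaboost_weight_update is mine, not scikit-learn's, and this is the textbook update for binary classification, not the library's internal code:

import numpy as np

def adaboost_weight_update(sample_weights, y_true, y_pred, learning_rate=1.0):
    # weighted error rate of the current weak learner
    err = np.sum(sample_weights[y_pred != y_true]) / np.sum(sample_weights)
    # the learner's say in the final vote; larger when the error is small
    alpha = learning_rate * np.log((1 - err) / err)
    # boost the weights of the misclassified samples, then renormalize
    sample_weights = sample_weights * np.exp(alpha * (y_pred != y_true))
    return sample_weights / np.sum(sample_weights), alpha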

Training a Base Classifier

To use the AdaBoost classification algorithm, we first need a weak base classification model; I will use a Decision Tree as the base model. I will start by importing the necessary packages and setting up the notebook:

import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

Next, let's create a dataset and split it into training and test sets. I will use scikit-learn's moons dataset:

from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
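
With the data ready, let's train a single decision stump (a Decision Tree with max_depth=1, the same weak learner the ensemble will use) as our baseline, so we have a score to compare the boosted model against; the exact accuracy you see may differ slightly across scikit-learn versions:

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

tree_clf = DecisionTreeClassifier(max_depth=1, random_state=42)
tree_clf.fit(X_train, y_train)
print("Decision stump accuracy:", accuracy_score(y_test, tree_clf.predict(X_test)))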

AdaBoost Algorithm in Machine Learning

The AdaBoost Algorithm increases the relative weights of the misclassified training samples, then trains another classifier using these updated weights, makes predictions on the training set again, updates the weights once more, and so on. Let's have a look at how we can implement this algorithm with scikit-learn:

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1), n_estimators=200,
    algorithm="SAMME.R", learning_rate=0.5, random_state=42)
ada_clf.fit(X_train, y_train)
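
Before visualizing anything, it's worth a quick sanity check: score the boosted ensemble on the same test set and compare it with the stump's baseline accuracy (your exact number may vary slightly):

from sklearn.metrics import accuracy_score

print("AdaBoost accuracy:", accuracy_score(y_test, ada_clf.predict(X_test)))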

To visualize what the classifier has learned, I will create a helper function that plots a model's decision boundary. Now, let's create the function:

from matplotlib.colors import ListedColormap

def plot_decision_boundary(clf, X, y, axes=[-1.5, 2.45, -1, 1.5], alpha=0.5, contour=True):
    x1s = np.linspace(axes[0], axes[1], 100)
    x2s = np.linspace(axes[2], axes[3], 100)
    x1, x2 = np.meshgrid(x1s, x2s)
    X_new = np.c_[x1.ravel(), x2.ravel()]
    y_pred = clf.predict(X_new).reshape(x1.shape)
    custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0'])
    plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
    if contour:
        custom_cmap2 = ListedColormap(['#7d7d58','#4c4c7f','#507d50'])
        plt.contour(x1, x2, y_pred, cmap=custom_cmap2, alpha=0.8)
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "yo", alpha=alpha)
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs", alpha=alpha)
    plt.axis(axes)
    plt.xlabel(r"$x_1$", fontsize=18)
    plt.ylabel(r"$x_2$", fontsize=18, rotation=0)
plot_decision_boundary(ada_clf, X, y)
Figure: decision boundary of the trained AdaBoost classifier on the moons dataset.

To see the sequential training that AdaBoost relies on in action, the code below trains five consecutive SVM classifiers, boosting the weights of the misclassified samples before each new round, and plots every resulting decision boundary:

from sklearn.svm import SVC
m = len(X_train)

fig, axes = plt.subplots(ncols=2, figsize=(10,4), sharey=True)
for subplot, learning_rate in ((0, 1), (1, 0.5)):
    sample_weights = np.ones(m)  # start with equal weights for every sample
    plt.sca(axes[subplot])
    for i in range(5):
        svm_clf = SVC(kernel="rbf", C=0.05, gamma="scale", random_state=42)
        svm_clf.fit(X_train, y_train, sample_weight=sample_weights)
        y_pred = svm_clf.predict(X_train)
        # boost the weights of the misclassified samples before the next round
        sample_weights[y_pred != y_train] *= (1 + learning_rate)
        plot_decision_boundary(svm_clf, X, y, alpha=0.2)
        plt.title("learning_rate = {}".format(learning_rate), fontsize=16)
    if subplot == 0:
        plt.text(-0.7, -0.65, "1", fontsize=14)
        plt.text(-0.6, -0.10, "2", fontsize=14)
        plt.text(-0.5,  0.10, "3", fontsize=14)
        plt.text(-0.4,  0.55, "4", fontsize=14)
        plt.text(-0.3,  0.90, "5", fontsize=14)
    else:
        plt.ylabel("")
plt.show()
Figure: decision boundaries of five consecutive boosted classifiers (left: learning_rate = 1, right: learning_rate = 0.5).

Each new classifier does a better job on the instances its predecessor got wrong. The plot on the right shows the same sequence of predictors with the learning rate halved: the weights of the misclassified samples are boosted half as much at every iteration.
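
One way to watch how the ensemble improves (or starts overfitting) as predictors are added is AdaBoostClassifier's staged_predict method, which yields the ensemble's predictions after every boosting iteration. A minimal sketch, reusing the ada_clf trained above:

import numpy as np
from sklearn.metrics import accuracy_score

# test accuracy after each boosting iteration
staged_scores = [accuracy_score(y_test, y_pred)
                 for y_pred in ada_clf.staged_predict(X_test)]
best_n = int(np.argmax(staged_scores)) + 1
print("Best number of estimators:", best_n)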

As you can see, the AdaBoost Algorithm adds predictors to the ensemble one by one, and together they end up performing better than any single weak learner. If your model is overfitting the training set, you can reduce the number of estimators; the staged_predict sketch above is one way to pick a good value. I hope you liked this article on the AdaBoost Algorithm in Machine Learning. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.
