Random Forest Algorithm

The Random Forest algorithm is an ensemble of the Decision Trees algorithm. A Decision Tree model is generally trained using the Bagging Classifier. If you don’t want to use a bagging classifier algorithm to pass it through the Decision Tree Classification model, you can use a Random Forest algorithm as it is more convenient and better optimized for Decision Tree Classification. In this article, I will take you through the Random Forest algorithm in Machine Learning.

I will use all the CPU cores to train a RandomForestClassifier algorithm with 500 trees. But first, let’s start with importing the necessary libraries and data preparation to fit into a RandomForestClassifier algorithm:

import sys assert sys.version_info >= (3, 5) # Scikit-Learn ≥0.20 is required import sklearn assert sklearn.__version__ >= "0.20" # Common imports import numpy as np import os # to make this notebook's output stable across runs np.random.seed(42) # To plot pretty figures %matplotlib inline import matplotlib as mpl import matplotlib.pyplot as plt mpl.rc('axes', labelsize=14) mpl.rc('xtick', labelsize=12) mpl.rc('ytick', labelsize=12)
Code language: Python (python)

Data Preparation

Now, I will load the data, and split it into training and test sets:

from sklearn.model_selection import train_test_split from sklearn.datasets import make_moons X, y = make_moons(n_samples=500, noise=0.30, random_state=42) X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
Code language: Python (python)

Now I will prepare the data using the bagging classifier and the decision tree classification model:

from sklearn.ensemble import BaggingClassifier from sklearn.tree import DecisionTreeClassifier bag_clf = BaggingClassifier( DecisionTreeClassifier(splitter="random", max_leaf_nodes=16, random_state=42), n_estimators=500, max_samples=1.0, bootstrap=True, random_state=42) bag_clf.fit(X_train, y_train) y_pred = bag_clf.predict(X_test)
Code language: Python (python)

Random Forest Algorithm in Machine Learning

With a very few parameters, a RandomForestClassifier uses all the hyperparameters of a Decision Tree Classification model and bagging classifier algorithm. Now let’s see how we can do this:

from sklearn.ensemble import RandomForestClassifier rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, random_state=42) rnd_clf.fit(X_train, y_train) y_pred_rf = rnd_clf.predict(X_test) np.sum(y_pred == y_pred_rf) / len(y_pred)
Code language: Python (python)


The RandomForestClassifier algorithm works by introducing extra randomness while producing decision trees. Instead of searching for the best features, it works by searching for the best features among the random sets of features.

Also, Read – Work on Artificial Intelligence Projects.

from sklearn.datasets import load_iris iris = load_iris() rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42) rnd_clf.fit(iris["data"], iris["target"]) for name, score in zip(iris["feature_names"], rnd_clf.feature_importances_): print(name, score)
Code language: Python (python)
sepal length (cm) 0.11249225099876375
sepal width (cm) 0.02311928828251033
petal length (cm) 0.4410304643639577
petal width (cm) 0.4233579963547682
Code language: Python (python)

array([0.11249225, 0.02311929, 0.44103046, 0.423358 ])

Decision Boundary for Random Forest

Now I will plot a decision boundary on our model. For this purpose, I will first create a function to plot the decision boundary. If you don’t know what a decision boundary is, you can learn it from here. Now let’s create the function:

from matplotlib.colors import ListedColormap def plot_decision_boundary(clf, X, y, axes=[-1.5, 2.45, -1, 1.5], alpha=0.5, contour=True): x1s = np.linspace(axes[0], axes[1], 100) x2s = np.linspace(axes[2], axes[3], 100) x1, x2 = np.meshgrid(x1s, x2s) X_new = np.c_[x1.ravel(), x2.ravel()] y_pred = clf.predict(X_new).reshape(x1.shape) custom_cmap = ListedColormap(['#fafab0','#9898ff','#a0faa0']) plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap) if contour: custom_cmap2 = ListedColormap(['#7d7d58','#4c4c7f','#507d50']) plt.contour(x1, x2, y_pred, cmap=custom_cmap2, alpha=0.8) plt.plot(X[:, 0][y==0], X[:, 1][y==0], "yo", alpha=alpha) plt.plot(X[:, 0][y==1], X[:, 1][y==1], "bs", alpha=alpha) plt.axis(axes) plt.xlabel(r"$x_1$", fontsize=18) plt.ylabel(r"$x_2$", fontsize=18, rotation=0)
Code language: Python (python)
plt.figure(figsize=(6, 4)) for i in range(15): tree_clf = DecisionTreeClassifier(max_leaf_nodes=16, random_state=42 + i) indices_with_replacement = np.random.randint(0, len(X_train), len(X_train)) tree_clf.fit(X[indices_with_replacement], y[indices_with_replacement]) plot_decision_boundary(tree_clf, X, y, axes=[-1.5, 2.45, -1, 1.5], alpha=0.02, contour=False) plt.show()
Code language: Python (python)
random forest

The Random Forest algorithm results in higher tree diversification, which trades with a very higher level of bias for lower variance. Overall, it leads to a better prediction model. I hope you liked this article, feel free to ask your valuable questions in the comments section below.

Follow Us:

Aman Kharwal
Aman Kharwal

Coder with the ♥️ of a Writer || Data Scientist | Solopreneur | Founder

Articles: 1336


Leave a Reply