The Random Forest algorithm is an ensemble of decision trees, generally trained with the bagging method. Instead of building a BaggingClassifier and passing it a DecisionTreeClassifier, you can simply use the RandomForestClassifier class, which is more convenient and optimized for decision trees. In this article, I will take you through the Random Forest algorithm in Machine Learning.
I will use all the CPU cores to train a RandomForestClassifier with 500 trees. But first, let's start by importing the necessary libraries and preparing the data to fit a RandomForestClassifier:
import sys
assert sys.version_info >= (3, 5)
# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"
# Common imports
import numpy as np
import os
# to make this notebook's output stable across runs
np.random.seed(42)
# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)
Data Preparation
Now, I will generate the moons dataset and split it into training and test sets:
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons
X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
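Before training anything, it helps to see what this data looks like. Here is a quick scatter plot of the two classes (my addition, using the same marker styles as the plotting function defined later in this article):
plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], "yo", alpha=0.5)  # class 0: yellow circles
plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], "bs", alpha=0.5)  # class 1: blue squares
plt.xlabel(r"$x_1$", fontsize=18)
plt.ylabel(r"$x_2$", fontsize=18, rotation=0)
plt.show()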
Now I will train a bagging ensemble of decision trees, using the BaggingClassifier with a DecisionTreeClassifier as the base model:
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
bag_clf = BaggingClassifier(
    DecisionTreeClassifier(splitter="random", max_leaf_nodes=16, random_state=42),
    n_estimators=500, max_samples=1.0, bootstrap=True, random_state=42)
bag_clf.fit(X_train, y_train)
y_pred = bag_clf.predict(X_test)
Random Forest Algorithm in Machine Learning
With a few exceptions, a RandomForestClassifier has all the hyperparameters of a DecisionTreeClassifier (to control how the trees are grown), plus all the hyperparameters of a BaggingClassifier (to control the ensemble itself). Now let's see how we can train one:
from sklearn.ensemble import RandomForestClassifier

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1, random_state=42)
rnd_clf.fit(X_train, y_train)
y_pred_rf = rnd_clf.predict(X_test)

# Fraction of test predictions on which the two ensembles agree
np.sum(y_pred == y_pred_rf) / len(y_pred)
0.976
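So the bagging ensemble and the random forest agree on about 97.6% of the test predictions, which shows the two approaches are roughly equivalent. As a quick sanity check (my addition, not in the original article), we can also compare their test accuracies directly:
from sklearn.metrics import accuracy_score

# Test accuracy of each ensemble on the held-out moons test set
print(accuracy_score(y_test, y_pred))     # bagging ensemble
print(accuracy_score(y_test, y_pred_rf))  # random forest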
The RandomForestClassifier algorithm works by introducing extra randomness while growing trees. Instead of searching for the very best feature when splitting a node, it searches for the best feature among a random subset of features.
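The size of that random subset is controlled by the max_features hyperparameter (for classification it effectively defaults to the square root of the number of features). As a minimal sketch, setting it explicitly looks like this (my illustration, not from the original article):
# Consider only sqrt(n_features) candidate features at each split
rnd_clf_subset = RandomForestClassifier(n_estimators=500, max_features="sqrt",
                                        max_leaf_nodes=16, n_jobs=-1,
                                        random_state=42)
rnd_clf_subset.fit(X_train, y_train)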
Another great quality of the Random Forest algorithm is that it makes it easy to measure the relative importance of each feature, through the feature_importances_ attribute. Let's inspect it on the iris dataset:
from sklearn.datasets import load_iris
iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, random_state=42)
rnd_clf.fit(iris["data"], iris["target"])
for name, score in zip(iris["feature_names"], rnd_clf.feature_importances_):
    print(name, score)
sepal length (cm) 0.11249225099876375
sepal width (cm) 0.02311928828251033
petal length (cm) 0.4410304643639577
petal width (cm) 0.4233579963547682
rnd_clf.feature_importances_
array([0.11249225, 0.02311929, 0.44103046, 0.423358 ])
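To read these numbers more easily, here is a small sketch (my addition) that sorts the features by importance:
# Pair each feature name with its importance and sort in descending order
for name, score in sorted(zip(iris["feature_names"], rnd_clf.feature_importances_),
                          key=lambda pair: pair[1], reverse=True):
    print(f"{name}: {score:.3f}")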
Decision Boundary for Random Forest
Now I will plot decision boundaries for our models. A decision boundary is the surface that separates the regions a classifier assigns to each class. First, let's create a helper function to plot it:
from matplotlib.colors import ListedColormap
def plot_decision_boundary(clf, X, y, axes=[-1.5, 2.45, -1, 1.5], alpha=0.5, contour=True):
    # Build a grid covering the plot area and classify every grid point
    x1s = np.linspace(axes[0], axes[1], 100)
    x2s = np.linspace(axes[2], axes[3], 100)
    x1, x2 = np.meshgrid(x1s, x2s)
    X_new = np.c_[x1.ravel(), x2.ravel()]
    y_pred = clf.predict(X_new).reshape(x1.shape)
    # Shade the predicted class regions
    custom_cmap = ListedColormap(['#fafab0', '#9898ff', '#a0faa0'])
    plt.contourf(x1, x2, y_pred, alpha=0.3, cmap=custom_cmap)
    if contour:
        # Optionally draw the boundary lines between class regions
        custom_cmap2 = ListedColormap(['#7d7d58', '#4c4c7f', '#507d50'])
        plt.contour(x1, x2, y_pred, cmap=custom_cmap2, alpha=0.8)
    # Plot the training points of each class
    plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], "yo", alpha=alpha)
    plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], "bs", alpha=alpha)
    plt.axis(axes)
    plt.xlabel(r"$x_1$", fontsize=18)
    plt.ylabel(r"$x_2$", fontsize=18, rotation=0)
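Before looking at individual trees, we can use this helper to draw the decision boundary of the random forest itself. Note that rnd_clf was last fitted on the iris data above, so this sketch (my addition) fits a fresh forest on the moons training set first:
# Fit a fresh forest on the two-feature moons data so we can plot it
rnd_clf_moons = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16,
                                       n_jobs=-1, random_state=42)
rnd_clf_moons.fit(X_train, y_train)

plt.figure(figsize=(6, 4))
plot_decision_boundary(rnd_clf_moons, X, y)
plt.show()
Next, let's visualize how individual trees trained on different bootstrap samples vary: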
plt.figure(figsize=(6, 4))

for i in range(15):
    tree_clf = DecisionTreeClassifier(max_leaf_nodes=16, random_state=42 + i)
    # Sample the training set with replacement (a bootstrap sample)
    indices_with_replacement = np.random.randint(0, len(X_train), len(X_train))
    tree_clf.fit(X_train[indices_with_replacement], y_train[indices_with_replacement])
    plot_decision_boundary(tree_clf, X, y, axes=[-1.5, 2.45, -1, 1.5], alpha=0.02, contour=False)

plt.show()

The Random Forest algorithm results in much greater tree diversity, trading a slightly higher bias for a lower variance, which generally yields a better overall model.
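To make that claim concrete, here is a small sketch (my addition, reusing the moons data from above) comparing the cross-validated accuracy of a single decision tree with that of a forest:
from sklearn.model_selection import cross_val_score

# Mean 5-fold cross-validated accuracy: one tree vs. an ensemble of 500 trees
tree = DecisionTreeClassifier(max_leaf_nodes=16, random_state=42)
forest = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16,
                                n_jobs=-1, random_state=42)
print("tree:", cross_val_score(tree, X, y, cv=5).mean())
print("forest:", cross_val_score(forest, X, y, cv=5).mean())
I hope you liked this article on the Random Forest algorithm in Machine Learning. Feel free to ask your valuable questions in the comments section below.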