# Decision Boundary in Machine Learning

The general goal of a classification model is to find a decision boundary. The purpose of the decision boundaries is to identify those regions of the input class space that corresponds to each class. In this article, I will take you through the concept of decision boundary in machine learning.

To explain the concept of decision boundaries in machine learning, I will first create a Logistic Regression model. So now let’s import some libraries and get started with the task:

.wp-block-code {
border: 0;
}

.wp-block-code > div {
overflow: auto;
}

.hljs {
box-sizing: border-box;
}

.hljs.shcb-code-table {
display: table;
width: 100%;
}

.hljs.shcb-code-table > .shcb-loc {
color: inherit;
display: table-row;
width: 100%;
}

.hljs.shcb-code-table .shcb-loc > span {
display: table-cell;
}

.wp-block-code code.hljs:not(.shcb-wrap-lines) {
white-space: pre;
}

.wp-block-code code.hljs.shcb-wrap-lines {
white-space: pre-wrap;
}

.hljs.shcb-line-numbers {
border-spacing: 0;
counter-reset: line;
}

.hljs.shcb-line-numbers > .shcb-loc {
counter-increment: line;
}

.hljs.shcb-line-numbers .shcb-loc > span {
}

.hljs.shcb-line-numbers .shcb-loc::before {
border-right: 1px solid #ddd;
content: counter(line);
display: table-cell;
text-align: right;
-webkit-user-select: none;
-moz-user-select: none;
-ms-user-select: none;
user-select: none;
white-space: nowrap;
width: 1%;
}
# Python ≥3.5 is required
import sys
assert sys.version_info &amp;gt;= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ &amp;gt;= "0.20"

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

## Decision Boundaries with Logistic Regression

I will use the iris dataset to fit a Linear Regression model. Iris is a very famous dataset among machine learning practitioners for classification tasks. It contains the sepal and petal length with width of 150 iris flowers of three different species; Iris setosa, Iris versicolor, and Iris Virginica.

Now I will try to build a classification model to detect the Iris virginica type based only on the width of a petal:

t = np.linspace(-10, 10, 100)
sig = 1 / (1 + np.exp(-t))
plt.figure(figsize=(9, 3))
plt.plot([-10, 10], [0, 0], "k-")
plt.plot([-10, 10], [0.5, 0.5], "k:")
plt.plot([-10, 10], [1, 1], "k:")
plt.plot([0, 0], [-1.1, 1.1], "k-")
plt.plot(t, sig, "b-", linewidth=2, label=r"$\sigma(t) = \frac{1}{1 + e^{-t}}$")
plt.xlabel("t")
plt.legend(loc="upper left", fontsize=20)
plt.axis([-10, 10, -0.1, 1.1])
plt.show()

Now let’s train our Logistic Regression model to frame a decision boundary:

from sklearn import datasets
X = iris["data"][:, 3:]  # petal width
y = (iris["target"] == 2).astype(np.int)  # 1 if Iris virginica, else 0
from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression(solver="lbfgs", random_state=42)
log_reg.fit(X, y)
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True, intercept_scaling=1, l1_ratio=None, max_iter=100, multi_class='auto', n_jobs=None, penalty='l2', random_state=42, solver='lbfgs', tol=0.0001, verbose=0, warm_start=False)

## Decision Boundary using One Feature (Petal Width)

Now let’s have a quick look at our trained model’s estimated probabilities for flowers with the petal widths that vary from 0 cm to 3 cm:

X_new = np.linspace(0, 3, 1000).reshape(-1, 1)
y_proba = log_reg.predict_proba(X_new)
decision_boundary = X_new[y_proba[:, 1] &amp;gt;= 0.5][0]

plt.figure(figsize=(8, 3))
plt.plot(X[y==0], y[y==0], "bs")
plt.plot(X[y==1], y[y==1], "g^")
plt.plot([decision_boundary, decision_boundary], [-1, 2], "k:", linewidth=2)
plt.plot(X_new, y_proba[:, 1], "g-", linewidth=2, label="Iris virginica")
plt.plot(X_new, y_proba[:, 0], "b--", linewidth=2, label="Not Iris virginica")
plt.text(decision_boundary+0.02, 0.15, "Decision  boundary", fontsize=14, color="k", ha="center")
plt.arrow(decision_boundary, 0.08, -0.3, 0, head_width=0.05, head_length=0.1, fc='b', ec='b')
plt.arrow(decision_boundary, 0.92, 0.3, 0, head_width=0.05, head_length=0.1, fc='g', ec='g')
plt.xlabel("Petal width (cm)", fontsize=14)
plt.ylabel("Probability", fontsize=14)
plt.legend(loc="center left", fontsize=14)
plt.axis([0, 3, -0.02, 1.02])
plt.show()

The output above shows that the petal width of Iris virginica ranges from 1.4 cm to 2.5 cm, while the other flowers are having a small petal width which is ranging from 0.1 cm to 1.8 cm. If we will use the predict() method to predict the class of the flower, it will return the class that mostly falls in this category.

There is a decision boundary at around 1.6 cm where both the probabilities are 50 percent, which conveys that if the petal width is higher than 1.6 cm, then our classification model will predict that the input class is an Iris virginica, and otherwise the model will predict that it is not iris virginica.

## Decision Boundary using Two Features(Petal Width & Petal Length)

Now let’s plot a little bit complex decision boundary which will be based on two features petal width and petal length. Now we will train the model based on two features to predict whether the flower is Iris virginica:

from sklearn.linear_model import LogisticRegression

X = iris["data"][:, (2, 3)]  # petal length, petal width
y = (iris["target"] == 2).astype(np.int)

log_reg = LogisticRegression(solver="lbfgs", C=10**10, random_state=42)
log_reg.fit(X, y)

x0, x1 = np.meshgrid(
np.linspace(2.9, 7, 500).reshape(-1, 1),
np.linspace(0.8, 2.7, 200).reshape(-1, 1),
)
X_new = np.c_[x0.ravel(), x1.ravel()]

y_proba = log_reg.predict_proba(X_new)

plt.figure(figsize=(10, 4))
plt.plot(X[y==0, 0], X[y==0, 1], "bs")
plt.plot(X[y==1, 0], X[y==1, 1], "g^")

zz = y_proba[:, 1].reshape(x0.shape)
contour = plt.contour(x0, x1, zz, cmap=plt.cm.brg)

left_right = np.array([2.9, 7])
boundary = -(log_reg.coef_[0][0] * left_right + log_reg.intercept_[0]) / log_reg.coef_[0][1]

plt.clabel(contour, inline=1, fontsize=12)
plt.plot(left_right, boundary, "k--", linewidth=3)
plt.text(3.5, 1.5, "Not Iris virginica", fontsize=14, color="b", ha="center")
plt.text(6.5, 2.3, "Iris virginica", fontsize=14, color="g", ha="center")
plt.xlabel("Petal length", fontsize=14)
plt.ylabel("Petal width", fontsize=14)
plt.axis([2.9, 7, 0.8, 2.7])
plt.show()

In the output above the dashed line is representing the points where our Logistic Regression model predicts a probability of 50 percent, this line is the decision boundary for our classification model. One thing to note here is that it is a Linear decision boundary.

Also, Read: Anomaly Detection with Machine Learning.

Here each parallel line is representing the points where the output of our model shows a specific probability from 15 percent to 90 percent. All the flowers lying beyond the top right line have over 90 percent probability of being Iris Virginia.

I hope you liked this article on Decision Boundary in Machine Learning. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to read more amazing articles.