Binary Classification Model

Binary Classification is a type of classification model that have two label of classes. For example an email spam detection model contains two label of classes as spam or not spam. Most of the times the tasks of binary classification includes one label in a normal state, and another label in an abnormal state. In this article I will take you through Binary Classification in Machine Learning using Python.

MNIST Dataset

I will be using the MNIST dataset, which is a set of 70,000 small images of digits handwritten by high school students and employees of the US Census Bureau. Each image is labeled with the digit it represents. This set has been studied so much that it is often called the “hello world” of Machine Learning.

Whenever people come up with new classification algorithm they are curious to see how it will perform on MNIST, and anyone who learns Machine Learning tackles this dataset sooner or later. So let’s import some libraries to start with our Binary Classification model:

# Python ≥3.5 is required import sys assert sys.version_info >= (3, 5) # Scikit-Learn ≥0.20 is required import sklearn assert sklearn.__version__ >= "0.20" # Common imports import numpy as np import os # to make this notebook's output stable across runs np.random.seed(42) # To plot pretty figures %matplotlib inline import matplotlib as mpl import matplotlib.pyplot as plt mpl.rc('axes', labelsize=14) mpl.rc('xtick', labelsize=12) mpl.rc('ytick', labelsize=12)

Scikit-Learn provides many helper functions to download popular datasets. MNIST is one of them. The following code fetches the MNIST dataset:

from sklearn.datasets import fetch_openml mnist = fetch_openml('mnist_784', version=1) mnist.keys()
dict_keys(['data', 'target', 'frame', 'feature_names', 'target_names', 'DESCR', 'details', 'categories', 'url'])

Now Let’s look at the data:

X, y = mnist["data"], mnist["target"] X.shape

(70000, 784)



28 * 28


There are 70,000 images, and each image has 784 features. This is because each image is 28×28 pixels, and each feature simply represents one pixel’s intensity, from 0 (white) to 255(black). Let’s take a peak at one digit from the dataset. All you need to do is grab an instance’s feature vector, reshape it to a 28×28 array, and display it using Matplotlib’s imshow() function:

%matplotlib inline import matplotlib as mpl import matplotlib.pyplot as plt some_digit = X[0] some_digit_image = some_digit.reshape(28, 28) plt.imshow(some_digit_image, plt.axis("off") save_fig("some_digit_plot")
Binary Classification

This looks like a 5, and indeed that’s what the label tells us:



Note that the label is a string. Most Machine Learning Algorithms expect numbers, so let’s cast y to integer:

y = y.astype(np.uint8)

Now before training a Binary Classification model, let’ have a look at the digits:

def plot_digit(data): image = data.reshape(28, 28) plt.imshow(image, cmap =, interpolation="nearest") plt.axis("off") def plot_digits(instances, images_per_row=10, **options): size = 28 images_per_row = min(len(instances), images_per_row) images = [instance.reshape(size,size) for instance in instances] n_rows = (len(instances) - 1) // images_per_row + 1 row_images = [] n_empty = n_rows * images_per_row - len(instances) images.append(np.zeros((size, size * n_empty))) for row in range(n_rows): rimages = images[row * images_per_row : (row + 1) * images_per_row] row_images.append(np.concatenate(rimages, axis=1)) image = np.concatenate(row_images, axis=0) plt.imshow(image, cmap =, **options) plt.axis("off") plt.figure(figsize=(9,9)) example_images = X[:100] plot_digits(example_images, images_per_row=10) save_fig("more_digits_plot")

You should always create a test set and set it aside before inspecting the data closely. The MNIST dataset is actually already split into a training set and a test set:

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

Training a Binary Classification Model

Let’s simply the problem for now and only try to identify one digit. For example, the number 5. This “5 detector” will be an example of a binary classification, capable of distinguishing between just two classes, 5 and not 5. Let’s create the target vectors for the classification task:

y_train_5 = (y_train == 5) y_test_5 = (y_test == 5)

Now let’s pick a classification model and train it. A good place to start is with a Stochastic Gradient Descent (SGD) deals with training instances independently, one at a time, as we will se later in our binary classification model. Let’s build a binary classification using the SGDClassifier and train it on the whole training set:

from sklearn.linear_model import SGDClassifier sgd_clf = SGDClassifier(max_iter=1000, tol=1e-3, random_state=42), y_train_5)
SGDClassifier(alpha=0.0001, average=False, class_weight=None, early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=1000, n_iter_no_change=5, n_jobs=None, penalty='l2', power_t=0.5, random_state=42, shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0, warm_start=False)

array([ True])

The classifier guesses that this image represents a 5 (True). Looks like it guessed right in this particular case. Now let’s evaluate the performance of our binary classification model.

Performance Measures

Evaluating a Classifier is often trickier than evaluating a regressor, so we will spend some more part of this article to evaluate our binary classification model.

Implementing Cross-Validation on Binary Classification Model

Occasionally you will need more control over the cross-validation process than what scikit-learn provides off the shelf. In these cases, you can implement cross-validation yourself. The following code does roughly the same thing as Scikit-learn’s cross_val_score() function does, and it prints the same result:

from sklearn.model_selection import StratifiedKFold from sklearn.base import clone skfolds = StratifiedKFold(n_splits=3, random_state=42) for train_index, test_index in skfolds.split(X_train, y_train_5): clone_clf = clone(sgd_clf) X_train_folds = X_train[train_index] y_train_folds = y_train_5[train_index] X_test_fold = X_train[test_index] y_test_fold = y_train_5[test_index], y_train_folds) y_pred = clone_clf.predict(X_test_fold) n_correct = sum(y_pred == y_test_fold) print(n_correct / len(y_pred))


The StratifiedKFold class performs stratified sampling to produce folds that contain a representative ratio of each class. At each iteration the code creates a clone of the classification model, trains that clone on the training folds, and make predictions on the test fold. Then it counts the number of correct predictions and outputs the ratio of correct predictions.

Let’s use the cross_val_score() function to evaluate our SGDClassifier model, using K-fold cross-validation with three folds. Remember that K-fold cross-validation means splitting the training set into K folds, then making predictions and evaluating them on each fold using a model trained on the remaining folds:

from sklearn.model_selection import cross_val_score cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")

array([0.95035, 0.96035, 0.9604 ])

Wow! Above 93% accuracy on all cross-validation folds. Well, before you get too exited, let’s look at a very dumb classifier that just classifies every single image in the “not 5” class:

from sklearn.base import BaseEstimator class Never5Classifier(BaseEstimator): def fit(self, X, y=None): pass def predict(self, X): return np.zeros((len(X), 1), dtype=bool) never_5_clf = Never5Classifier() cross_val_score(never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy")

array([0.91125, 0.90855, 0.90915])

Also, Read: Generate WordClouds with Python.

That’s right it has over 90% accuracy. this is simply because only about 10% of the images are 5s, so if you always guess that an image is not a 5, you will be right about 90% of the time. So I hope you liked this article on Binary Classification Model in Machine Learning. Feel free to ask you valuable questions in the comments section below.

Receive Daily Newsletters

Leave a Reply