Classification Model Evaluation in Machine Learning

In Machine Learning, model evaluation means measuring how well a trained model performs, and for that we need model evaluation metrics. So, if you want to learn how to evaluate classification models, this article is for you. In this article, I will take you through the task of classification model evaluation in Machine Learning.

Classification Model Evaluation in Machine Learning

Just as there is a range of machine learning algorithms we can use to train classification models, there is a range of metrics we can use to evaluate them. Below are some of the model evaluation metrics that you can use for classification model evaluation:

  1. Confusion matrix
  2. Accuracy
  3. Classification report
  4. AUC and ROC

In the section below, you will learn how to use these metrics for classification model evaluation in machine learning using Python.

Classification Model Evaluation using Python

To evaluate the performance of a classification model, we first need to train one. I recently shared an article on spam comments detection, where I trained a Machine Learning model to classify comments as spam or not spam; you can find that article here. I will use the same classification model, so let's quickly train it and then go through the evaluation metrics one by one:

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Load the YouTube spam comments dataset
# (CONTENT = comment text, CLASS = 1 for spam, 0 for not spam)
data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/Youtube01-Psy.csv")
data = data[["CONTENT", "CLASS"]]
x = np.array(data["CONTENT"])
y = np.array(data["CLASS"])

# Convert the comment text into a bag-of-words feature matrix
cv = CountVectorizer()
x = cv.fit_transform(x)

# Hold out 20% of the data for evaluation
xtrain, xtest, ytrain, ytest = train_test_split(x, y, 
                                                test_size=0.2, 
                                                random_state=42)

# Train a Bernoulli Naive Bayes classifier and predict on the test set
model = BernoulliNB()
model.fit(xtrain, ytrain)
predictions = model.predict(xtest)

So far, we have trained a Machine Learning model for classifying spam and not spam comments. Now let’s use the classification model evaluation metrics to evaluate the performance of our model.

Confusion Matrix

The confusion matrix is an array containing the counts of true negatives, false positives, false negatives, and true positives (in that order, row by row, in scikit-learn). Below is how you can use the confusion matrix to evaluate our classification model:

# Confusion Matrix
from sklearn.metrics import confusion_matrix
confusionMatrix = confusion_matrix(ytest, predictions)
print(confusionMatrix)
[[27  0]
 [ 1 42]]

In the above output array, 27 is the number of not spam comments the model classified correctly (true negatives), and 0 is the number of not spam comments it wrongly flagged as spam (false positives). In the second row, 1 is the number of spam comments the model missed (false negatives), and 42 is the number of spam comments it classified correctly (true positives).
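A raw array of counts can be hard to read at a glance, so a common next step is to plot the confusion matrix as a labelled heatmap. Below is a minimal sketch using seaborn; the class names "Not Spam" and "Spam" are my own labels for classes 0 and 1:

# Visualize the confusion matrix as a heatmap
# (assumed class labels: 0 = Not Spam, 1 = Spam)
import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(confusionMatrix, annot=True, fmt="d", cmap="Blues",
            xticklabels=["Not Spam", "Spam"],
            yticklabels=["Not Spam", "Spam"])
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.show()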

Accuracy

Accuracy is one of the simplest and most commonly used model evaluation metrics. It is the fraction of samples classified correctly: 1 – (number of misclassified samples / total number of samples). Below is how to calculate the accuracy of your classification model:

# Accuracy
print(model.score(xtest, ytest))
0.9857142857142858
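You can sanity-check this number against the confusion matrix above: the model got 27 + 42 = 69 of the 70 test samples right, and 69 / 70 ≈ 0.9857. Here is a minimal sketch showing both sklearn's accuracy_score function and the manual calculation:

# Accuracy computed two equivalent ways
from sklearn.metrics import accuracy_score
print(accuracy_score(ytest, predictions))  # same value as model.score(xtest, ytest)

# Derived from the confusion matrix: correct predictions / total predictions
tn, fp, fn, tp = confusionMatrix.ravel()
print((tn + tp) / (tn + fp + fn + tp))  # (27 + 42) / 70 ≈ 0.9857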

Classification Report

A classification report displays the precision, recall, F1 score, and support of a classification model. Precision is the fraction of predicted positives that are actually positive, recall is the fraction of actual positives the model finds, the F1 score is the harmonic mean of precision and recall, and support is the number of test samples in each class. You can learn everything about the classification report here. Below is how you can create a classification report for a classification model:

# Classification Report
from sklearn.metrics import classification_report
print(classification_report(ytest, predictions))
              precision    recall  f1-score   support

           0       0.96      1.00      0.98        27
           1       1.00      0.98      0.99        43

    accuracy                           0.99        70
   macro avg       0.98      0.99      0.99        70
weighted avg       0.99      0.99      0.99        70
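If you need one of these numbers programmatically rather than as a printed table, sklearn also exposes each metric as its own function. A minimal sketch for the spam class (label 1):

# Individual metrics for the positive (spam) class
from sklearn.metrics import precision_score, recall_score, f1_score
print(precision_score(ytest, predictions))  # 1.00 for class 1 in the report above
print(recall_score(ytest, predictions))     # 0.98
print(f1_score(ytest, predictions))         # 0.99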

AUC and ROC

ROC stands for Receiver Operating Characteristic. The ROC curve shows the performance of a classification model by plotting the true positive rate against the false positive rate at every classification threshold. AUC stands for Area Under the Curve; it measures the area under the ROC curve, and the closer the AUC is to 1, the better your classification model is. Below is how to visualize the AUC and ROC of your classification model:

# AUC and ROC
import matplotlib.pyplot as plt
from sklearn import metrics

# The ROC curve needs continuous scores rather than hard 0/1 predictions,
# so we use the model's predicted probability of the spam class
scores = model.predict_proba(xtest)[:, 1]
auc = metrics.roc_auc_score(ytest, scores)

false_positive_rate, true_positive_rate, thresholds = metrics.roc_curve(ytest, scores)

plt.figure(figsize=(10, 8), dpi=100)
plt.axis('scaled')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.title("AUC & ROC Curve")
plt.plot(false_positive_rate, true_positive_rate, 'g')
plt.fill_between(false_positive_rate, true_positive_rate, facecolor='lightgreen', alpha=0.7)
plt.text(0.95, 0.05, 'AUC = %0.4f' % auc, ha='right', fontsize=12, weight='bold', color='blue')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.show()
[Figure: AUC & ROC curve for the classification model]

Summary

So this is how you can use model evaluation metrics to assess the performance of a classification model. The confusion matrix, accuracy, the classification report, and AUC/ROC each give a different view of how well the model separates spam from not spam comments. I hope you liked this article on classification model evaluation in machine learning. Feel free to ask valuable questions in the comments section below.
