In Machine Learning, model evaluation means measuring how well a trained model performs, and to measure that performance, we need model evaluation metrics. So, if you want to learn how to evaluate a classification model, this article is for you. In this article, I will take you through the task of classification model evaluation in Machine Learning.
Classification Model Evaluation in Machine Learning
Just as there are many machine learning algorithms we can use to train classification models, there are many metrics we can use to evaluate them. Below are some of the model evaluation metrics that you can use for classification model evaluation:
- Confusion matrix
- Accuracy
- Classification report
- AUC and ROC
In the section below, you will learn how to use these metrics for classification model evaluation in machine learning using Python.
Classification Model Evaluation using Python
To evaluate the performance of a classification model, we first need to train one. I recently shared an article on spam comments detection where I trained a Machine Learning model to classify comments as spam or not spam. You can find that article here. I will use the same classification model in this article. So let’s quickly train a machine learning model and then apply the classification model evaluation metrics one by one:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Load the YouTube spam comments dataset
data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/Youtube01-Psy.csv")
data = data[["CONTENT", "CLASS"]]

x = np.array(data["CONTENT"])
y = np.array(data["CLASS"])

# Convert the text comments into bag-of-words feature vectors
cv = CountVectorizer()
x = cv.fit_transform(x)

# Split the data, then train a Bernoulli Naive Bayes classifier
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
model = BernoulliNB()
model.fit(xtrain, ytrain)
predictions = model.predict(xtest)
So far, we have trained a Machine Learning model for classifying spam and not spam comments. Now let’s use the classification model evaluation metrics to evaluate the performance of our model.
Confusion Matrix
The confusion matrix is an array that counts the true negatives, false positives, false negatives, and true positives of a model’s predictions: each row corresponds to the actual class and each column to the predicted class. Below is how you can use the confusion matrix to evaluate our classification model:
# Confusion Matrix
from sklearn.metrics import confusion_matrix

confusionMatrix = confusion_matrix(ytest, predictions)
print(confusionMatrix)
[[27  0]
 [ 1 42]]
In the above output array, the first row corresponds to the comments that are actually not spam: 27 is the number of times the model correctly classified a comment as not spam (true negatives), and 0 is the number of times it wrongly flagged a not spam comment as spam (false positives). The second row corresponds to the comments that are actually spam: 1 is the number of times the model missed a spam comment and classified it as not spam (false negative), and 42 is the number of times it correctly classified a comment as spam (true positives).
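Since the raw array can be hard to read, you can also draw the confusion matrix as a heatmap. Below is a minimal sketch using seaborn, assuming the confusionMatrix array computed above and that class 0 means not spam and class 1 means spam:

# Visualize the confusion matrix as a heatmap
import seaborn as sns
import matplotlib.pyplot as plt

sns.heatmap(confusionMatrix, annot=True, fmt="d", cmap="Blues",
            xticklabels=["Not Spam", "Spam"],
            yticklabels=["Not Spam", "Spam"])
plt.xlabel("Predicted Label")
plt.ylabel("Actual Label")
plt.show()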
Accuracy
Accuracy is one of the most commonly used model evaluation metrics. It is the fraction of predictions the model got right: the number of correct predictions divided by the total number of samples, or equivalently 1 – (number of misclassified samples / total number of samples). Below is how to calculate the accuracy of your classification model:
# Accuracy
print(model.score(xtest, ytest))
0.9857142857142858
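The score() method of a scikit-learn classifier returns exactly this accuracy. As a quick sanity check, here is a minimal sketch that computes the same value directly from the formula above, assuming the predictions array from earlier:

import numpy as np
from sklearn.metrics import accuracy_score

# Accuracy = number of correct predictions / total number of samples
print(np.mean(predictions == ytest))
# The scikit-learn equivalent of the same calculation
print(accuracy_score(ytest, predictions))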
Classification Report
A classification report displays the precision, recall, F1 score, and support of a classification model for each class. Precision is the fraction of predicted positives that are actually positive, recall is the fraction of actual positives the model finds, the F1 score is the harmonic mean of precision and recall, and support is the number of test samples in each class. Below is how you can create a classification report of a classification model:
# Classification Report
from sklearn.metrics import classification_report

print(classification_report(ytest, predictions))
              precision    recall  f1-score   support

           0       0.96      1.00      0.98        27
           1       1.00      0.98      0.99        43

    accuracy                           0.99        70
   macro avg       0.98      0.99      0.99        70
weighted avg       0.99      0.99      0.99        70
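To see where these numbers come from, here is a minimal sketch that recomputes the scores of the spam class (1) from the confusion matrix entries above, assuming the confusionMatrix array from earlier:

# For binary problems, ravel() returns tn, fp, fn, tp in that order
tn, fp, fn, tp = confusionMatrix.ravel()
precision = tp / (tp + fp)                          # 42 / (42 + 0) = 1.00
recall = tp / (tp + fn)                             # 42 / (42 + 1) ≈ 0.98
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.99
print(precision, recall, f1)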
AUC and ROC
ROC stands for Receiver Operating Characteristic. The ROC curve shows the performance of a classification model by plotting the true positive rate against the false positive rate at every classification threshold. AUC stands for Area Under the Curve and measures the area under the ROC curve: an AUC of 0.5 corresponds to random guessing, and the closer the AUC is to 1, the better your classification model is. Below is how to visualize the AUC and ROC of your classification model:
# AUC and ROC
import matplotlib.pyplot as plt
from sklearn import metrics

auc = metrics.roc_auc_score(ytest, predictions)
false_positive_rate, true_positive_rate, thresholds = metrics.roc_curve(ytest, predictions)

plt.figure(figsize=(10, 8), dpi=100)
plt.axis('scaled')
plt.xlim([0, 1])
plt.ylim([0, 1])
plt.title("AUC & ROC Curve")
plt.plot(false_positive_rate, true_positive_rate, 'g')
plt.fill_between(false_positive_rate, true_positive_rate, facecolor='lightgreen', alpha=0.7)
plt.text(0.95, 0.05, 'AUC = %0.4f' % auc, ha='right', fontsize=12, weight='bold', color='blue')
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.show()
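Note that the curve above is built from the hard 0/1 predictions, so it only has a single operating point between the corners. To trace the full curve across all thresholds, you can pass predicted probabilities instead. Below is a minimal sketch, assuming the model and test split from earlier:

# Probability of the positive (spam) class for each test comment
probabilities = model.predict_proba(xtest)[:, 1]
fpr, tpr, thresholds = metrics.roc_curve(ytest, probabilities)
print("AUC:", metrics.roc_auc_score(ytest, probabilities))

plt.plot(fpr, tpr, 'g')
plt.plot([0, 1], [0, 1], linestyle='--', color='gray')  # chance line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.show()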

Summary
So this is how you can use model evaluation metrics to evaluate the performance of a classification model. I hope you liked this article on classification model evaluation in machine learning. Feel free to ask valuable questions in the comments section below.