In Machine Learning, Precision and Recall are two of the most important metrics for Model Evaluation. Precision is the percentage of the instances your model predicts as positive that are actually positive. Recall is the percentage of all actual positive instances that your model correctly identifies.
In this article, I will show you how you can apply Precision and Recall to evaluate the performance of your Machine Learning model.
Applying Precision and Recall in Machine Learning
I will apply Precision and Recall by continuing from where my earlier post on Binary Classification ended. The code below reuses the variables from that post, such as sgd_clf, X_train, y_train_5, y_train_pred and some_digit.
Scikit-Learn provides several functions to compute classifier metrics:
from sklearn.metrics import precision_score, recall_score

# Precision: of the instances predicted positive, the share that really are positive
precision_score(y_train_5, y_train_pred)
# The same calculation by hand: TP / (TP + FP)
4096 / (4096 + 1522)
# Recall: of the actual positives, the share the classifier detected
recall_score(y_train_5, y_train_pred)
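To make the two definitions concrete, here is a minimal sketch that computes both metrics from scratch in plain Python. The labels and predictions are hypothetical toy data, not the dataset used in this article, and the helper name precision_recall is my own.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels given as booleans."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)       # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)   # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)   # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 4 actual positives, 5 predicted positives, 3 in common.
y_true = [True, True, True, True, False, False, False, False]
y_pred = [True, True, True, False, True, True, False, False]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

With this toy data there are 3 true positives, 2 false positives and 1 false negative, so precision is 3/5 and recall is 3/4.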
F1 Score in Precision and Recall
It is often convenient to combine these two metrics into a single metric called the F1 score, in particular if you need a simple way to compare two classifiers. The F1 score is the harmonic mean of precision and recall.
Whereas the regular mean treats all values equally, the harmonic mean gives much more weight to low values. As a result, a classifier will only get a high F1 score if both precision and recall are high.
To compute the F1 score, simply call the f1_score() function:
from sklearn.metrics import f1_score

f1_score(y_train_5, y_train_pred)
# The same calculation by hand: TP / (TP + (FN + FP) / 2)
4096 / (4096 + (1522 + 1325) / 2)
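The sketch below illustrates why the harmonic mean punishes a low value, and checks that the two ways of writing the F1 score agree. The counts are the ones used in the manual calculations in this article (I am reading 1522 as false positives and 1325 as false negatives, which is what those formulas imply). The function names and the lopsided example are my own assumptions.

```python
def f1_from_scores(precision, recall):
    """F1 as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp, fp, fn):
    """Equivalent form written with raw counts: TP / (TP + (FN + FP) / 2)."""
    return tp / (tp + (fn + fp) / 2)


# Counts taken from the manual calculations in this article.
tp, fp, fn = 4096, 1522, 1325
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1_a = f1_from_scores(precision, recall)
f1_b = f1_from_counts(tp, fp, fn)

# Hypothetical lopsided classifier: perfect precision, very poor recall.
lopsided_mean = (1.0 + 0.1) / 2            # the regular mean still looks decent
lopsided_f1 = f1_from_scores(1.0, 0.1)     # the harmonic mean stays close to the low value

print(f1_a, f1_b)
print(lopsided_mean, lopsided_f1)
```

The two F1 values match because the harmonic-mean formula simplifies algebraically to the count-based one.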
Unfortunately, you cannot have it both ways: increasing precision reduces recall, and vice versa. This is called the precision/recall trade-off. To understand this trade-off, let's look at how the SGDClassifier makes its classification decisions. For each instance, it computes a score based on a decision function. If that score is higher than a threshold, it assigns the instance to the positive class; otherwise, it assigns it to the negative class.
In the image above illustrating the precision/recall trade-off, instances are ranked by their classifier score, and those above the chosen decision threshold are considered positive; the higher the threshold, the lower the recall, but (in general) the higher the precision.
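The effect of moving the threshold can be sketched with a handful of made-up scores. Everything below is hypothetical illustration data and a hand-written helper, not output from the SGDClassifier.

```python
# Hypothetical (score, is_positive) pairs standing in for classifier output.
scored = [(-3.0, False), (-2.0, False), (-1.0, True), (0.5, False),
          (1.0, True), (2.0, True), (3.0, False), (4.0, True), (5.0, True)]


def precision_recall_at(threshold, scored):
    """Precision and recall when instances scoring above `threshold` are called positive."""
    tp = sum(1 for s, y in scored if s > threshold and y)
    fp = sum(1 for s, y in scored if s > threshold and not y)
    fn = sum(1 for s, y in scored if s <= threshold and y)
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn)
    return precision, recall


results = {t: precision_recall_at(t, scored) for t in (-2.5, 0.0, 3.5)}
for t, (p, r) in results.items():
    print(f"threshold={t:>5}: precision={p:.3f}, recall={r:.3f}")
```

As the threshold moves from -2.5 to 0.0 to 3.5, recall drops from 1.0 to 0.8 to 0.4 while precision climbs from 0.625 to about 0.667 to 1.0.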
Scikit-Learn does not let you set the threshold directly, but it does give you access to the decision scores that it uses to make predictions. Instead of calling the classifier’s predict() method, you can call its decision_function() method, which returns a score for each instance, and then use any threshold you want to make predictions based on those scores:
y_scores = sgd_clf.decision_function([some_digit])
y_scores

threshold = 0
y_some_digit_pred = (y_scores > threshold)
y_some_digit_pred
The SGDClassifier uses a threshold equal to 0, so the previous code returns the same result as the predict() method (i.e., True). Let’s raise the threshold:
threshold = 8000
y_some_digit_pred = (y_scores > threshold)
y_some_digit_pred
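What happens here can be sketched without Scikit-Learn. A linear classifier scores an instance as a weighted sum of its features plus a bias, and the prediction is simply a comparison of that score with the threshold. The weights, bias and input below are hypothetical numbers chosen for illustration, and the two functions are simplified stand-ins rather than the library's implementation.

```python
def decision_function(weights, bias, x):
    """Linear score: dot product of weights and features, plus a bias term."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x, threshold=0.0):
    """Positive class if the score exceeds the threshold (default 0)."""
    return decision_function(weights, bias, x) > threshold


# Hypothetical model parameters and a single hypothetical instance.
weights, bias = [2.0, -1.0], 0.5
x = [3.0, 1.5]

score = decision_function(weights, bias, x)         # 2*3.0 - 1*1.5 + 0.5
pred_default = predict(weights, bias, x)            # threshold 0
pred_strict = predict(weights, bias, x, threshold=8.0)

print(score, pred_default, pred_strict)
```

The score is the same in both calls; only the threshold changes, which is why raising it can turn a positive prediction into a negative one.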
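How do you decide which threshold to use? First, use the cross_val_predict() function to get the scores of all instances in the training set, but this time specify that you want it to return decision scores instead of predictions. Then use the precision_recall_curve() function to compute precision and recall for all possible thresholds, and plot them: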
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_curve

# Decision scores (not class predictions) for every training instance
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3,
                             method="decision_function")

# Precision and recall for every candidate threshold
precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # precisions and recalls have one more element than thresholds, hence [:-1]
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision", linewidth=2)
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall", linewidth=2)
    plt.legend(loc="center right", fontsize=16)
    plt.xlabel("Threshold", fontsize=16)
    plt.grid(True)
    plt.axis([-50000, 50000, 0, 1])

# Lowest threshold that reaches at least 90% precision, and the recall at that point
recall_90_precision = recalls[np.argmax(precisions >= 0.90)]
threshold_90_precision = thresholds[np.argmax(precisions >= 0.90)]

plt.figure(figsize=(8, 4))
plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
# Dotted red guides and markers for the 90% precision point
plt.plot([threshold_90_precision, threshold_90_precision], [0., 0.9], "r:")
plt.plot([-50000, threshold_90_precision], [0.9, 0.9], "r:")
plt.plot([-50000, threshold_90_precision], [recall_90_precision, recall_90_precision], "r:")
plt.plot([threshold_90_precision], [0.9], "ro")
plt.plot([threshold_90_precision], [recall_90_precision], "ro")
# save_fig() is not defined in this article; it is assumed to be a helper
# defined elsewhere. Remove this line if you do not have it.
save_fig("precision_recall_vs_threshold_plot")
plt.show()
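You may wonder why the precision curve (blue) is bumpier than the recall curve (green) in the output above. The reason is that precision may sometimes go down when you raise the threshold (although in general, it will go up). Recall, on the other hand, can only go down as the threshold increases, which is why its curve looks smooth.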
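The bumpiness is easy to reproduce on a small example. The sketch below uses hypothetical scores and labels and treats every instance with a score at or above the threshold as positive. It also picks the lowest threshold that reaches a target precision, which is the same idea as the np.argmax(precisions >= 0.90) line in the code above.

```python
# Hypothetical (score, is_positive) pairs, already sorted by score.
scored = [(1, False), (2, True), (3, False), (4, True), (5, True),
          (6, False), (7, True), (8, True), (9, True)]
total_pos = sum(1 for _, y in scored if y)

thresholds = sorted(s for s, _ in scored)
precisions, recalls = [], []
for t in thresholds:
    # Labels of every instance predicted positive at this threshold
    predicted = [y for s, y in scored if s >= t]
    tp = sum(predicted)
    precisions.append(tp / len(predicted))
    recalls.append(tp / total_pos)

# Recall never rises as the threshold increases; precision sometimes dips.
recall_never_rises = all(a >= b for a, b in zip(recalls, recalls[1:]))
precision_dips = any(a > b for a, b in zip(precisions, precisions[1:]))

# Lowest threshold whose precision reaches the target.
target = 0.90
idx = next(i for i, p in enumerate(precisions) if p >= target)
threshold_90, recall_90 = thresholds[idx], recalls[idx]

print(recall_never_rises, precision_dips, threshold_90, recall_90)
```

For instance, moving the threshold from 2 to 3 drops a true positive from the predicted set, so precision falls from 6/8 to 5/7 even though the threshold went up.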
Another way to select a good trade-off is to plot precision directly against recall:
# Sanity check: thresholding the scores at 0 should reproduce y_train_pred
(y_train_pred == (y_scores > 0)).all()

def plot_precision_vs_recall(precisions, recalls):
    plt.plot(recalls, precisions, "b-", linewidth=2)
    plt.xlabel("Recall", fontsize=16)
    plt.ylabel("Precision", fontsize=16)
    plt.axis([0, 1, 0, 1])
    plt.grid(True)

plt.figure(figsize=(8, 6))
plot_precision_vs_recall(precisions, recalls)
# Dotted red guides and marker at the point (recall 0.4368, precision 0.9)
plt.plot([0.4368, 0.4368], [0., 0.9], "r:")
plt.plot([0.0, 0.4368], [0.9, 0.9], "r:")
plt.plot([0.4368], [0.9], "ro")
# save_fig() is not defined in this article; it is assumed to be a helper
# defined elsewhere. Remove this line if you do not have it.
save_fig("precision_vs_recall_plot")
plt.show()
You can see that precision starts to fall sharply around 80% recall. You will probably want to select a precision/recall trade-off just before that drop.
I hope you liked this article. Feel free to ask your questions in the comments section below. Also, follow me on Medium to read more articles like this.