In this article, we will see how to evaluate binary classification algorithms using precision-recall curves. Specifically, we will cover the following topics:

· **Confusion Matrix, Precision, and Recall**

· **Precision-Recall Curves**

· **Evaluating Models using Precision-Recall Curves**

· **Conclusion**

Before understanding precision-recall curves, let’s review the confusion matrix, precision, and recall.

## Confusion Matrix, Precision, and Recall

### Confusion Matrix

A confusion matrix summarizes the performance of a classification model. For a binary classifier, it usually looks like the following:

|                     | Predicted Positive  | Predicted Negative  |
| ------------------- | ------------------- | ------------------- |
| **Actual Positive** | True Positive (TP)  | False Negative (FN) |
| **Actual Negative** | False Positive (FP) | True Negative (TN)  |

Let’s explain each of the terms in the above table:

· **True Positives (TP):** The number of samples the classifier predicted as positive that actually belong to the positive class.

· **True Negatives (TN):** The number of samples the classifier correctly predicted as negative.

· **False Positives (FP):** The number of samples the classifier predicted as positive but that actually belong to the negative class.

· **False Negatives (FN):** The number of samples the classifier predicted as negative but whose ground truth is positive.
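As a quick sketch, scikit-learn's `confusion_matrix` returns exactly these four counts (the labels below are made up for illustration):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground-truth labels and model predictions
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# For binary labels, the matrix is [[TN, FP], [FN, TP]];
# ravel() flattens it into the four counts
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 3 1 1 3
```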

Let’s now see what precision is.

### Precision

Precision is the ratio of true positives to all positive predictions made by the model. It can be calculated as follows:

Precision = TP / (TP + FP)

Consider a model that predicts whether a person has cardiovascular disease (positive class) or not (negative class). In this problem, precision answers: of all the cases where the model predicted cardiovascular disease, how many actually had it?

Precision ranges between 0 and 1. A high value means few false positives, i.e., few people were misdiagnosed with cardiovascular disease by the model.

### Recall

Recall is the ratio of true positive predictions to all actual positives. It can be calculated using the following formula:

Recall = TP / (TP + FN)

For the same model, recall tells us: of all the people who actually have cardiovascular disease, how many did the model diagnose?

Recall also ranges between 0 and 1. A high value means few false negatives, i.e., few people with cardiovascular disease went undiagnosed by the model.
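Both metrics can be computed directly with scikit-learn. Using the same toy labels as before (made up for illustration):

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Here TP = 3, FP = 1, FN = 1
precision = precision_score(y_true, y_pred)  # TP / (TP + FP) = 3 / 4
recall = recall_score(y_true, y_pred)        # TP / (TP + FN) = 3 / 4
print(precision, recall)  # 0.75 0.75
```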

Now that we know what precision and recall are, let’s discuss precision-recall curves.

## Precision-Recall Curves

The precision-recall curve plots precision on the y-axis against recall on the x-axis, showing the relationship between the two metrics across different probability thresholds.

The precision-recall curve for a baseline (no-skill) model, i.e., one that predicts the positive class every single time, is a horizontal line whose height equals the proportion of positive samples in the dataset.

An ideal model, in contrast, achieves perfect scores for both precision and recall at all thresholds, so its curve hugs the top edge of the plot and passes through the (1, 1) corner.

In practice, a model's curve lies between these two extremes, and a good model (high precision and high recall) bows towards the (1, 1) coordinate.

## Evaluating Models using Precision-Recall Curves

First, let’s import the necessary modules. For this article, we will use the Bank Loan Classification dataset from Kaggle to predict whether a person will accept a loan offer or not.
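A minimal sketch of the setup follows. Since the Kaggle CSV isn't bundled here, a synthetic dataset stands in for the Bank Loan data, and the target column name `Personal Loan` is an assumption:

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification

# Synthetic stand-in for the Bank Loan Classification dataset:
# ~10% positives to mimic the imbalance of loan acceptance
X, y = make_classification(n_samples=5000, n_features=8,
                           weights=[0.9, 0.1], random_state=42)
df = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(8)])
df["Personal Loan"] = y  # hypothetical target column name
print(df.shape)  # (5000, 9)
```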

Note: The notebook for this tutorial can be found on my GitHub.

Let’s drop the unnecessary attributes and split the dataset into training and testing samples. We set aside 20% of the data for testing.

As you can see, there is a difference in the ranges of the attributes. To solve this issue, we standardize the data using the **StandardScaler** class imported from **sklearn.preprocessing**.
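The standardization step looks like this (random arrays stand in for the real features). The scaler is fit on the training set only, and those statistics are reused on the test set to avoid leakage:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(50, 10, size=(100, 3))  # stand-in training features
X_test = rng.normal(50, 10, size=(25, 3))    # stand-in test features

# Fit on training data only, then apply the same transform to the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Each training column now has mean ~0 and standard deviation ~1
```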

Let’s now go ahead and train different models using the training data. Moreover, we also store the prediction probabilities for the positive class as they will be used later for calculating precision and recall. The models trained are logistic regression, K-nearest neighbors, support vector machine, decision tree, and random forests.
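A condensed sketch of this training loop, using default hyperparameters on synthetic stand-in data (the original notebook's settings may differ):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "SVM": SVC(probability=True),  # probability=True enables predict_proba
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}

# Fit each model and store the predicted probability of the positive class
probas = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    probas[name] = model.predict_proba(X_test)[:, 1]
```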

Now that we have trained models, let’s calculate their precision and recall values for different thresholds using the **precision_recall_curve()** function from the **sklearn.metrics** module. It takes the true labels and the predicted probabilities and returns precision and recall values at different thresholds. We also calculate the area under the precision-recall curve (AUC) to summarize the curve in a single number: the greater the AUC score, the better the model.
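For a single model, the calculation can be sketched as follows (a random forest on synthetic stand-in data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve, auc

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, y_train)
y_scores = clf.predict_proba(X_test)[:, 1]  # positive-class probabilities

# One precision/recall pair per decision threshold
precision, recall, thresholds = precision_recall_curve(y_test, y_scores)
pr_auc = auc(recall, precision)  # area under the precision-recall curve
print(round(pr_auc, 3))
```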

Let’s now visualize the precision-recall curves using the **matplotlib** library.
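A minimal plotting sketch for one curve plus the baseline (the toy scores below are fabricated just to produce a plausible curve):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=500)
# Toy scores: positives tend to score higher than negatives
y_scores = np.clip(y_true * 0.6 + rng.normal(0.2, 0.2, size=500), 0, 1)

precision, recall, _ = precision_recall_curve(y_true, y_scores)

fig, ax = plt.subplots()
ax.plot(recall, precision, label="model")
# Baseline: horizontal line at the positive-class proportion
ax.axhline(y_true.mean(), linestyle="--", label="baseline")
ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
ax.legend()
fig.savefig("pr_curve.png")
```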

As you can see in the above output, all the models perform well and are way above the baseline. The best model here is the random forest classifier with an AUC score of 0.99.

## Conclusion

The precision-recall curve is one of the methods to evaluate your model. It shows the trade-off between precision and recall and is usually preferred when the classes are highly imbalanced. Depending upon the problem at hand, you would want to optimize for precision, recall, or both. If we consider the Covid-19 detection problem, then recall is more important because we do not want to increase the number of false negatives, i.e., fail to detect a person with Covid.

However, while predicting criminals, precision becomes more important because the cost of misclassifying innocent persons as criminals is more, i.e., imagine an innocent person going to jail for 20 years. Therefore, once you identify the problem, you can evaluate multiple models or select the threshold of a model using precision-recall curves.