Fraud is a major problem in banking, life insurance, health insurance, and many other sectors. Many frauds depend on a person trying to sell you a fake product or service, so if you are careful enough to recognize what is wrong, you can avoid them. But one kind of fraud that has been increasing a lot these days is payment fraud. In this article, I will take you through a solution to fraud detection with machine learning.
The dataset that I will use for this task can be easily downloaded from here. It is transaction data for online purchases collected from an e-commerce retailer. The dataset contains more than 39,000 transactions, and each transaction has 5 features that describe its nature. So let’s start by importing all the libraries we need for Fraud Detection with Machine Learning:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
Payment Fraud Detection Model
Fortunately, the dataset I am using is already well structured, with no missing values, so I did not find anything that needs cleaning. So without wasting any time, I will dive into building our machine learning model, starting with importing the data:
# Upload the CSV from your machine (this step works only in Google Colab)
from google.colab import files
uploaded = files.upload()

# Load the uploaded file and preview the first rows
df = pd.read_csv('payment_fraud.csv')
df.head()
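The claim that the data has no missing values can be checked directly, and it is also worth looking at how many rows fall into each class. The sketch below is a minimal example under stated assumptions: `summarize_labels` is a helper name I made up for illustration, and the toy frame with its `feature_a` column is hypothetical data standing in for `payment_fraud.csv`. Only the `label` column name comes from the article.

```python
import pandas as pd


def summarize_labels(df: pd.DataFrame, label_col: str = "label") -> dict:
    """Count missing cells and rows per class (hypothetical helper)."""
    return {
        # Total number of missing cells across all columns
        "missing_values": int(df.isnull().sum().sum()),
        # Number of rows for each value of the label column
        "class_counts": df[label_col].value_counts().to_dict(),
    }


# Hypothetical toy data, not the real dataset
toy = pd.DataFrame({
    "feature_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "label": [0, 0, 0, 0, 1],
})

summary = summarize_labels(toy)
print(summary)
```

On the real data you would call `summarize_labels(df)` right after `pd.read_csv`. A large gap between the class counts is a sign that accuracy alone may not tell the whole story.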
Now, I will split the data into training and test sets:
# Split dataset up into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    df.drop('label', axis=1), df['label'],
    test_size=0.33, random_state=17)
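Fraudulent transactions are usually much rarer than legitimate ones, so a purely random split can leave the training and test sets with slightly different fraud rates. As a variation on the split above (not what this article uses), `train_test_split` accepts a `stratify` argument that keeps the class proportions roughly equal in both parts. The arrays below are hypothetical toy data with 10% positives.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced toy data: 10 positives out of 100 rows
X_toy = np.arange(200).reshape(100, 2)
y_toy = np.array([1] * 10 + [0] * 90)

# stratify=y_toy keeps the share of positives similar in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.33, random_state=17, stratify=y_toy
)

print("positive rate in train:", y_tr.mean())
print("positive rate in test:", y_te.mean())
```

With the article's data frame, the equivalent change would be to pass `stratify=df['label']` to the existing call.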
As this is a binary classification problem, I will use the Logistic Regression algorithm, which is a simple and widely used baseline for binary classification. If you don’t know what Binary Classification means, you can learn it from here. Now let’s train the fraud detection model using logistic regression and have a look at the accuracy score that we get:
# Train a logistic regression classifier on the training set
clf = LogisticRegression().fit(X_train, y_train)

# Make predictions on test set
y_pred = clf.predict(X_test)

# accuracy_score expects the true labels first, then the predictions
print(accuracy_score(y_test, y_pred))
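Logistic regression in scikit-learn needs numeric inputs and tends to converge more reliably when features are on a similar scale. The article does not list the column names, so the sketch below assumes, purely for illustration, that one feature is stored as text. The `days` and `method` columns and their values are hypothetical. It one-hot encodes the text column with `pd.get_dummies` and scales the features inside a pipeline before fitting.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data with one numeric and one text feature
toy = pd.DataFrame({
    "days": [1, 200, 3, 400, 2, 350, 5, 500],
    "method": ["card", "paypal", "card", "paypal",
               "card", "credit", "card", "paypal"],
    "label": [1, 0, 1, 0, 1, 0, 1, 0],
})

# Turn the text column into 0/1 indicator columns
X_enc = pd.get_dummies(toy.drop("label", axis=1), columns=["method"], dtype=float)
y_enc = toy["label"]

# Scale the features, then fit logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_enc, y_enc)

print(model.predict(X_enc))
```

If every column in your copy of the data is already numeric, the encoding step is unnecessary and the scaling step is optional.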
Well, when was the last time you got an accuracy of 100 per cent? Our fraud detection model reached an accuracy of 100 per cent on the test set using the logistic regression algorithm.
Evaluating the Fraud Detection Model
Now, let’s evaluate the performance of our model. I will use a confusion matrix, which counts how many test transactions of each true class were assigned to each predicted class. We can compute it in a single line of code:
# Compare test set predictions with ground truth labels
print(confusion_matrix(y_test, y_pred))
[[12753     0]
 [    0   190]]
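scikit-learn's `confusion_matrix` puts the true labels on the rows and the predicted labels on the columns, so for labels 0 and 1 the layout is `[[TN, FP], [FN, TP]]`. The sketch below reads those four cells and derives accuracy, precision, and recall, plus the accuracy a model would get by always predicting the majority class. The function name `metrics_from_confusion` is my own, and the matrix is filled in from the counts reported in this article (12753 legitimate and 190 fraudulent test transactions, all classified correctly).

```python
def metrics_from_confusion(matrix):
    """Derive basic metrics from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tn + tp) / total,
        # Share of predicted frauds that really are fraud
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of real frauds that the model caught
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        # Accuracy of always predicting the more common class
        "majority_baseline": max(tn + fp, fn + tp) / total,
    }


# Counts taken from the article's reported test results
reported = [[12753, 0], [0, 190]]
scores = metrics_from_confusion(reported)

for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With these counts the majority baseline comes out at roughly 98.5 per cent, which is why precision and recall on the fraud class are more informative here than accuracy on its own.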
So out of all the transactions in the test set, 190 are correctly recognized as fraud, and 12753 are correctly recognized as not fraudulent. I hope you liked this article on Fraud Detection with Machine Learning. Feel free to ask your questions in the comments section below.