Breast Cancer Detection with Machine Learning

Over the past decades, machine learning techniques have been widely used in intelligent health systems, particularly for breast cancer diagnosis and prognosis. In this article, I will walk you through how to create a breast cancer detection model using machine learning and the Python programming language.

Breast cancer is one of the most common cancers in women globally, accounting for the majority of new cancer cases and cancer-related deaths according to global statistics, making it a major public health problem in the world. today’s society.

Also, Read – Machine Learning Full Course for free.

Early diagnosis of breast cancer can dramatically improve prognosis and chances of survival, as it can promote timely clinical treatment of patients. More precise classification of benign tumours can prevent patients from undergoing unnecessary treatments.

Breast Cancer Detection using Machine Learning

In this section, I will implement a Naive Bayes algorithm in Machine Learning using Python. For this task, I will use a database of breast cancer tumour information for breast cancer detection.

Let’s start by importing and loading the necessary python libraries and the breast cancer dataset provided by Scikit-learn:

Now we need to create new variables for each important set of information that we find useful and assign the attributes in the dataset to those variables:

We now have values for each set of useful information in the dataset. To better understand our dataset, let’s take a look at our data by printing our class labels, the label for the first data instance, our entity names, and the entity values for the first data instance:

['malignant' 'benign']
Class label : 0
['mean radius' 'mean texture' 'mean perimeter' 'mean area'
 'mean smoothness' 'mean compactness' 'mean concavity'
 'mean concave points' 'mean symmetry' 'mean fractal dimension'
 'radius error' 'texture error' 'perimeter error' 'area error'
 'smoothness error' 'compactness error' 'concavity error'
 'concave points error' 'symmetry error' 'fractal dimension error'
 'worst radius' 'worst texture' 'worst perimeter' 'worst area'
 'worst smoothness' 'worst compactness' 'worst concavity'
 'worst concave points' 'worst symmetry' 'worst fractal dimension']
[1.799e+01 1.038e+01 1.228e+02 1.001e+03 1.184e-01 2.776e-01 3.001e-01
 1.471e-01 2.419e-01 7.871e-02 1.095e+00 9.053e-01 8.589e+00 1.534e+02
 6.399e-03 4.904e-02 5.373e-02 1.587e-02 3.003e-02 6.193e-03 2.538e+01
 1.733e+01 1.846e+02 2.019e+03 1.622e-01 6.656e-01 7.119e-01 2.654e-01
 4.601e-01 1.189e-01] 

Now that our data is loaded, we can work with our data to build our machine learning model using the Naive Bayes algorithm for the breast cancer detection task.

Splitting The Dataset

To evaluate the performance of a classifier, you should always test the model on invisible data. Therefore, before I create a machine learning model for breast cancer detection, I will divide your data into two parts: an 80% training set and a 20% test set:

train, test, train_labels, test_labels = train_test_split(features, labels,
                                                          test_size=0.2,
                                                          random_state=42)

Using Naive Bayes for Breast Cancer Detection

There are many models of machine learning, and each model has its strengths and weaknesses. For the Breast Cancer Detection Model task, I will focus on a simple algorithm that generally works well in binary classification tasks, namely the Naive Bayes classifier:

gnb = GaussianNB()
gnb.fit(train, train_labels)

After training the model, we can then use the trained model to make predictions on our test set, which we use the predict() function.

The predict() function returns an array of predictions for each data instance in the test set. We can then print out our predictions to get a feel for what the model determined:

preds = gnb.predict(test)
print(preds, "\n")
[1 0 0 1 1 0 0 0 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 0
 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 1 0 0 1 1 1 0 0 1 1 0 0 1 0
 1 1 1 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 0 1 0 0 1 1 1 0 1 1 0
 1 1 0] 

Using the array of true class labels, we can assess the accuracy of our model’s predictors by comparing the two arrays (test_labels vs preds).

I’ll use the accuracy_score () function provided by Scikit-Learn to determine the accuracy rate of our machine learning classifier:

print(accuracy_score(test_labels, preds))
0.9736842105263158

As you can see from the output above, our breast cancer detection model gives an accuracy rate of almost 97%. This means that 97% of the time the classifier is able to make the correct prediction.

So this is how we can build a Breast cancer detection model using Machine Learning and the Python programming language. I hope you liked this article on how to build a breast cancer detection model with Machine Learning. Feel free to ask your valuable questions in the comments section below.

Aman Kharwal
Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.

Articles: 1498

2 Comments

Leave a Reply