COVID-19 Detection

One week ago, Dr. Cohen started collecting X-ray images of COVID-19 cases and publishing them in a GitHub repo to support work on COVID-19 detection.

Inside the repo you’ll find examples of COVID-19 cases, as well as MERS, SARS, and ARDS.

In order to create the COVID-19 X-ray image dataset for this article, I took the following steps (sketched in code just after this list):

  1. Parsed the metadata.csv file found in Dr. Cohen’s repository.
  2. Selected all rows that are:
    • Positive for COVID-19 (i.e., ignoring MERS, SARS, and ARDS cases).
    • Posteroanterior (PA) view of the lungs. I used the PA view because, to my knowledge, that is the view used for my “healthy” cases, discussed below; a medical professional can certainly clarify or correct me if I am wrong (which I very well may be, as this is just an example).
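
Here is a minimal sketch of that selection step. It assumes a local clone of the repo with an images folder and metadata.csv columns named "finding", "view", and "filename"; treat the paths and names as illustrative rather than definitive:

import os
import shutil

import pandas as pd

repo_dir = "covid-chestxray-dataset"   # assumed local clone of Dr. Cohen's repo
out_dir = "dataset/covid"              # assumed output folder for positive cases
os.makedirs(out_dir, exist_ok=True)

df = pd.read_csv(os.path.join(repo_dir, "metadata.csv"))

# keep only rows that are COVID-19-positive *and* PA view
mask = (df["finding"] == "COVID-19") & (df["view"] == "PA")
for _, row in df[mask].iterrows():
    src = os.path.join(repo_dir, "images", row["filename"])
    if os.path.exists(src):
        shutil.copy2(src, os.path.join(out_dir, row["filename"]))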

In total, that left us with 25 X-ray images of positive COVID-19 cases (Figure 2, left).

The next step was to sample X-ray images of healthy patients.

To do so, I used the Chest X-Ray Images (Pneumonia) dataset and sampled 25 X-ray images of healthy patients. There are a number of problems with this dataset, namely noisy/incorrect labels, but it served as a good enough starting point for this proof-of-concept COVID-19 detector. A sampling sketch follows.
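
Sampling the healthy cases can be done along these lines; the directory layout ("chest_xray/train/NORMAL") is an assumption about how the dataset unpacks, so adjust the paths to your copy:

import os
import random
import shutil

normal_dir = "chest_xray/train/NORMAL"   # assumed path to the healthy scans
out_dir = "dataset/normal"               # assumed output folder for healthy cases
os.makedirs(out_dir, exist_ok=True)

random.seed(42)  # fixed seed so the sample is reproducible
for name in random.sample(os.listdir(normal_dir), 25):
    shutil.copy2(os.path.join(normal_dir, name), os.path.join(out_dir, name))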

After gathering our dataset, we are left with 50 images in total, equally split: 25 COVID-19-positive X-rays and 25 X-rays of healthy patients.

Let’s start by importing the libraries:

import os                       # needed below for os.path.sep when parsing paths
import numpy as np              # linear algebra
import pandas as pd             # data processing
import matplotlib.pyplot as plt
import cv2
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import AveragePooling2D
from tensorflow.keras.layers import Dropout
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Input
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from sklearn.preprocessing import LabelBinarizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from imutils import paths
# initialize the initial learning rate, number of epochs to train for,
# and batch size
INIT_LR = 1e-3
EPOCHS = 25
BS = 8

# the path to our input dataset of chest X-ray images, plus output paths
dataset_dir = "..."   # your path to the dataset
plot_path = "..."     # your path to plot.png
model_path = "..."    # your path to covid19.model

Load and preprocess our X-ray data:

To load our data, we grab all paths to images in the dataset_dir directory. Then, for each imagePath, we:

  • Extract the class label (either covid or normal) from the path
  • Load the image, convert it to RGB channel ordering, and resize it to 224×224 pixels so that it is ready for our convolutional neural network.
  • Update our data and labels lists respectively.

We then scale pixel intensities to the range [0, 1] and convert both our data and labels to NumPy array format.

# grab the list of images in our dataset directory, then initialize
# the list of data (i.e., images) and class labels
print("[INFO] loading images...")
imagePaths = list(paths.list_images(dataset_dir))
data = []
labels = []

# loop over the image paths
for imagePath in imagePaths:
    # extract the class label from the filename
    label = imagePath.split(os.path.sep)[-2]
    # load the image, swap color channels, and resize it to a fixed
    # 224x224 pixels while ignoring aspect ratio
    image = cv2.imread(imagePath)
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = cv2.resize(image, (224, 224))
    # update the data and labels lists, respectively
    data.append(image)
    labels.append(label)

# keep unmodified copies around
data1 = data.copy()
labels1 = labels.copy()

# convert the data and labels to NumPy arrays while scaling the pixel
# intensities to the range [0, 1]
data = np.array(data) / 255.0
labels = np.array(labels)

Next we will one-hot encode our labels and create our training/testing splits:

One-hot encoding the labels means that our data will be in the following format: [[0. 1.] [0. 1.] [0. 1.] … [1. 0.] [1. 0.] [1. 0.]]

Each encoded label consists of a two-element array with one of the elements “hot” (i.e., 1) and the other “not” (i.e., 0).

We then construct our data split, reserving 80% of the data for training and 20% for testing.

To help ensure that our model generalizes, we perform data augmentation that randomly rotates each image up to 15 degrees clockwise or counterclockwise. We then initialize the data augmentation generator object.

# perform one-hot encoding on the labels
lb = LabelBinarizer()
labels = lb.fit_transform(labels)
labels = to_categorical(labels)
print(labels)

# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data, labels,
    test_size=0.20, stratify=labels, random_state=42)

# initialize the training data augmentation object
trainAug = ImageDataGenerator(rotation_range=15, fill_mode="nearest")
[[1. 0.]
 [1. 0.]
 ...
 [0. 1.]
 [0. 1.]]

(the full printout contains 25 rows of [1. 0.] followed by 25 rows of [0. 1.])

We will initialize our VGGNet model and set it up for fine-tuning:

We will instantiate the VGG16 network with weights pre-trained on ImageNet, leaving off the FC layer head.

From there, we construct a new fully-connected layer head consisting of POOL => FC => SOFTMAX layers and append it on top of VGG16.

We then freeze the CONV weights of VGG16 such that only the FC layer head will be trained; this completes our fine-tuning setup.

# load the VGG16 network, ensuring the head FC layer sets are left off
baseModel = VGG16(weights="imagenet", include_top=False,
    input_tensor=Input(shape=(224, 224, 3)))

# construct the head of the model that will be placed on top of the
# base model
headModel = baseModel.output
headModel = AveragePooling2D(pool_size=(4, 4))(headModel)
headModel = Flatten(name="flatten")(headModel)
headModel = Dense(64, activation="relu")(headModel)
headModel = Dropout(0.5)(headModel)
headModel = Dense(2, activation="softmax")(headModel)

# place the head FC model on top of the base model (this will become
# the actual model we will train)
model = Model(inputs=baseModel.input, outputs=headModel)

# loop over all layers in the base model and freeze them so they will
# *not* be updated during the first training process
for layer in baseModel.layers:
    layer.trainable = False
Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58892288/58889256 [==============================] - 1s 0us/step

Compile and train our COVID-19 (coronavirus) detection deep learning model:

We compile the network with learning-rate decay and the Adam optimizer. Given that this is a two-class problem, we use “binary_crossentropy” loss rather than categorical cross-entropy.

To kick off our COVID-19 neural network training process, we make a call to Keras’ fit_generator method, while passing in our chest X-ray data via our data augmentation object.

# compile our model
print("[INFO] compiling model...")
opt = Adam(lr=INIT_LR, decay=INIT_LR / EPOCHS)
model.compile(loss="binary_crossentropy", optimizer=opt,
    metrics=["accuracy"])

# train the head of the network
print("[INFO] training head...")
H = model.fit_generator(
    trainAug.flow(trainX, trainY, batch_size=BS),
    steps_per_epoch=len(trainX) // BS,
    validation_data=(testX, testY),
    validation_steps=len(testX) // BS,
    epochs=EPOCHS)
[INFO] compiling model...
[INFO] training head...
Train for 5 steps, validate on 10 samples
Epoch 1/25
5/5 [==============================] - 5s 1s/step - loss: 0.6084 - accuracy: 0.7000 - val_loss: 0.4942 - val_accuracy: 1.0000
Epoch 2/25
5/5 [==============================] - 1s 101ms/step - loss: 0.7425 - accuracy: 0.5250 - val_loss: 0.4818 - val_accuracy: 1.0000
Epoch 3/25
5/5 [==============================] - 1s 107ms/step - loss: 0.6952 - accuracy: 0.5750 - val_loss: 0.4690 - val_accuracy: 1.0000
Epoch 4/25
5/5 [==============================] - 1s 138ms/step - loss: 0.7249 - accuracy: 0.5750 - val_loss: 0.4592 - val_accuracy: 1.0000
Epoch 5/25
5/5 [==============================] - 0s 94ms/step - loss: 0.6418 - accuracy: 0.6250 - val_loss: 0.4466 - val_accuracy: 0.8750
Epoch 6/25
5/5 [==============================] - 0s 93ms/step - loss: 0.6240 - accuracy: 0.6000 - val_loss: 0.4345 - val_accuracy: 0.8750
Epoch 7/25
5/5 [==============================] - 1s 105ms/step - loss: 0.5315 - accuracy: 0.7500 - val_loss: 0.4241 - val_accuracy: 1.0000
Epoch 8/25
5/5 [==============================] - 0s 93ms/step - loss: 0.5350 - accuracy: 0.7000 - val_loss: 0.4102 - val_accuracy: 1.0000
Epoch 9/25
5/5 [==============================] - 0s 98ms/step - loss: 0.5137 - accuracy: 0.8500 - val_loss: 0.4014 - val_accuracy: 0.8750
Epoch 10/25
5/5 [==============================] - 0s 94ms/step - loss: 0.5534 - accuracy: 0.7750 - val_loss: 0.3911 - val_accuracy: 0.8750
Epoch 11/25
5/5 [==============================] - 1s 106ms/step - loss: 0.5366 - accuracy: 0.7500 - val_loss: 0.3815 - val_accuracy: 0.8750
Epoch 12/25
5/5 [==============================] - 0s 97ms/step - loss: 0.4864 - accuracy: 0.8500 - val_loss: 0.3700 - val_accuracy: 0.8750
Epoch 13/25
5/5 [==============================] - 1s 111ms/step - loss: 0.4442 - accuracy: 0.9000 - val_loss: 0.3599 - val_accuracy: 0.8750
Epoch 14/25
5/5 [==============================] - 0s 95ms/step - loss: 0.4352 - accuracy: 0.9250 - val_loss: 0.3496 - val_accuracy: 0.8750
Epoch 15/25
5/5 [==============================] - 0s 99ms/step - loss: 0.4133 - accuracy: 0.8750 - val_loss: 0.3415 - val_accuracy: 0.8750
Epoch 16/25
5/5 [==============================] - 0s 97ms/step - loss: 0.4111 - accuracy: 0.8750 - val_loss: 0.3324 - val_accuracy: 0.8750
Epoch 17/25
5/5 [==============================] - 0s 97ms/step - loss: 0.3939 - accuracy: 0.9250 - val_loss: 0.3233 - val_accuracy: 0.8750
Epoch 18/25
5/5 [==============================] - 0s 95ms/step - loss: 0.3572 - accuracy: 0.9500 - val_loss: 0.3174 - val_accuracy: 0.8750
Epoch 19/25
5/5 [==============================] - 0s 94ms/step - loss: 0.3422 - accuracy: 0.9000 - val_loss: 0.3093 - val_accuracy: 0.8750
Epoch 20/25
5/5 [==============================] - 0s 95ms/step - loss: 0.3999 - accuracy: 0.8750 - val_loss: 0.3050 - val_accuracy: 0.8750
Epoch 21/25
5/5 [==============================] - 0s 94ms/step - loss: 0.3458 - accuracy: 0.9000 - val_loss: 0.3001 - val_accuracy: 0.8750
Epoch 22/25
5/5 [==============================] - 0s 94ms/step - loss: 0.2851 - accuracy: 1.0000 - val_loss: 0.2968 - val_accuracy: 0.8750
Epoch 23/25
5/5 [==============================] - 0s 96ms/step - loss: 0.3796 - accuracy: 0.8750 - val_loss: 0.2887 - val_accuracy: 0.8750
Epoch 24/25
5/5 [==============================] - 1s 108ms/step - loss: 0.2685 - accuracy: 0.9500 - val_loss: 0.2815 - val_accuracy: 0.8750
Epoch 25/25
5/5 [==============================] - 0s 95ms/step - loss: 0.2811 - accuracy: 0.9250 - val_loss: 0.2808 - val_accuracy: 0.8750
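
A quick API note: fit_generator is deprecated in recent TensorFlow 2.x releases, where model.fit accepts generators directly. A minimal equivalent of the training call above, assuming a newer TensorFlow version:

# equivalent training call on newer TensorFlow/Keras, where fit()
# consumes the augmentation generator directly
H = model.fit(
    trainAug.flow(trainX, trainY, batch_size=BS),
    steps_per_epoch=len(trainX) // BS,
    validation_data=(testX, testY),
    epochs=EPOCHS)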

Evaluate the COVID-19 detection model:

For evaluation, we first make predictions on the testing set and grab the prediction indices.

We then generate and print out a classification report using scikit-learn’s helper utility.

# make predictions on the testing set
print("[INFO] evaluating network...")
predIdxs = model.predict(testX, batch_size=BS)

# for each image in the testing set we need to find the index of the
# label with the corresponding largest predicted probability
predIdxs = np.argmax(predIdxs, axis=1)

# show a nicely formatted classification report
print(classification_report(testY.argmax(axis=1), predIdxs,
    target_names=lb.classes_))
[INFO] evaluating network...
              precision    recall  f1-score   support

       covid       1.00      0.80      0.89         5
      normal       0.83      1.00      0.91         5

    accuracy                           0.90        10
   macro avg       0.92      0.90      0.90        10
weighted avg       0.92      0.90      0.90        10

Plot the predictions:

rows = 3
columns = 3
fig = plt.figure(figsize=(20, 20))

for m in range(1, 10):
    # lb.classes_ is sorted alphabetically, so index 0 is "covid"
    # and index 1 is "normal"
    if predIdxs[m - 1] == 0:
        text = "COVID"
        color = (255, 0, 0)   # red
    else:
        text = "NORMAL"
        color = (0, 255, 0)   # green

    img = testX[m - 1].copy()
    # draw the predicted label on the image
    font = cv2.FONT_HERSHEY_SIMPLEX
    org = (50, 50)    # text position
    fontScale = 1
    thickness = 2     # line thickness of 2 px
    img = cv2.putText(img, text, org, font, fontScale, color,
        thickness, cv2.LINE_AA)

    fig.add_subplot(rows, columns, m)
    plt.imshow(img)
    plt.title("Pred: " + text)
    plt.axis('off')

plt.show()
[Figure: predicted labels drawn on nine test X-rays]

Plot the ground truths:

rows = 3
columns = 3
fig = plt.figure(figsize=(20, 20))
trueIdxs = testY.argmax(axis=1)

for m in range(1, 10):
    # same class ordering as above: index 0 is "covid", index 1 is "normal"
    if trueIdxs[m - 1] == 0:
        text = "COVID"
        color = (255, 0, 0)   # red
    else:
        text = "NORMAL"
        color = (0, 255, 0)   # green

    img = testX[m - 1].copy()
    # draw the ground-truth label on the image
    font = cv2.FONT_HERSHEY_SIMPLEX
    org = (50, 50)    # text position
    fontScale = 1
    thickness = 2     # line thickness of 2 px
    img = cv2.putText(img, text, org, font, fontScale, color,
        thickness, cv2.LINE_AA)

    fig.add_subplot(rows, columns, m)
    plt.imshow(img)
    plt.title("Ground Truth: " + text)
    plt.axis('off')

plt.show()
[Figure: ground-truth labels drawn on the same nine test X-rays]

Compute a confusion matrix for further statistical evaluation:

Here we will:

  • Generate a confusion matrix
  • Use the confusion matrix to derive the accuracy, sensitivity, and specificity and print each of these metrics

# compute the confusion matrix and use it to derive the raw
# accuracy, sensitivity, and specificity
cm = confusion_matrix(testY.argmax(axis=1), predIdxs)
total = sum(sum(cm))
acc = (cm[0, 0] + cm[1, 1]) / total
sensitivity = cm[0, 0] / (cm[0, 0] + cm[0, 1])
specificity = cm[1, 1] / (cm[1, 0] + cm[1, 1])

# show the confusion matrix, accuracy, sensitivity, and specificity
print(cm)
print("acc: {:.4f}".format(acc))
print("sensitivity: {:.4f}".format(sensitivity))
print("specificity: {:.4f}".format(specificity))
[[4 1]
 [0 5]]
acc: 0.9000
sensitivity: 0.8000
specificity: 1.0000

Plot our training history for COVID-19 detection:

We plot our training accuracy/loss history for inspection, outputting the plot to an image file. Finally, we serialize our tf.keras COVID-19 classifier model to disk:

# plot the training loss and accuracy
N = EPOCHS
plt.style.use("ggplot")
plt.figure()
plt.plot(np.arange(0, N), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, N), H.history["val_loss"], label="val_loss")
plt.plot(np.arange(0, N), H.history["accuracy"], label="train_acc")
plt.plot(np.arange(0, N), H.history["val_accuracy"], label="val_acc")
plt.title("Training Loss and Accuracy on COVID-19 Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss/Accuracy")
plt.legend(loc="lower left")
plt.savefig(plot_path)

# serialize the model to disk
print("[INFO] saving COVID-19 detector model...")
model.save(model_path, save_format="h5")
[Figure: training loss and accuracy curves on the COVID-19 dataset]
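
As a final usage sketch, the serialized model can be reloaded later for inference on a single X-ray. The input filename here is hypothetical, and the class order follows lb.classes_ (alphabetical, so index 0 is covid and index 1 is normal):

# reload the saved model and classify one new image (illustrative only)
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model(model_path)
img = cv2.imread("new_xray.png")            # hypothetical input image
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # match the training preprocessing
img = cv2.resize(img, (224, 224)) / 255.0
probs = model.predict(np.expand_dims(img, axis=0))[0]
print("covid" if probs.argmax() == 0 else "normal", probs)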

I hope you liked this article on COVID-19 detection using deep learning. Feel free to ask questions in the comments section below.
