Text Classification with TensorFlow in Machine Learning

In this article, I will walk you through building a text classification model with TensorFlow that labels movie reviews as positive or negative based on the text of each review. This is a binary classification problem, an important and widely applicable kind of machine learning problem.

Text Classification with TensorFlow

I’ll walk you through a basic application of transfer learning with TensorFlow Hub and Keras. I will be using the IMDB dataset, which contains the text of 50,000 movie reviews from the Internet Movie Database. These are divided into 25,000 reviews for training and 25,000 reviews for testing. Both sets are balanced, meaning they contain an equal number of positive and negative reviews.

Now, let’s get started by installing and importing the necessary libraries:

# Run these in a notebook cell (the "!" prefix executes a shell command)
!pip install tensorflow-hub
!pip install tensorflow-datasets

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_datasets as tfds

# Check library versions and whether TensorFlow can see a GPU
print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.list_physical_devices("GPU") else "NOT AVAILABLE")

The dataset is also available to download online, but I will simply load it with TensorFlow Datasets, so you don’t need to download anything from an external source. Let’s load the data and split it into training, validation, and test sets:

# Split the original training set 60%/40%, so we'll end up with 15,000 examples
# for training, 10,000 examples for validation and 25,000 examples for testing.
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews",
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True)  # yield (text, label) pairs instead of feature dicts
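The comment above gives the resulting split sizes. As a quick sanity check, here is a minimal sketch of that arithmetic in plain Python. It assumes the percentages divide the 25,000 training reviews exactly, which they do here; it is not a reimplementation of how TensorFlow Datasets resolves slices in general.

```python
# Sketch: how 'train[:60%]' and 'train[60%:]' carve up the IMDB training split.
# Plain integer arithmetic; TFDS's own rounding for uneven sizes is not modeled.

TRAIN_SIZE = 25_000  # reviews in the IMDB "train" split
TEST_SIZE = 25_000   # reviews in the IMDB "test" split


def percent_slice(n, start_pct, end_pct):
    """Return (start, end) example indices for a 'split[start%:end%]' slice."""
    return n * start_pct // 100, n * end_pct // 100


train_start, train_end = percent_slice(TRAIN_SIZE, 0, 60)  # 'train[:60%]'
val_start, val_end = percent_slice(TRAIN_SIZE, 60, 100)    # 'train[60%:]'

sizes = {
    "train": train_end - train_start,
    "validation": val_end - val_start,
    "test": TEST_SIZE,
}

for name, size in sizes.items():
    print(f"{name:>10}: {size:,} examples")
```

Because the validation slice starts exactly where the training slice ends, no review appears in both.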

Data Exploration

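Let’s have a look at the data to see what we are going to work with. Each example is a (review text, label) pair. I will take the first 10 examples from the training set and display the reviews:

train_examples_batch, train_labels_batch = next(iter(train_data.batch(10)))
train_examples_batch  # in a notebook, a bare expression on the last line is displayed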

Now, let’s print the first 10 labels from the dataset:

train_labels_batch

Output: <tf.Tensor: shape=(10,), dtype=int64, numpy=array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0])>
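In this dataset, label 0 marks a negative review and 1 a positive one. Although the full training and test sets are balanced, a small batch need not be. A minimal sketch that tallies the ten labels shown above (copied by hand here; in the notebook you could get the same list with `train_labels_batch.numpy().tolist()`):

```python
from collections import Counter

# Labels copied from the output above (0 = negative review, 1 = positive review).
sample_labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

counts = Counter(sample_labels)
positive_fraction = counts[1] / len(sample_labels)

print(f"negative (0): {counts[0]}, positive (1): {counts[1]}")
print(f"positive fraction in this batch: {positive_fraction:.0%}")
```

Only 3 of these 10 reviews are positive, which is fine: balance is a property of the full sets, not of every small sample drawn from them.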

Building the Text Classification Model

To build a model for text classification with TensorFlow, I will use a pre-trained text embedding model from TensorFlow Hub, a repository of pre-trained machine learning models. Let’s first create a Keras layer that uses a TensorFlow Hub model to embed sentences, and try it out on a few sample inputs:

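# Pre-trained text embedding that maps each sentence to a 20-dimensional vector
embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
                           dtype=tf.string, trainable=True)  # fine-tune the embedding weights too
hub_layer(train_examples_batch[:3])  # embed the first three reviews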
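The Hub layer turns each review string, whatever its length, into one fixed-size vector, which is why the model summary lists its output shape as (None, 20). Below is a simplified, hypothetical picture of how a sentence-embedding layer of this kind can work: split the text on whitespace, look up a vector per token (unknown tokens share an out-of-vocabulary row), and combine the vectors into one. The tiny vocabulary, the 3-dimensional vectors, and the sum-divided-by-sqrt(n) combiner are illustrative assumptions, not the actual internals of the TensorFlow Hub module.

```python
import math

# Hypothetical toy setup: a 4-word vocabulary and 3-dimensional vectors.
# The real gnews-swivel-20dim module has its own vocabulary and 20-dim vectors.
VOCAB = {"great": 0, "movie": 1, "boring": 2, "plot": 3}
EMBEDDINGS = [
    [0.9, 0.1, 0.0],   # great
    [0.0, 0.5, 0.5],   # movie
    [-0.8, 0.2, 0.1],  # boring
    [0.1, 0.4, -0.2],  # plot
    [0.0, 0.0, 0.0],   # shared row for out-of-vocabulary (OOV) tokens
]
OOV_INDEX = len(VOCAB)


def embed_sentence(sentence):
    """Map one sentence to a fixed-size vector.

    Tokens are split on whitespace, looked up in EMBEDDINGS (unknown tokens
    use the OOV row), summed, and divided by sqrt(number of tokens).
    """
    tokens = sentence.lower().split()
    dim = len(EMBEDDINGS[0])
    if not tokens:
        return [0.0] * dim
    total = [0.0] * dim
    for token in tokens:
        vector = EMBEDDINGS[VOCAB.get(token, OOV_INDEX)]
        total = [t + v for t, v in zip(total, vector)]
    scale = math.sqrt(len(tokens))
    return [t / scale for t in total]


for text in ["great movie", "boring plot", "a great plot"]:
    print(text, "->", [round(x, 3) for x in embed_sentence(text)])
```

Every input, from two words to a long review, comes out as a vector of the same length, which is what lets the Dense layers that follow accept raw text of any size.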

Now let’s build the full model by stacking the embedding layer and two Dense layers:

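model = tf.keras.Sequential()
model.add(hub_layer)                                      # review text -> 20-dim embedding
model.add(tf.keras.layers.Dense(16, activation='relu'))   # hidden layer
model.add(tf.keras.layers.Dense(1))                       # one output score (logit) per review

model.summary()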
Model: "sequential"
Layer (type)                 Output Shape              Param #   
keras_layer (KerasLayer)     (None, 20)                400020    
dense (Dense)                (None, 16)                336       
dense_1 (Dense)              (None, 1)                 17        
Total params: 400,373
Trainable params: 400,373
Non-trainable params: 0
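The parameter counts in the summary follow from simple arithmetic. A Dense layer with n inputs and m units has n × m weights plus m biases, so a minimal sketch can reproduce the 336, 17, and 400,373 figures. The embedding's count is taken from the summary as given; reading it as a 20,001 × 20 lookup table (for instance a 20,000-token vocabulary plus one out-of-vocabulary row) assumes all of its parameters form a single table, which is an inference rather than something the summary states.

```python
def dense_params(inputs, units):
    """Parameters of a fully connected layer: one weight per (input, unit)
    pair plus one bias per unit."""
    return inputs * units + units


EMBED_DIM = 20                               # output size of keras_layer
embedding_params = 400_020                   # value reported for keras_layer above
hidden_params = dense_params(EMBED_DIM, 16)  # layer "dense": 20 -> 16
output_params = dense_params(16, 1)          # layer "dense_1": 16 -> 1
total_params = embedding_params + hidden_params + output_params

print(f"dense:   {hidden_params}")
print(f"dense_1: {output_params}")
print(f"total:   {total_params:,}")

# Assumption: if the embedding's parameters are one lookup table, dividing by
# its width gives the number of rows in that table.
rows, leftover = divmod(embedding_params, EMBED_DIM)
print(f"embedding table: {rows:,} rows x {EMBED_DIM} dims (leftover: {leftover})")
```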

Compile the Model

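Now, I will compile the model with the Adam optimizer and a binary cross-entropy loss. Since the last layer has no activation function and outputs a raw score (a logit), the loss is configured to expect logits: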

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
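For a logit x and a label y (0 or 1), binary cross-entropy is -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))] per example, and Keras averages it over each batch by default. Keras's documentation notes that passing logits to BinaryCrossentropy with from_logits=True can be more numerically stable. The pure-Python sketch below compares the textbook formula with the standard stable rearrangement max(x, 0) - x*y + log(1 + exp(-|x|)); it illustrates the math and is not TensorFlow's implementation.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_naive(logit, label):
    """Textbook binary cross-entropy via sigmoid; fails when the probability
    rounds to exactly 0 or 1."""
    p = sigmoid(logit)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


def bce_from_logits(logit, label):
    """Stable form: max(x, 0) - x*y + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


for logit, label in [(2.0, 1), (2.0, 0), (-1.5, 0), (0.0, 1)]:
    print(f"logit={logit:+.1f} label={label}: "
          f"naive={bce_naive(logit, label):.4f} stable={bce_from_logits(logit, label):.4f}")

# sigmoid(50.0) rounds to exactly 1.0 in floating point, so bce_naive(50.0, 0)
# would call math.log(0.0) and raise ValueError; the stable form is fine.
print("stable loss for logit=50, label=0:", bce_from_logits(50.0, 0))
```

Note how a confident wrong prediction (logit +2.0 for a negative review) costs far more than a confident right one.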

Train the Text Classification Model

Train the model for 20 epochs in mini-batches of 512 samples, that is, 20 passes over all 15,000 examples in train_data. During training, monitor the model’s loss and accuracy on the 10,000 examples in the validation set:

history = model.fit(train_data.shuffle(10000).batch(512),  # shuffle, then batch
                    epochs=20,
                    validation_data=validation_data.batch(512),
                    verbose=1)
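By default, `Dataset.batch` keeps the final, smaller batch (`drop_remainder=False`), so one pass over a split of n examples takes ceil(n / 512) steps. A small sketch of that arithmetic for the three splits; the 49 test batches match the 49/49 progress line printed by `model.evaluate` below.

```python
import math

BATCH_SIZE = 512
EPOCHS = 20
split_sizes = {"train": 15_000, "validation": 10_000, "test": 25_000}


def steps_per_pass(num_examples, batch_size):
    """Batches in one pass over the data when the final partial batch is kept."""
    return math.ceil(num_examples / batch_size)


steps = {name: steps_per_pass(n, BATCH_SIZE) for name, n in split_sizes.items()}

for name, n in split_sizes.items():
    full_batches, last_batch = divmod(n, BATCH_SIZE)
    extra = f", plus one of {last_batch}" if last_batch else ""
    print(f"{name:>10}: {steps[name]} batches ({full_batches} of {BATCH_SIZE}{extra})")

print("training steps over all epochs:", EPOCHS * steps["train"])
```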

Evaluating the Model

Now let’s see how the text classification model performs on the test set. Two values will be returned: the loss and the accuracy:

results = model.evaluate(test_data.batch(512), verbose=2)

# Pair each returned value with its metric name and print it
for name, value in zip(model.metrics_names, results):
  print("%s: %.3f" % (name, value))
49/49 - 3s - loss: 0.3217 - accuracy: 0.8553
loss: 0.322
accuracy: 0.855
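Accuracy is the fraction of reviews whose predicted label matches the true label. Because this model outputs logits, turning its scores into labels requires a decision rule. The sketch below, with made-up logits and labels, uses sigmoid(logit) >= 0.5 (equivalently, logit >= 0); that rule is an assumption about how you might classify scores returned by, for example, `model.predict`, not a description of how Keras computes its built-in accuracy metric.

```python
import math


def predictions_from_logits(logits, threshold=0.5):
    """Convert raw logits to 0/1 labels using sigmoid(logit) >= threshold."""
    return [1 if 1.0 / (1.0 + math.exp(-z)) >= threshold else 0 for z in logits]


def accuracy(labels, predictions):
    """Fraction of positions where the predicted label equals the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


# Hypothetical logits and labels for six reviews, for illustration only.
logits = [2.3, -1.1, 0.4, -0.2, 3.0, -2.5]
labels = [1, 0, 0, 0, 1, 1]

preds = predictions_from_logits(logits)
print("predictions:", preds)
print(f"accuracy: {accuracy(labels, preds):.3f}")
```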

So our text classification model achieved an accuracy of about 85.5% on the 25,000 test reviews, which is a decent result for such a simple model. I hope you liked this article on text classification with TensorFlow.

Aman Kharwal
I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.
