Text Classification with TensorFlow in Machine Learning

In this article, I will walk you through building a text classification model with TensorFlow that labels movie reviews as positive or negative based on the text of each review. This is a binary classification problem, an important and widely applicable type of machine learning problem.

Text Classification with TensorFlow

I’ll walk you through a basic application of transfer learning with TensorFlow Hub and Keras. I will be using the IMDB dataset, which contains the text of 50,000 movie reviews from the Internet Movie Database. These are divided into 25,000 reviews for training and 25,000 reviews for testing. The training and test sets are balanced, meaning each contains an equal number of positive and negative reviews.


Now, let’s get started with this task of text classification with TensorFlow by importing some necessary libraries:

import numpy as np
import tensorflow as tf
!pip install tensorflow-hub
!pip install tensorflow-datasets
import tensorflow_hub as hub
import tensorflow_datasets as tfds
print("Version: ", tf.__version__)
print("Eager mode: ", tf.executing_eagerly())
print("Hub version: ", hub.__version__)
print("GPU is", "available" if tf.config.experimental.list_physical_devices("GPU") else "NOT AVAILABLE")

Although the dataset is available online to download, I will simply load it with TensorFlow Datasets, which fetches the data for you, so you don’t need to download it manually from any external source. Now, I will load the data and split it into training, validation and test sets:

# Split the training set into 60% and 40%, so we'll end up with 15,000 examples
# for training, 10,000 examples for validation and 25,000 examples for testing.
train_data, validation_data, test_data = tfds.load(
    name="imdb_reviews",
    split=('train[:60%]', 'train[60%:]', 'test'),
    as_supervised=True)
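The split sizes quoted in the comment follow directly from the percentages. Here is a quick arithmetic check, assuming the 25,000 / 25,000 training and test sizes described earlier (the constant and variable names are mine, for illustration only):

```python
# Sanity-check the split sizes implied by 'train[:60%]' and 'train[60%:]'.
# Assumes the 25,000 / 25,000 train/test sizes stated in this article.
TRAIN_TOTAL = 25_000
TEST_TOTAL = 25_000

train_size = TRAIN_TOTAL * 60 // 100        # first 60% of the training split
validation_size = TRAIN_TOTAL - train_size  # remaining 40%

print("train:", train_size)
print("validation:", validation_size)
print("test:", TEST_TOTAL)
```

The test split is left untouched, so it is only used once, for the final evaluation.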

Data Exploration

Let’s have a look at the data to figure out what we are going to work with. I will simply print the first 10 samples from the dataset:

train_examples_batch, train_labels_batch = next(iter(train_data.batch(10)))
train_examples_batch

Now, let’s print the first 10 labels from the data set:

train_labels_batch
Output: <tf.Tensor: shape=(10,), dtype=int64, numpy=array([0, 0, 0, 1, 1, 1, 0, 0, 0, 0])>
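To read these labels, it helps to map them to names and count them. The sketch below copies the ten labels from the output above and assumes that 0 stands for a negative review and 1 for a positive one; the `label_names` mapping is my own illustration, not something returned by the dataset call:

```python
from collections import Counter

# Labels copied from the output above.
# Assumption: 0 = negative review, 1 = positive review.
first_ten_labels = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]
label_names = {0: "negative", 1: "positive"}

counts = Counter(label_names[label] for label in first_ten_labels)
print(counts["negative"], "negative,", counts["positive"], "positive")
```

A sample of ten reviews does not have to be balanced, even though the full training and test sets are.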

Building Text Classification Model

To build a model for the task of text classification with TensorFlow, I will use a pre-trained text embedding model from TensorFlow Hub, a repository of pre-trained models. Let’s first create a Keras layer that uses a TensorFlow Hub model to embed the sentences, and try it out on some sample input:

embedding = "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1"
hub_layer = hub.KerasLayer(embedding, input_shape=[],
                           dtype=tf.string, trainable=True)
hub_layer(train_examples_batch[:3])
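The key property of a sentence embedding layer is that every review, short or long, comes out as a vector of the same length. The toy sketch below illustrates that idea with made-up 3-dimensional word vectors. The vocabulary, the zero vector for unknown words, and the pooling rule (sum divided by the square root of the word count) are all assumptions for illustration, not a description of what the TensorFlow Hub model does internally:

```python
import math

# Toy 3-dimensional word vectors (made up for illustration; the real model
# produces 20-dimensional vectors from a learned vocabulary).
toy_vectors = {
    "great": [0.9, 0.1, 0.0],
    "movie": [0.2, 0.5, 0.3],
    "boring": [-0.8, 0.2, 0.1],
}
OOV = [0.0, 0.0, 0.0]  # stand-in vector for words outside the toy vocabulary


def embed_sentence(sentence):
    """Split on spaces, look up each word, and pool into one fixed-size vector."""
    words = sentence.lower().split()
    if not words:
        return list(OOV)
    vectors = [toy_vectors.get(word, OOV) for word in words]
    scale = math.sqrt(len(vectors))
    # Sum each dimension across words, then divide by sqrt(number of words).
    return [sum(dimension) / scale for dimension in zip(*vectors)]


short_review = embed_sentence("great movie")
longer_review = embed_sentence("a great movie but a boring ending")
print(len(short_review), len(longer_review))
```

Both vectors have the same length, which is what allows a fixed-size Dense layer to sit on top of the embedding.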

Now let’s build the full model by stacking a small classifier on top of the embedding layer:

model = tf.keras.Sequential()
model.add(hub_layer)
model.add(tf.keras.layers.Dense(16, activation='relu'))
model.add(tf.keras.layers.Dense(1))
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
keras_layer (KerasLayer)     (None, 20)                400020    
_________________________________________________________________
dense (Dense)                (None, 16)                336       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 17        
=================================================================
Total params: 400,373
Trainable params: 400,373
Non-trainable params: 0
_________________________________________________________________
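The parameter counts in the summary can be reproduced by hand. A Dense layer has one weight per input-unit pair plus one bias per unit. For the embedding layer, I am assuming a table of 20,001 rows (a 20,000-token vocabulary plus one out-of-vocabulary bucket) with 20 dimensions each, which matches the 400,020 shown above; treat that breakdown as an assumption rather than something the summary itself states:

```python
# Recompute the parameter counts shown in the model summary.
# Assumption: the embedding table has 20,001 rows (20,000 tokens plus one
# out-of-vocabulary bucket), each with 20 dimensions.
EMBEDDING_ROWS = 20_001
EMBEDDING_DIM = 20


def dense_params(inputs, units):
    """Weights (inputs * units) plus one bias per unit."""
    return inputs * units + units


embedding_params = EMBEDDING_ROWS * EMBEDDING_DIM
hidden_params = dense_params(EMBEDDING_DIM, 16)  # Dense(16) on a 20-dim input
output_params = dense_params(16, 1)              # Dense(1) on a 16-dim input
total_params = embedding_params + hidden_params + output_params

print(embedding_params, hidden_params, output_params, total_params)
```

Because the hub layer was created with trainable=True, all of these parameters are updated during training, which is why the summary reports zero non-trainable parameters.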

Compile The Model

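Now, I will compile the model with the Adam optimizer and a binary cross-entropy loss. Because the final Dense layer has no activation, the model outputs raw logits, so the loss is created with from_logits=True: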

model.compile(optimizer='adam',
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=['accuracy'])
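To see what from_logits=True means, here is a small sketch of the math in plain Python. It computes binary cross-entropy in two ways: by first squashing the logit with a sigmoid, and directly from the logit with a numerically stable formula. This is my own illustration of the idea under those assumptions, not the Keras implementation, and the example logits are made up:

```python
import math


def sigmoid(z):
    """Map a raw logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(y_true, p):
    """Binary cross-entropy when the model outputs a probability."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))


def bce_from_logit(y_true, z):
    """Binary cross-entropy computed directly from a logit.

    Uses the stable form max(z, 0) - z * y + log(1 + exp(-|z|)).
    """
    return max(z, 0.0) - z * y_true + math.log1p(math.exp(-abs(z)))


# Made-up (label, logit) pairs for illustration.
for y_true, logit in [(1, 2.0), (0, 2.0), (1, -1.5)]:
    via_probability = bce_from_probability(y_true, sigmoid(logit))
    via_logit = bce_from_logit(y_true, logit)
    print(y_true, logit, round(via_probability, 6), round(via_logit, 6))
```

Both columns agree, and a confident wrong answer (label 0 with logit 2.0) is penalised much more heavily than a confident right one.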

Train The Text Classification Model

Train the model for 20 epochs in mini-batches of 512 samples. That is 20 passes over all the samples in train_data. During training, monitor the model’s loss and accuracy on the 10,000 samples in the validation set:

history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=20,
                    validation_data=validation_data.batch(512),
                    verbose=1)
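It can help to work out how many batches these settings produce. The sketch below is simple arithmetic, assuming the 15,000 training and 10,000 validation examples from the split above and one weight update per training batch:

```python
import math

BATCH_SIZE = 512
EPOCHS = 20
TRAIN_EXAMPLES = 15_000       # 60% of the training split
VALIDATION_EXAMPLES = 10_000  # remaining 40%

train_batches = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
validation_batches = math.ceil(VALIDATION_EXAMPLES / BATCH_SIZE)
last_batch_size = TRAIN_EXAMPLES - (train_batches - 1) * BATCH_SIZE
total_updates = train_batches * EPOCHS  # assuming one update per batch

print("training batches per epoch:", train_batches)
print("validation batches per epoch:", validation_batches)
print("size of the last training batch:", last_batch_size)
print("weight updates over all epochs:", total_updates)
```

The last batch of each epoch is smaller than 512 because 15,000 is not a multiple of the batch size.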

Evaluating The Model

Now let’s see how the text classification model performs on the test set. Two values will be returned: the loss and the accuracy:

results = model.evaluate(test_data.batch(512), verbose=2)
for name, value in zip(model.metrics_names, results):
    print("%s: %.3f" % (name, value))
49/49 - 3s - loss: 0.3217 - accuracy: 0.8553
loss: 0.322
accuracy: 0.855
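Since the model outputs logits rather than probabilities, you need one more step before you can read its output as "positive" or "negative". One common way is to apply a sigmoid and threshold at 0.5. The sketch below uses made-up logits and labels purely for illustration; it is not output from the trained model and does not claim to reproduce how Keras computes its accuracy metric:

```python
import math


def sigmoid(z):
    """Map a raw logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logits and true labels (1 = positive, 0 = negative).
example_logits = [2.3, -0.7, 0.1, -3.2]
example_labels = [1, 0, 0, 0]

probabilities = [sigmoid(z) for z in example_logits]
predictions = [1 if p >= 0.5 else 0 for p in probabilities]

correct = sum(pred == label for pred, label in zip(predictions, example_labels))
accuracy = correct / len(example_labels)

print(predictions)
print(accuracy)
```

A logit above zero maps to a probability above 0.5, so thresholding the probability at 0.5 is the same as checking the sign of the logit.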


So our text classification model achieved an accuracy of about 85.5 per cent on the test set, which is a solid result for such a simple approach. I hope you liked this article on building a text classification model with TensorFlow. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.
