BERT in Machine Learning

In this article, I’m going to take you through an in-depth look at BERT, the language model released by Google, and how it is used for word embeddings in Machine Learning. Here I’ll show you how to get started with BERT by preparing your text for producing word embeddings.

What is BERT in Machine Learning?

BERT stands for Bidirectional Encoder Representations from Transformers. BERT models are pre-trained language representation models that can be used to build models for Natural Language Processing tasks.

You can either use these models to extract high-quality language features from your text data, or you can fine-tune them with your own data on specific tasks such as classification, entity recognition, and question answering to produce state-of-the-art predictions.

Why BERT Embeddings for NLP?

First, BERT embeddings are very useful for keyword expansion, semantic search, and other information retrieval tasks. For example, if you want to match customer questions or searches to previously answered questions or well-documented searches, these representations will help you accurately retrieve results that match customer intent and contextual meaning, even in the absence of overlapping keywords or phrases.
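
To make the retrieval idea concrete, here is a minimal sketch of matching a new question to previously answered ones by cosine similarity. The three-dimensional vectors, the example questions, and the helper names `cosine_similarity` and `most_similar` are all invented for illustration; real BERT-base vectors have 768 dimensions and would come from the model itself.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(query_vec, candidates):
    """Return the key in `candidates` whose vector is closest to `query_vec`."""
    return max(candidates, key=lambda name: cosine_similarity(query_vec, candidates[name]))

# Toy vectors, made up for this sketch (not produced by BERT).
answered_questions = {
    "How do I reset my password?": [0.9, 0.1, 0.0],
    "Where is my order?": [0.1, 0.8, 0.3],
}
# Hypothetical embedding of a new question such as "I forgot my login details".
new_question_vec = [0.8, 0.2, 0.1]

best_match = most_similar(new_question_vec, answered_questions)
print(best_match)
```

The new question shares no keywords with the stored ones, yet its vector points in nearly the same direction as the password question, which is the kind of match the paragraph above describes.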

Secondly, and perhaps more importantly, these vectors can be used as high-quality feature inputs to downstream models. NLP models such as LSTMs or CNNs require inputs in the form of numerical vectors, which typically means translating features such as vocabulary and parts of speech into numerical representations.

Implementing BERT Algorithm

To implement BERT in machine learning, you need to install the PyTorch package. I selected PyTorch because it strikes a good balance between easy-to-use high-level APIs and more detailed, lower-level TensorFlow code. Now, let’s install and import the necessary packages to get started with the task:

# Notebook cells: install PyTorch and the pre-trained BERT package
!pip install torch
!pip install pytorch_pretrained_bert

import torch
from pytorch_pretrained_bert import BertTokenizer, BertModel, BertForMaskedLM
import matplotlib.pyplot as plt
%matplotlib inline

# Load the pre-trained tokenizer for the uncased base model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

Input Formatting

Since BERT is a pre-trained model that expects input data in a specific format, we will need:

  • A special token, [SEP], to mark the end of a sentence or the separation between two sentences
  • A special token, [CLS], at the start of our text. This token is used for classification tasks, but BERT expects it regardless of your application.

text = "This is the sample sentence for BERT word embeddings"
marked_text = "[CLS] " + text + " [SEP]"
print(marked_text)

Output: [CLS] This is the sample sentence for BERT word embeddings [SEP]
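
The same formatting can be wrapped in a small helper that also covers the two-sentence case mentioned in the list above. This is only a sketch: the function name `add_special_tokens` and the second example pair are made up for illustration and are not part of the BERT package.

```python
def add_special_tokens(sentence_a, sentence_b=None):
    """Wrap one sentence, or a sentence pair, in BERT's [CLS]/[SEP] markers."""
    marked = "[CLS] " + sentence_a + " [SEP]"
    if sentence_b is not None:
        # A second sentence is appended and closed with its own [SEP].
        marked += " " + sentence_b + " [SEP]"
    return marked

single = add_special_tokens("This is the sample sentence for BERT word embeddings")
pair = add_special_tokens("Who wrote it?", "The author did.")
print(single)
print(pair)
```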


The BERT model provides its own tokenizer, which we imported above. Let’s see how it handles the sample text:

tokenized_text = tokenizer.tokenize(marked_text)
print(tokenized_text)

Output: ['[CLS]', 'this', 'is', 'the', 'sample', 'sentence', 'for', 'bert', 'word', 'em', '##bed', '##ding', '##s', '[SEP]']

The original text has been split into smaller subwords and characters. The two hash signs that precede some of these subwords are just how our tokenizer indicates that this subword or character is part of a larger word and is preceded by another subword.
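
Because the two hash signs mark a continuation, the original words can be recovered by gluing each `##` piece onto the token before it. Here is a minimal sketch of that idea; `merge_subwords` is a hypothetical helper written for this article, not a function of the tokenizer.

```python
def merge_subwords(tokens):
    """Rejoin WordPiece tokens: a token starting with '##' continues the previous word."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]  # drop the '##' marker and append
        else:
            words.append(token)
    return words

tokens = ['[CLS]', 'this', 'is', 'the', 'sample', 'sentence', 'for',
          'bert', 'word', 'em', '##bed', '##ding', '##s', '[SEP]']
merged = merge_subwords(tokens)
print(merged)
```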

Converting Tokens to IDs

To tokenize a word under this scheme, the tokenizer first checks whether the entire word is in the vocabulary. If it is not, it tries to break the word down into the largest possible subwords contained in the vocabulary, and as a last resort will break the word down into individual characters. Because of this, we can always represent a word as, at the very least, the collection of its individual characters. Once the text is tokenized, each token can be mapped to its index in the vocabulary:

indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)

for tup in zip(tokenized_text, indexed_tokens):
    print(tup)

Output:
('[CLS]', 101)
('this', 2023)
('is', 2003)
('the', 1996)
('sample', 7099)
('sentence', 6251)
('for', 2005)
('bert', 14324)
('word', 2773)
('em', 7861)
('##bed', 8270)
('##ding', 4667)
('##s', 2015)
('[SEP]', 102)
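
The splitting procedure described in this section can be sketched as a greedy, longest-match-first search. This is a simplified illustration and not the library’s actual code: `TOY_VOCAB` holds only a few entries copied from the output above, and `wordpiece_tokenize` is a hypothetical helper that handles a single lowercase word.

```python
# Toy vocabulary: a handful of (token, id) pairs taken from the output above.
# The real bert-base-uncased vocabulary is far larger.
TOY_VOCAB = {
    "word": 2773,
    "em": 7861,
    "##bed": 8270,
    "##ding": 4667,
    "##s": 2015,
}

def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of a single lowercase word."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # mark a word continuation
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # Nothing fits in this toy vocabulary, so give up on the whole word.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces

pieces = wordpiece_tokenize("embeddings", TOY_VOCAB)
ids = [TOY_VOCAB[p] for p in pieces]  # plain dictionary lookup, token to ID
print(pieces)
print(ids)
```

With a full vocabulary that also contains individual characters, the fallback to `[UNK]` is rarely needed, which is why a word can almost always be represented by its characters at the very least.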

In this way, you can prepare your text for producing word embeddings with the BERT model for any NLP task. I hope you liked this article. Feel free to ask your questions in the comments section below. You can also follow me on Medium to learn about every topic in Machine Learning.

Aman Kharwal