Autocorrect with Python

Have you ever thought about how the autocorrect feature works in a smartphone keyboard? Almost every smartphone brand, irrespective of price, provides an autocorrect feature in its keyboard today. So let’s understand how the autocorrect feature works. In this article, I will take you through how to build autocorrect with Python.

Autocorrect with Python: How Does It Work?

In the context of machine learning, autocorrect is based on natural language processing. As the name suggests, it is programmed to correct spelling mistakes and other errors while typing. So how does it work?

Before I get into the coding stuff, let’s understand how autocorrect works. Let’s say you type a word on your keyboard. If the word exists in the vocabulary of your smartphone, it will assume that you have written the right word. It doesn’t matter whether you write a name, a noun or any other word.

If the word exists in the history of the smartphone, it will treat the word as correct. What if the word doesn’t exist? If the word that you typed does not exist in the history of the smartphone, then autocorrect is programmed to find the most similar words in that history.

Build an Autocorrect with Python

I hope you now know what autocorrect is and how it works. Now let’s see how we can build an autocorrect feature with Python. Just as a smartphone uses its history to check whether a typed word is correct or not, we also need a set of words to give our autocorrect its functionality.

So I will use the text from a book which you can easily download from here. Now let’s get started with the task to build an autocorrect with Python.

For this task, we need some libraries. The libraries that I am going to use are very common for a machine learning practitioner, so you probably have all of them installed on your system already except one. You need to install a library known as textdistance, which can be easily installed by using the pip command: pip install textdistance.

Now let’s get started with this task by importing all the necessary packages and by reading our text file:
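A minimal sketch of this step, using only the standard library. The helper names tokenize and read_words are assumptions for illustration, and the short sample string is a stand-in built from the first ten words reported in the output, so the snippet runs without the book file.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def read_words(path):
    """Read a text file and return its word tokens.

    `path` is wherever you saved the book text (hypothetical).
    """
    with open(path, "r", encoding="utf-8") as f:
        return tokenize(f.read())


# Stand-in sample so the sketch runs without the book file.
sample = "Moby Dick by Herman Melville 1851 ETYMOLOGY Supplied by a"
words = tokenize(sample)

# The vocabulary is the set of unique words.
vocab = set(words)

print(f"The first ten words in the text are: \n{words[:10]}")
print(f"There are {len(vocab)} unique words in the vocabulary.")
```

With the full book, you would call read_words on the downloaded file instead of tokenizing the sample. The figures reported in the article come from the full text.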

The first ten words in the text are: 
['moby', 'dick', 'by', 'herman', 'melville', '1851', 'etymology', 'supplied', 'by', 'a']
There are 17140 unique words in the vocabulary.

In the above code, we made a list of words, and now we need to build the frequency of those words, which can be easily done by using the Counter class from Python’s collections module:
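As a rough sketch of the counting step, here is Counter applied to a toy word list. The list and the variable name word_freq are assumptions for illustration; with the book, the input would be the full list of tokens.

```python
from collections import Counter

# Toy stand-in for the list of words extracted from the book.
words = ["the", "whale", "and", "the", "sea", "and", "the", "ship"]

# Counter maps each word to the number of times it occurs.
word_freq = Counter(words)

# most_common(n) returns the n most frequent (word, count) pairs.
print(word_freq.most_common(2))
```

On the full book text, the ten most common words and their counts are: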

[('the', 14431), ('of', 6609), ('and', 6430), ('a', 4736), ('to', 4625), ('in', 4172), ('that', 3085), ('his', 2530), ('it', 2522), ('i', 2127)]

Relative Frequency of words

Now we want to get the probability of occurrence of each word, which equals its relative frequency, that is, its count divided by the total number of words in the text:
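A minimal sketch of the relative-frequency calculation, again on a toy word list. The names word_freq and probs are assumptions for illustration.

```python
from collections import Counter

# Toy stand-in for the word counts built from the book.
word_freq = Counter(["the", "whale", "and", "the"])

# Total number of word occurrences in the text.
total = sum(word_freq.values())

# Relative frequency: each word's count divided by the total.
probs = {word: count / total for word, count in word_freq.items()}

print(probs)
```

Because every count is divided by the same total, the probabilities sum to 1.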

Finding Similar Words

Now we will rank candidate words by Jaccard similarity (one minus the Jaccard distance), computed over the 2-grams (Q = 2) of each word, that is, its pairs of adjacent characters. Next, we will return the 5 most similar words ordered by similarity and then by probability:
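The ranking step can be sketched with the standard library alone. This is a stand-in for the textdistance-based version, not the library's own code: the helpers bigrams, jaccard_similarity and suggest are hypothetical names, the toy probabilities are invented for illustration, and the result is a plain list of tuples rather than a table.

```python
from collections import Counter


def bigrams(word):
    """Return the multiset of 2-character grams in a word."""
    return Counter(word[i:i + 2] for i in range(len(word) - 1))


def jaccard_similarity(a, b):
    """Jaccard similarity of two words over their 2-gram multisets."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    intersection = sum((grams_a & grams_b).values())  # shared 2-grams
    union = sum((grams_a | grams_b).values())         # all 2-grams
    return intersection / union if union else 0.0


def suggest(input_word, probs, n=5):
    """Return up to n (word, probability, similarity) tuples, best first.

    An empty list means the word is already in the vocabulary.
    """
    input_word = input_word.lower()
    if input_word in probs:
        return []
    scored = [(w, p, jaccard_similarity(w, input_word)) for w, p in probs.items()]
    # Sort by similarity first, then by probability, both descending.
    scored.sort(key=lambda item: (item[2], item[1]), reverse=True)
    return scored[:n]


# Toy probabilities, invented for illustration only.
toy_probs = {"nevertheless": 0.2, "never": 0.5, "level": 0.2, "whale": 0.1}

for word, prob, sim in suggest("neverteless", toy_probs):
    print(f"{word:<14}{prob:.2f}  {sim:.2f}")
```

As a check on the arithmetic, 'neverteless' has 10 2-grams and 'nevertheless' has 11, of which 9 are shared, so the similarity is 9 / (10 + 11 - 9) = 0.75.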

Now, let’s find the similar words by using our autocorrect function:

my_autocorrect('neverteless')

       Word          Prob      Similarity
2209   nevertheless  0.000229  0.750000
13300  boneless      0.000014  0.416667
12309  elevates      0.000005  0.416667
718    never         0.000942  0.400000
6815   level         0.000110  0.400000

Just as we took our words from a book, a smartphone keyboard starts with some words already present in its vocabulary and records more words as the user starts using the keyboard.

I hope you liked this article on how to build an autocorrect feature with Python. Feel free to ask your valuable questions in the comments section below.

Aman Kharwal