In this article, I will take you through an example of Handwriting Recognition System with Python using a very popular Machine Learning Algorithm known as K Nearest Neighbors or KNN. In handwriting recognition, the machine learning algorithm interprets the user’s handwritten characters or words in a format that the computer understands.
Handwriting Recognition System using Machine Learning with Python
Here I will only work with the numbers 0–9. Some examples are shown in the image above. These numbers have been processed by image processing software to make them the same size and colour. They are all 32×32 black and white.
The binary images have been converted to text format to facilitate this example, although this is not the most efficient use of memory.
Converting Images into Test Vectors
The following code is a small function called img2vector, which converts images to vectors. The function below creates a NumPy 1×1024 array, then opens the input file, loops through the first 32 lines of the file, and stores the integer value of the first 32 characters of each row in the NumPy array:
Now, let’s try the function created above by taking the image as input:
testVector = kNN.img2vector('0_13.txt') testVector[0,0:31]
array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
Using KNN for Handwriting Recognition
Now that you have the data in a format that you can plug into our classifier, you are ready to test this idea and see how it works. The function created below as handwritingClassTest(), is a stand-alone function that tests our classifier:
Now let’s test our code:
the classifier came back with: 0, the real answer is: 0 the classifier came back with: 0, the real answer is: 0 . . the classifier came back with: 7, the real answer is: 7 the classifier came back with: 7, the real answer is: 7 the classifier came back with: 8, the real answer is: 8 the classifier came back with: 8, the real answer is: 8 the classifier came back with: 8, the real answer is: 8 the classifier came back with: 6, the real answer is: 8 . . the classifier came back with: 9, the real answer is: 9 the total number of errors is: 11 the total error rate is: 0.011628
Using the kNN algorithm on this dataset, you were able to achieve an error rate of 1.2%. You can vary k to see how it changes. You can also modify the handwritingClassTest function to randomly select training examples.
This way you can vary the number of training examples and see how that affects the error rate. Depending on the speed of your computer, you may think this algorithm is slow, and you are correct. For each of our 900 test cases, you had to perform 2000 distance calculations on a floating point vector with 1024 inputs.
Also, our test dataset size was 2MB. Is there a way to reduce that and take fewer calculations? A modification of kNN, called kD-tree, allows you to reduce the number of calculations.
The k-Nearest Neighbors algorithm is a simple and efficient way to classify data. The example of handwriting recognition should be a good proof of the power of a classifier. kNN is an example of instance-based learning, where you need to have instances of data on hand to run the machine learning algorithm.
The algorithm must carry the entire data set; for large datasets, this involves a large amount of storage. Additionally, you have to calculate the distance measure for each data item in the database, which can be tedious.
I hope you liked this article on how to build a Handwriting Recognition System with Python using the KNN algorithm in Machine Learning. Feel free to ask your valuable questions in the comments section below.