Data Science Project on Handwritten Digits

Let’s consider one piece of the optical character recognition problem in data science: the identification of handwritten digits. Here we will take a shortcut and use Scikit-Learn’s set of preformatted digits, which is built into the library.

Loading and Visualizing the digits data:

We will use Scikit-Learn’s data access interface and take a look at this data:

from sklearn.datasets import load_digits
digits = load_digits()
print(digits.images.shape)

#Output- (1797, 8, 8)

The images data is a three-dimensional array: 1797 samples, each consisting of an 8×8 grid of pixels.
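
To make that layout concrete, here is a minimal sketch that pulls out a single sample and inspects it. The variable names `first_image` and `first_label` are illustrative only and are not part of the original walkthrough.

```python
from sklearn.datasets import load_digits

digits = load_digits()

# Each entry of digits.images is one 8x8 grid of pixel intensities.
first_image = digits.images[0]
first_label = digits.target[0]   # the digit this image represents

print(first_image.shape)
print(first_label)

# Smallest and largest pixel intensity across the whole dataset.
print(digits.images.min(), digits.images.max())
```

Indexing along the first axis selects one sample, and the matching entry of `digits.target` gives its label.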

Let’s visualize the first hundred of these:

import matplotlib.pyplot as plt

# 10x10 grid of axes with ticks hidden and tight spacing
fig, axes = plt.subplots(10, 10, figsize=(8, 8),
                        subplot_kw={'xticks':[], 'yticks':[]},
                        gridspec_kw=dict(hspace=0.1, wspace=0.1))
for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='binary', interpolation='nearest')
    # label each image with its true digit in the lower-left corner
    ax.text(0.05, 0.05, str(digits.target[i]),
            transform=ax.transAxes, color='green')
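# feature matrix: each 8x8 image flattened into 64 pixel values
x = digits.data
print(x.shape)

#Output- (1797, 64)

# target vector: the digit label for each sample
y = digits.target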
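
The step from an 8×8 image to a row of 64 features is a row-by-row flattening. The sketch below illustrates the idea in plain Python on a made-up 8×8 grid; `toy_image` and `flatten` are hypothetical names used only for illustration, not part of Scikit-Learn.

```python
# A made-up 8x8 "image": the pixel at (row, col) holds row * 8 + col.
toy_image = [[row * 8 + col for col in range(8)] for row in range(8)]

def flatten(image):
    """Concatenate the rows of a 2-D grid into one flat list."""
    return [pixel for row in image for pixel in row]

features = flatten(toy_image)

print(len(features))   # one value per pixel
print(features[:10])   # the first row, then the start of the second
```

After flattening, the pixel at (row, col) sits at position `row * 8 + col` of the feature vector, so no information is lost, only the grid layout.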


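Supervised Learning (Classification of Digits):

Classifying the digits is a supervised learning task, because every image comes with a known label. Supervised learning is one category of machine learning, and data science relies on many such models. Here we will use a Gaussian naive Bayes classifier: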

from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# hold out part of the data for testing
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=0)
model = GaussianNB()
model.fit(xtrain, ytrain)       # train on the training set
ymodel = model.predict(xtest)   # predict labels for the test set
print(accuracy_score(ytest, ymodel))

#Output- 0.8333333333333334
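
The accuracy reported above is the fraction of test samples whose predicted label matches the true label. Here is a minimal plain-Python sketch of that calculation, using made-up labels rather than the digits data. The `accuracy` helper is hypothetical and is not Scikit-Learn's implementation.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of positions where the prediction equals the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

# Made-up labels for illustration: 5 of the 6 predictions are correct.
toy_true = [0, 1, 2, 3, 4, 5]
toy_pred = [0, 1, 2, 3, 4, 9]

print(accuracy(toy_true, toy_pred))
```

An accuracy of about 0.83 therefore means that roughly five out of every six test digits were labeled correctly.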

Creating and Plotting the Confusion Matrix:

from sklearn.metrics import confusion_matrix
import seaborn as sns

# rows are true labels, columns are predicted labels
mat = confusion_matrix(ytest, ymodel)
sns.heatmap(mat, square=True, annot=True, cbar=False)
plt.xlabel('predicted value')
plt.ylabel('true value')
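
A confusion matrix counts, for every pair of true and predicted labels, how many test samples fall into that pair. With the axis labels used above, rows are true values and columns are predicted values. Here is a small plain-Python sketch on made-up labels; the `confusion_counts` helper is hypothetical and only mirrors the idea, not Scikit-Learn's code.

```python
def confusion_counts(true_labels, predicted_labels, n_classes):
    """Return an n_classes x n_classes grid where grid[t][p] counts
    samples with true label t that were predicted as label p."""
    grid = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        grid[t][p] += 1
    return grid

# Made-up labels for three classes.
toy_true = [0, 0, 1, 1, 2, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0, 2]

for row in confusion_counts(toy_true, toy_pred, n_classes=3):
    print(row)
```

Entries on the diagonal are correct predictions, while off-diagonal entries show which digits the model confuses with one another.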
Aman Kharwal
