Let’s consider one piece of the optical character recognition problem in data science: the identification of handwritten digits. Here we will take a shortcut and use Scikit-Learn’s set of preformatted digits, which is built into the library.
Loading and Visualizing the Digits Data:
We will use Scikit-Learn’s data access interface and take a look at this data:
from sklearn.datasets import load_digits

digits = load_digits()
print(digits.images.shape)
# Output: (1797, 8, 8)
The images data is a three-dimensional array: 1797 samples, each consisting of an 8×8 grid of pixels.
Let’s visualize the first hundred of these:
import matplotlib.pyplot as plt

fig, axes = plt.subplots(10, 10, figsize=(8, 8),
                         subplot_kw={'xticks': [], 'yticks': []},
                         gridspec_kw=dict(hspace=0.1, wspace=0.1))

for i, ax in enumerate(axes.flat):
    ax.imshow(digits.images[i], cmap='binary', interpolation='nearest')
    ax.text(0.05, 0.05, str(digits.target[i]),
            transform=ax.transAxes, color='green')

plt.show()

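To work with this data in Scikit-Learn, we need a two-dimensional representation in which each 8×8 image becomes a flat row of 64 pixel values. The dataset already provides this as digits.data, with the matching labels in digits.target:
x = digits.data
print(x.shape)
# Output: (1797, 64)

y = digits.target
print(y.shape)
# Output: (1797,)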
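To make the relationship between the (1797, 8, 8) and (1797, 64) shapes concrete, here is a minimal sketch of the flattening step. It uses a small stand-in NumPy array instead of the real dataset; the name `fake_images` and its values are assumptions made up for illustration. The same reshape applied to digits.images should give an array with the shape of digits.data.

```python
import numpy as np

# Stand-in for digits.images: 3 "images" of 8x8 pixels (made-up values).
fake_images = np.arange(3 * 8 * 8).reshape(3, 8, 8)

# Flatten each 8x8 grid into one row of 64 features.
flat = fake_images.reshape(len(fake_images), -1)

print(fake_images.shape)  # (3, 8, 8)
print(flat.shape)         # (3, 64)

# With row-major flattening, pixel (r, c) lands at column r * 8 + c.
r, c = 2, 5
assert flat[1, r * 8 + c] == fake_images[1, r, c]
```

The flat layout loses nothing: reshaping a row back to (8, 8) recovers the original grid, which is why the same sample can be plotted as an image and fed to a model as a feature vector.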
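Supervised Learning (Classification of Digits):
Because every image comes with a known label, classifying the digits is a supervised learning task. Here we split the data into training and test sets, fit a Gaussian naive Bayes model on the training set, and measure its accuracy on the test set: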
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Hold out part of the data so the model is scored on samples it has not seen
xtrain, xtest, ytrain, ytest = train_test_split(x, y, random_state=0)

# Fit a Gaussian naive Bayes classifier and predict labels for the test set
model = GaussianNB()
model.fit(xtrain, ytrain)
ymodel = model.predict(xtest)

print(accuracy_score(ytest, ymodel))
# Output: 0.8333333333333334
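It may help to see what the accuracy number measures. The following is a minimal sketch, not Scikit-Learn’s implementation: the helper name `accuracy` and the toy labels are assumptions for illustration. It simply computes the fraction of predictions that match the true labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels (made up for illustration): 3 of 4 predictions are right.
toy_true = [0, 1, 2, 2]
toy_pred = [0, 1, 1, 2]
print(accuracy(toy_true, toy_pred))  # 0.75
```

Assuming train_test_split used its default test fraction of 25%, the test set here would hold 450 of the 1797 samples, so a score of 0.8333 would correspond to 375 correct predictions out of 450.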
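Plotting the Confusion Matrix:
The accuracy score alone does not tell us where the model goes wrong. A confusion matrix counts, for each true digit, how often each digit was predicted: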
from sklearn.metrics import confusion_matrix
import seaborn as sns

mat = confusion_matrix(ytest, ymodel)

sns.heatmap(mat, square=True, annot=True, cbar=False)
plt.xlabel('predicted value')
plt.ylabel('true value')
plt.show()
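To read the heatmap, it helps to know how the counts are arranged. Below is a small sketch under the convention the plot labels imply: rows are true labels and columns are predicted labels. The helper `confusion_counts` and the toy labels are hypothetical and only for illustration; this is not how Scikit-Learn implements confusion_matrix internally.

```python
def confusion_counts(y_true, y_pred, n_classes):
    """Return an n_classes x n_classes table of counts.

    Rows are true labels, columns are predicted labels.
    """
    table = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        table[t][p] += 1
    return table


# Toy labels (made up for illustration) with three classes.
toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 0, 1, 2, 2, 1]

table = confusion_counts(toy_true, toy_pred, n_classes=3)
for row in table:
    print(row)
# [2, 0, 0]
# [0, 1, 1]
# [0, 1, 1]

# Correct predictions sit on the diagonal; everything else is a mix-up.
diagonal = sum(table[i][i] for i in range(3))
print(diagonal, "of", len(toy_true), "correct")  # 4 of 6 correct
```

Off-diagonal cells show which pairs of digits get confused: a large count in row i, column j means true digit i was often predicted as j.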
