# Manifold Learning

Rotating, re-orienting, or stretching the piece of paper in three-dimensional space doesn’t change the flat geometry of the article: such operations are akin to linear embeddings. If you bend, curl, or crumple the paper, it is still a two-dimensional manifold, but the embedding into the three-dimensional space is no longer linear. Manifold learning algorithms would seek to learn about the fundamental two-dimensional nature of the paper, even as it is contorted to fill the three-dimensional space.

Here I will demonstrate several different methods, going most deeply into a couple of techniques: multidimensional scaling (MDS), locally linear embedding (LLE), and isometric mapping (IsoMap).

I will begin with the standard imports:

.wp-block-code {
border: 0;
}

.wp-block-code > span {
display: block;
overflow: auto;
}

.shcb-language {
border: 0;
clip: rect(1px, 1px, 1px, 1px);
-webkit-clip-path: inset(50%);
clip-path: inset(50%);
height: 1px;
margin: -1px;
overflow: hidden;
position: absolute;
width: 1px;
word-wrap: normal;
word-break: normal;
}

.hljs {
box-sizing: border-box;
}

.hljs.shcb-code-table {
display: table;
width: 100%;
}

.hljs.shcb-code-table > .shcb-loc {
color: inherit;
display: table-row;
width: 100%;
}

.hljs.shcb-code-table .shcb-loc > span {
display: table-cell;
}

.wp-block-code code.hljs:not(.shcb-wrap-lines) {
white-space: pre;
}

.wp-block-code code.hljs.shcb-wrap-lines {
white-space: pre-wrap;
}

.hljs.shcb-line-numbers {
border-spacing: 0;
counter-reset: line;
}

.hljs.shcb-line-numbers > .shcb-loc {
counter-increment: line;
}

.hljs.shcb-line-numbers .shcb-loc > span {
}

.hljs.shcb-line-numbers .shcb-loc::before {
border-right: 1px solid #ddd;
content: counter(line);
display: table-cell;
text-align: right;
-webkit-user-select: none;
-moz-user-select: none;
-ms-user-select: none;
user-select: none;
white-space: nowrap;
width: 1%;
}
import matplotlib.pyplot as plt
import seaborn as sns; sns.set()
import numpy as npCode language: Python (python)

### Manifold Learning: “HELLO”

To make these concepts more clear, let’s start by generating some two-dimensional data that we can use to define a manifold. Here is a function that will create data in the shape of the word “HELLO”:

def make_hello(N=1000, rseed=42):
# Make a plot with "HELLO" text; save as PNG
fig, ax = plt.subplots(figsize=(4, 1))
ax.axis('off')
ax.text(0.5, 0.4, 'HELLO', va='center', ha='center', weight='bold', size=85)
fig.savefig('hello.png')
plt.close(fig)

# Open this PNG and draw random points from it
rng = np.random.RandomState(rseed)
X = rng.rand(4 * N, 2)
i, j = (X * data.shape).astype(int).T
mask = (data[i, j] &lt; 1)
X[:, 0] *= (data.shape / data.shape)
X = X[:N]
return X[np.argsort(X[:, 0])]Code language: Python (python)

Let’s call the function and visualize the resulting data:

X = make_hello(1000)
colorize = dict(c=X[:, 0], cmap=plt.cm.get_cmap('rainbow', 5))
plt.scatter(X[:, 0], X[:, 1], **colorize)
plt.axis('equal')Code language: Python (python)

The output is two dimensional, and consists of points drawn in the shape of the word, “HELLO”. This data form will help us to see visually what these algorithms are doing.

Also, read – PCA in Machine Learning

### Multidimensional Scaling (MDS)

Looking at data like this, we can see that the particular choice of x and y values of the dataset are not the most fundamental description of the data: we can scale, shrink, or rotate the data, and the “HELLO” will still be apparent.

For example, if we use a rotation matrix to switch the data, the x and y values change, but the data is still fundamentally the same:

def rotate(X, angle):
R = [[np.cos(theta), np.sin(theta)],
[-np.sin(theta), np.cos(theta)]]
return np.dot(X, R)

X2 = rotate(X, 20) + 5
plt.scatter(X2[:, 0], X2[:, 1], **colorize)
plt.axis('equal')Code language: Python (python)

This tells us that the x and y values are not necessarily fundamental to the relationships in the data. What is fundamental, in this case, is the distance between each point and the other points in the dataset. A common way to represent this is to use a distance matrix: for $N$ points, we construct an N \times N array such that entry (i, j) contains the distance between point i and point j.

Let’s use Scikit-Learn’s efficient pairwise_distances function to do this for our original data:

from sklearn.metrics import pairwise_distances
D = pairwise_distances(X)
D.shapeCode language: Python (python)
(1000, 1000)

As promised, for our N=1,000 points, we obtain a 1000×1000 matrix, which can be visualised as shown here:

plt.imshow(D, zorder=2, cmap='Blues', interpolation='nearest')
plt.colorbar()Code language: Python (python)

This distance matrix gives us a representation of our data that is invariant to rotations and translations, but the visualisation of the matrix above is not entirely intuitive. In the image shown in this figure, we have lost any visible sign of the impressive structure in the data: the “HELLO” that we saw before.

Further, while computing this distance matrix from the (x, y) coordinates is straightforward, transforming the distances back into x and y coordinates is somewhat tricky. This is precisely what the multidimensional scaling algorithm aims to do: given a distance matrix between points, it recovers a $D$-dimensional coordinate representation of the data.

Let’s see how it works for our distance matrix, using the precomputed dissimilarity to specify that we are passing a distance matrix:

from sklearn.manifold import MDS
model = MDS(n_components=2, dissimilarity='precomputed', random_state=1)
out = model.fit_transform(D)
plt.scatter(out[:, 0], out[:, 1], **colorize)
plt.axis('equal')Code language: Python (python)

The MDS algorithm recovers one of the possible two-dimensional coordinate representations of our data, using only the N\times N distance matrix describing the relationship between the data points.

### MDS as Manifold Learning

The usefulness of this becomes more apparent when we consider the fact that distance matrices can be computed from data in any dimension. So, for example, instead of simply rotating the data in the two-dimensional plane, we can project it into three dimensions using the following function:

def random_projection(X, dimension=3, rseed=42):
assert dimension &gt;= X.shape
rng = np.random.RandomState(rseed)
C = rng.randn(dimension, dimension)
e, V = np.linalg.eigh(np.dot(C, C.T))
return np.dot(X, V[:X.shape])

X3 = random_projection(X, 3)
X3.shapeCode language: Python (python)
(1000, 3)

Let’s visualize these points to see what we’re working with:

from mpl_toolkits import mplot3d
ax = plt.axes(projection='3d')
ax.scatter3D(X3[:, 0], X3[:, 1], X3[:, 2],
**colorize)
ax.view_init(azim=70, elev=50)Code language: Python (python)

We can now ask the MDS estimator to input this three-dimensional data, compute the distance matrix and then determine the optimal two-dimensional embedding for this distance matrix. The result recovers a representation of the original data:

model = MDS(n_components=2, random_state=1)
out3 = model.fit_transform(X3)
plt.scatter(out3[:, 0], out3[:, 1], **colorize)
plt.axis('equal')Code language: Python (python)

This is essentially the goal of a manifold learning estimator: given high-dimensional embedded data, it seeks a low-dimensional representation of the data that preserves individual relationships within the data. In the case of MDS, the quantity protected is the distance between every pair of points.

### Manifold Learning Example: Isomap on Faces

One place manifold learning is often used is in understanding the relationship between high-dimensional data points. A typical case of high-dimensional data is images: for example, a set of pictures with 1,000 pixels each can be thought of as a collection of points in 1,000 dimensions – the brightness of each pixel in each image defines the coordinate in that dimension.

Here let’s apply Isomap on some faces data. I will use the Labeled Faces in the Wild dataset provided by scikit-learn.

from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=30)
faces.data.shapeCode language: Python (python)
(2370, 2914)

We have 2,370 images, each with 2,914 pixels. In other words, the images can be thought of as data points in a 2,914-dimensional space.

Let’s quickly visualize several of these images to see what we’re working with:

fig, ax = plt.subplots(4, 8, subplot_kw=dict(xticks=[], yticks=[]))
for i, axi in enumerate(ax.flat):
axi.imshow(faces.images[i], cmap='gray')Code language: Python (python)

I would like to plot a low-dimensional embedding of the 2,914-dimensional data to learn the fundamental relationships between the images. One useful way to start is to compute a PCA, and examine the explained variance ratio, which will give us an idea of how many linear features are required to describe the data:

from sklearn.decomposition import RandomizedPCA
model = RandomizedPCA(100).fit(faces.data)
plt.plot(np.cumsum(model.explained_variance_ratio_))
plt.xlabel('n components')
plt.ylabel('cumulative variance')Code language: Python (python)

We see that for this data, nearly 100 components are required to preserve 90% of the variance: this tells us that the data is intrinsically very high dimensional—it can’t be described linearly with just a few components.

When this is the case, nonlinear manifold embeddings like LLE and Isomap can be helpful. We can compute an Isomap embedding on these faces using the same pattern shown before:

from sklearn.manifold import Isomap
model = Isomap(n_components=2)
proj = model.fit_transform(faces.data)
proj.shapeCode language: Python (python)
(2370, 2)

The output is a two-dimensional projection of all the input images. To get a better idea of what the forecast tells us, let’s define a function that will output image thumbnails at the locations of the projections:

from matplotlib import offsetbox

def plot_components(data, model, images=None, ax=None,
thumb_frac=0.05, cmap='gray'):
ax = ax or plt.gca()

proj = model.fit_transform(data)
ax.plot(proj[:, 0], proj[:, 1], '.k')

if images is not None:
min_dist_2 = (thumb_frac * max(proj.max(0) - proj.min(0))) ** 2
shown_images = np.array([2 * proj.max(0)])
for i in range(data.shape):
dist = np.sum((proj[i] - shown_images) ** 2, 1)
if np.min(dist) &lt; min_dist_2:
# don't show points that are too close
continue
shown_images = np.vstack([shown_images, proj[i]])
imagebox = offsetbox.AnnotationBbox(
offsetbox.OffsetImage(images[i], cmap=cmap),
proj[i])
ax.add_artist(imagebox)Code language: Python (python)

Calling this function now, we see the result:

fig, ax = plt.subplots(figsize=(10, 10))
plot_components(faces.data,
model=Isomap(n_components=2),
images=faces.images[:, ::2, ::2])

The result is impressive: the first two Isomap dimensions seem to describe global image features: the overall darkness or lightness of the image from left to right, and the general orientation of the face from bottom to top. This gives us an excellent visual indication of some of the fundamental features in our data. 