Ridge Regression is a regularized version of Linear Regression: it forces the learning algorithm not only to fit the data but also to keep the model weights as small as possible.
It is quite common for the cost function used during training to differ from the performance measure used for testing. Apart from regularization, another reason for this difference is that a good training cost function should have optimization-friendly derivatives, while the performance measure used for testing should be as close as possible to the final objective.
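Concretely, Ridge Regression adds a regularization term to the training cost function. A common formulation (conventions vary slightly between textbooks and libraries) is:
J(θ) = MSE(θ) + α Σ θᵢ²  (summing over i = 1 … n)
where α controls how strongly the model is regularized and the bias term θ₀ is not included in the sum.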
It is essential to scale the data (for example with a StandardScaler) before performing Ridge Regression, as it is sensitive to the scale of the input features.
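For example, here is a minimal sketch of mine (not part of the original code) that chains the scaling and the regression with Scikit-Learn's make_pipeline:
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# Scale the features, then fit a Ridge model on the scaled data
scaled_ridge = make_pipeline(StandardScaler(), Ridge(alpha=1))
Now let’s go through the Ridge Regression algorithm to understand how to regularize a Linear Model using the Ridge algorithm.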
Data Preparation
We can train a Ridge model either by computing a closed-form equation or by performing Gradient Descent. To move further, I will first generate some simple synthetic data from a linear equation with added noise; a short sketch of the closed-form solution follows the data-preparation code:
import numpy as np

np.random.seed(42)
m = 20                                            # number of training instances
X = 3 * np.random.rand(m, 1)                      # features uniformly distributed in [0, 3)
y = 1 + 0.5 * X + np.random.randn(m, 1) / 1.5     # linear target with Gaussian noise
X_new = np.linspace(0, 3, 100).reshape(100, 1)    # evenly spaced values for plotting predictions
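As a point of reference, here is my own sketch (not part of the original tutorial) of the closed-form solution mentioned above, computed directly with NumPy via the regularized normal equation; note that the bias term is left unregularized:
# Closed-form Ridge solution (regularized normal equation), assuming alpha = 1
X_b = np.c_[np.ones((m, 1)), X]       # add the bias feature x0 = 1
alpha = 1.0
A = np.identity(X_b.shape[1])
A[0, 0] = 0                           # do not regularize the bias term
theta_ridge = np.linalg.inv(X_b.T @ X_b + alpha * A) @ X_b.T @ y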
Ridge Regression Algorithm
Now, here is how you can easily perform Ridge Regression using Scikit-Learn:
from sklearn.linear_model import Ridge

# Closed-form solution using a matrix factorization (Cholesky) solver
ridge_reg = Ridge(alpha=1, solver="cholesky", random_state=42)
ridge_reg.fit(X, y)
ridge_reg.predict([[1.5]])

# The same model trained with Stochastic Average Gradient descent
ridge_reg = Ridge(alpha=1, solver="sag", random_state=42)
ridge_reg.fit(X, y)
ridge_reg.predict([[1.5]])
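Another option, which I am adding here as a sketch rather than part of the original snippet, is to train the same kind of model with plain Stochastic Gradient Descent by adding an ℓ2 penalty, for example with Scikit-Learn's SGDRegressor:
from sklearn.linear_model import SGDRegressor

# penalty="l2" adds a Ridge-style regularization term to the SGD training objective
sgd_reg = SGDRegressor(penalty="l2", alpha=0.1, max_iter=1000, tol=1e-3, random_state=42)
sgd_reg.fit(X, y.ravel())   # SGDRegressor expects a 1D target array
sgd_reg.predict([[1.5]])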
Now, let’s train and visualize Ridge models with different levels of regularization:
from sklearn.preprocessing import PolynomialFeatures
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

def plot_model(model_class, polynomial, alphas, **model_kargs):
    for alpha, style in zip(alphas, ("b-", "g--", "r:")):
        # alpha=0 means no regularization, so fall back to plain Linear Regression
        model = model_class(alpha, **model_kargs) if alpha > 0 else LinearRegression()
        if polynomial:
            # Expand the features, scale them, then apply the regularized model
            model = Pipeline([
                ("poly_features", PolynomialFeatures(degree=10, include_bias=False)),
                ("std_scaler", StandardScaler()),
                ("regul_reg", model),
            ])
        model.fit(X, y)
        y_new_regul = model.predict(X_new)
        lw = 2 if alpha > 0 else 1
        plt.plot(X_new, y_new_regul, style, linewidth=lw, label=r"$\alpha = {}$".format(alpha))
    plt.plot(X, y, "b.", linewidth=3)
    plt.legend(loc="upper left", fontsize=15)
    plt.xlabel("$x_1$", fontsize=18)
    plt.axis([0, 3, 0, 4])

plt.figure(figsize=(8, 4))
plt.subplot(121)
plot_model(Ridge, polynomial=False, alphas=(0, 10, 100), random_state=42)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.subplot(122)
plot_model(Ridge, polynomial=True, alphas=(0, 10**-5, 1), random_state=42)
plt.show()

Now let’s go through the output:
- On the left, plain Ridge models are used, leading to linear predictions.
- On the right, the data is first expanded using PolynomialFeatures(degree=10), then scaled using a StandardScaler, and finally the Ridge models are applied to the resulting features: this is Polynomial Regression with Ridge regularization.
- Note how increasing α leads to flatter (i.e., less extreme, more reasonable) predictions, thus reducing the model’s variance but increasing its bias; see the short sketch after this list for one way to choose α.
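Finally, if you are unsure which value of α to use, one common approach (my own addition here, not covered above) is to let cross-validation choose it, for example with Scikit-Learn's RidgeCV:
from sklearn.linear_model import RidgeCV

# Try several candidate alphas and keep the one with the best cross-validated score
ridge_cv = RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1, 10, 100])
ridge_cv.fit(X, y)
ridge_cv.alpha_   # the alpha selected by (leave-one-out) cross-validation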
I hope you liked this article, feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to read more amazing articles.