Ridge and Lasso Regression with Python

In this article, I will take you through Ridge and Lasso Regression in Machine Learning and how to implement them using the Python programming language.

The Ridge and Lasso regression models are regularized linear models, which are a good way to reduce overfitting: the fewer degrees of freedom a model has, the harder it is for it to overfit the data. For example, a simple way to regularize a polynomial model is to reduce the number of polynomial degrees.
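To make that point concrete, here is a tiny sketch (my own synthetic example, not part of the tutorial's dataset): a degree-15 polynomial has enough freedom to chase the noise in 20 training points, while a degree-2 fit does not.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: a quadratic signal plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(20, 1))
y = 0.5 * x[:, 0] ** 2 + x[:, 0] + rng.normal(scale=0.5, size=20)

for degree in (2, 15):
    # Higher degree = more degrees of freedom = easier to overfit.
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(x, y)
    print(degree, model.score(x, y))  # training R^2 rises with degree
```

The degree-15 model scores higher on the *training* data precisely because it can bend to fit the noise; that is the overfitting that regularization is meant to curb.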


For a linear model, regularization is usually done by constraining the model weights. We will now look at Ridge regression and Lasso regression, which implement two different ways of constraining the weights.

Ridge Regression

Ridge regression is a regularized version of linear regression: a regularization term is added to the cost function, which forces the training algorithm not only to fit the data but also to keep the model weights as small as possible.

Note that the regularization term should only be added to the cost function during training. Once the model is trained, you should use the unregularized performance measure to evaluate the model's performance.
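As a quick illustration of what the penalty does (a minimal sketch on synthetic data, not the article's code), ridge weights can be computed in closed form as w = (XᵀX + αI)⁻¹Xᵀy; raising alpha shrinks the weights toward zero. This bare-bones version ignores the intercept and feature scaling:

```python
import numpy as np

# Synthetic data with known true weights [1.5, -2.0, 0.5].
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ridge_weights(X, y, alpha):
    # Closed-form ridge solution: (X^T X + alpha * I)^(-1) X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

print(ridge_weights(X, y, alpha=0.0))   # alpha = 0 is ordinary least squares
print(ridge_weights(X, y, alpha=10.0))  # weights shrunk toward zero
```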

Lasso Regression

Least absolute shrinkage and selection operator regression (usually just called Lasso regression) is another regularized version of linear regression: just like Ridge regression, it adds a regularization term to the cost function, but it uses the ℓ1 norm of the weight vector instead of half the square of the ℓ2 norm.
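A small sketch of what the ℓ1 penalty buys you (synthetic data, my own illustration, not the article's code): with a large enough alpha, Lasso drives the coefficients of irrelevant features exactly to zero, which is why it also acts as a feature selector:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Five features, but only the first two actually influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.5)
lasso.fit(X, y)
print(lasso.coef_)  # coefficients for the three irrelevant features are zero
```

Ridge, by contrast, would shrink those irrelevant coefficients toward zero but generally not all the way to exactly zero.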

Ridge and Lasso Regression with Python

As in other tasks, to show the implementation of Ridge and Lasso Regression with Python, I will start by importing the required Python packages and modules:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Now let’s import the data, do some data cleaning, and have a look at the data we are going to work with. You can download the dataset that I am using in this task from here:

data = pd.read_csv("Advertising.csv")
print(data.head())
   Unnamed: 0     TV  Radio  Newspaper  Sales
0           1  230.1   37.8       69.2   22.1
1           2   44.5   39.3       45.1   10.4
2           3   17.2   45.9       69.3    9.3
3           4  151.5   41.3       58.5   18.5
4           5  180.8   10.8       58.4   12.9

Now I will remove the unnamed column:

data.drop(["Unnamed: 0"], axis=1, inplace=True)

Now we only have three advertising media, and sales is our target variable. Let’s see how each variable affects sales by creating scatter plots. First, we build a helper function to create a scatter plot:

def scatter_plot(feature, target):
    plt.figure(figsize=(16, 18))
    plt.scatter(data[feature], data[target], c='black')
    plt.xlabel("Money Spent on {} ads ($)".format(feature))
    plt.ylabel("Sales ($k)")
    plt.show()

scatter_plot("TV", "Sales")
scatter_plot("Radio", "Sales")
scatter_plot("Newspaper", "Sales")
[Scatter plots of Sales against TV, Radio, and Newspaper ad spending]

Multiple Linear Regression Algorithm

As Ridge and Lasso regression are ways of regularizing linear models, we first need a linear model to compare against. So now, let’s code a multiple linear regression model:

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

xs = data.drop(["Sales"], axis=1)
y = data["Sales"].values.reshape(-1, 1)
linreg = LinearRegression()
MSE = cross_val_score(linreg, xs, y, scoring="neg_mean_squared_error", cv=5)
mean_MSE = np.mean(MSE)
print(mean_MSE)

Now, we need to see which performs better: Ridge regression or Lasso regression.

Ridge Regression

For the Ridge regression algorithm, I will use the GridSearchCV class provided by Scikit-learn, which will automatically perform 5-fold cross-validation to find the optimal value of alpha.

This is what the code looks like for the Ridge regression algorithm:

# Ridge Regression
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge

ridge = Ridge()
parameters = {"alpha": [1e-15, 1e-10, 1e-8, 1e-4, 1e-3, 1e-2, 1, 5, 10, 20]}
ridge_regression = GridSearchCV(ridge, parameters, scoring='neg_mean_squared_error', cv=5)
ridge_regression.fit(xs, y)

And then we can easily find the best parameter and the best MSE by using the following commands:

print(ridge_regression.best_params_)
print(ridge_regression.best_score_)

{'alpha': 20}
-3.0726713383411424
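Beyond best_params_ and best_score_, GridSearchCV also records the cross-validated score for every alpha it tried in its cv_results_ attribute, which is useful for judging how sensitive the result is to the choice of alpha. A small sketch on synthetic stand-in data (the variable names here are my own, not from the tutorial):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data with three features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([3.0, 1.0, 0.0]) + rng.normal(scale=0.5, size=200)

search = GridSearchCV(Ridge(),
                      {"alpha": [1e-3, 1e-2, 1, 5, 10, 20]},
                      scoring="neg_mean_squared_error", cv=5)
search.fit(X, y)

# One mean score per alpha tried; best_score_ is the maximum of these.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, score)
```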

Lasso Regression

For Lasso regression, we need to follow the same process as we did for Ridge regression. This is what the code looks like:

from sklearn.linear_model import Lasso

lasso = Lasso()
parameters = {"alpha": [1e-15, 1e-10, 1e-8, 1e-4, 1e-3, 1e-2, 1, 5, 10, 20]}
lasso_regression = GridSearchCV(lasso, parameters, scoring='neg_mean_squared_error', cv=5)
lasso_regression.fit(xs, y)
print(lasso_regression.best_params_)
print(lasso_regression.best_score_)

{'alpha': 1}
-3.041405896751369

I hope you now know how to implement Ridge and Lasso regression in machine learning with the Python programming language. In this case, Lasso gives the better fit (its best score, -3.04, is higher than Ridge's -3.07), with a regularization value of alpha = 1.
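A possible next step (a sketch only, using a synthetic stand-in for Advertising.csv since the file may not be at hand): refit the Lasso with the winning alpha on all of the data and inspect which media receive weight.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Synthetic stand-in mimicking the advertising data's column layout.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "TV": rng.uniform(0, 300, 200),
    "Radio": rng.uniform(0, 50, 200),
    "Newspaper": rng.uniform(0, 100, 200),
})
data["Sales"] = (0.05 * data["TV"] + 0.2 * data["Radio"]
                 + rng.normal(scale=1.0, size=200))

# Refit with the alpha that won the grid search, then inspect coefficients.
best_lasso = Lasso(alpha=1)
best_lasso.fit(data.drop(["Sales"], axis=1), data["Sales"])
print(dict(zip(["TV", "Radio", "Newspaper"], best_lasso.coef_)))
```

Inspecting the coefficients like this shows how much each medium contributes to predicted sales in the fitted model.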

I hope you liked this article on how to implement the Ridge and Lasso algorithms in Machine Learning by using Python Programming Language. Feel free to ask your valuable questions in the comments section below.

Aman Kharwal

I am a programmer from India, and I am here to guide you with Data Science, Machine Learning, Python, and C++ for free. I hope you will learn a lot in your journey towards Coding, Machine Learning and Artificial Intelligence with me.
