PyCaret in Machine Learning

PyCaret is an open-source machine learning library that helps automate the entire process of training a machine learning model. From model selection to training and testing, PyCaret is a great tool that can be used in machine learning. In this article, I will introduce you to a machine learning tutorial on PyCaret using Python.


To use PyCaret, you only need to know which features you want to train your model on; the library then handles the rest of the workflow, from choosing a model to training and testing it.

The most useful feature of PyCaret is that it tells you which machine learning model works best on a particular dataset. It trains a set of candidate models and ranks them by standard performance metrics, and it does all of this in a few lines of code.

So even if you don’t like using shortcuts while training a machine learning model, you can still use PyCaret to find out which model is best for your dataset. If you have never used it before, you can install it with pip: pip install pycaret. In the section below, I will take you through a machine learning tutorial on PyCaret using Python.

PyCaret using Python

I hope you now understand what PyCaret is and why it is used in machine learning. Now let’s see how to use it in Python to automate model selection and model training. For this task, I will use the famous Titanic dataset to predict passenger survival. So let’s start by importing the dataset:

import numpy as np
import pandas as pd
data = pd.read_csv("train.csv")
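Before handing the data to PyCaret, it can help to check which columns contain missing values. The sketch below is only an illustration: the helper name missing_summary is made up for this example, and the small frame uses invented rows that merely mimic a few Titanic column names. With the real file you would call the same helper on data.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values per column, largest first."""
    return df.isna().sum().sort_values(ascending=False)


# Invented rows that only mimic the Titanic column layout (assumption,
# not taken from train.csv).
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3],
    "Survived": [0, 1, 1],
    "Age": [22.0, None, 26.0],
    "Cabin": [None, "C85", None],
})

print(sample.head())            # first rows of the frame
print(missing_summary(sample))  # missing-value count per column
```

Columns with many missing values are worth a look before training, since they affect how much usable signal a feature carries.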

Now let’s set up the experiment. Since this is a classification problem, I will use PyCaret’s classification module. During setup we need to declare the data and the target label, as well as any features that should be ignored while training the model. Below is how to set up PyCaret for classification:

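from pycaret.classification import *
# Declare the data, the target column, and the columns to leave out of training.
# silent=True is accepted by PyCaret 2.x; PyCaret 3.0 removed the argument,
# so drop it on newer versions. session_id fixes the random seed.
clf = setup(data, target="Survived",
            ignore_features=["Ticket", "Name", "PassengerId"],
            silent=True, session_id=786)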
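The silent argument used above exists in PyCaret 2.x but was removed in PyCaret 3.0, so the same call can fail on a newer installation. One possible way to keep the setup call portable is to build the keyword arguments from the installed version. This is only a sketch: build_setup_kwargs is a hypothetical helper written for this article, not part of PyCaret.

```python
def build_setup_kwargs(pycaret_version: str) -> dict:
    """Assemble keyword arguments for pycaret.classification.setup().

    Hypothetical helper: it adds silent=True only for PyCaret 2.x,
    because PyCaret 3.0 no longer accepts that argument.
    """
    kwargs = {
        "target": "Survived",
        "ignore_features": ["Ticket", "Name", "PassengerId"],
        "session_id": 786,
    }
    major = int(pycaret_version.split(".")[0])
    if major < 3:
        kwargs["silent"] = True
    return kwargs


# Usage with the real library (requires pycaret to be installed):
#   from importlib.metadata import version
#   from pycaret.classification import setup
#   clf = setup(data, **build_setup_kwargs(version("pycaret")))
print(build_setup_kwargs("3.0.0"))
```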

Now I am going to use the most important feature of this library: comparing models. In machine learning, this is called model selection. Even if you are not very familiar with model selection, this feature does the comparison for you. Here’s how to compare machine learning models using PyCaret’s compare_models() function:
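The comparison step can be sketched as follows. The wrapper function rank_models is a hypothetical name chosen for this example; setup(), compare_models() and pull() are functions from pycaret.classification. The import sits inside the function so the helper can be defined even where PyCaret is not installed. The silent argument is left out here, which means PyCaret 2.x may ask you to confirm the inferred data types.

```python
def rank_models(train_df, target="Survived", sort="Accuracy"):
    """Set up a classification experiment and rank the candidate models.

    Returns the best model and the score table of all compared models.
    """
    # Imported lazily so this helper can be defined without PyCaret installed.
    from pycaret.classification import setup, compare_models, pull

    setup(train_df, target=target,
          ignore_features=["Ticket", "Name", "PassengerId"],
          session_id=786)
    best_model = compare_models(sort=sort)  # cross-validates and ranks models
    leaderboard = pull()                    # last score grid as a DataFrame
    return best_model, leaderboard


# Usage (requires pycaret and the Titanic training data):
#   best_model, leaderboard = rank_models(data)
#   print(leaderboard.head())
```

Which model ends up on top depends on the data, the random seed, and the installed PyCaret version, so the ranking on your machine may differ from the one discussed in this article.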

[Image: PyCaret model selection output]

According to the above output, the Light Gradient Boosting Machine (LightGBM) is the best-performing model on the Titanic dataset. So let’s create the LightGBM model and make predictions on the test set:

lightgbm = create_model('lightgbm')
test_data = pd.read_csv('test.csv')
predict = predict_model(lightgbm, data=test_data)
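The frame returned by predict_model contains the input rows plus the model’s predictions. The name of the prediction column depends on the version: PyCaret 2.x calls it Label, while PyCaret 3.x calls it prediction_label. A small helper can pick whichever is present. This is a sketch: predicted_labels is a made-up name, and the demo frame below is invented to stand in for real predict_model output.

```python
import pandas as pd


def predicted_labels(predictions: pd.DataFrame) -> pd.Series:
    """Return the predicted-class column from a predict_model result."""
    # "prediction_label" is used by PyCaret 3.x, "Label" by PyCaret 2.x.
    for column in ("prediction_label", "Label"):
        if column in predictions.columns:
            return predictions[column]
    raise KeyError("no prediction column found in the given frame")


# Invented stand-in for the frame returned by predict_model (assumption).
fake_predictions = pd.DataFrame({"Pclass": [3, 1], "Label": [0, 1]})
print(predicted_labels(fake_predictions).value_counts())
```

With the real result you would call predicted_labels(predict) to get the predicted survival labels for the test set.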


PyCaret is a great machine learning library for automating the complete process of training a model, from model selection to training and testing. Even if you don’t like shortcuts while training machine learning models, you can use it at least for model selection.

I hope you liked this machine learning tutorial on PyCaret using the Python programming language. Feel free to ask your questions in the comments section below.

Aman Kharwal


      • Good afternoon, sir. My question is: can we use compare_models() only with the PyCaret library, or also without it?
        Also, this article explains PyCaret only for classification problems. Would you please explain how to use PyCaret for regression and clustering problems as well?

      • Yes, we can use PyCaret for regression and many other problems as well. I will share more about it soon; until then, you can explore the official PyCaret documentation.
