Machine Learning and Artificial Intelligence are among the most searched topics on the Internet by programmers coming from many different languages. The popularity of Machine Learning has driven so much research that we have now reached the concept of AutoML, which automates some of the most complex and repetitive parts of the Machine Learning workflow.
We now have interfaces that can help automate machine learning code and make our task a little easier, but you still need to understand Data Science and Machine Learning to judge whether your task is going in the right direction or not.
H2O AutoML
Among the packages available to automate Machine Learning code, one useful option is H2O AutoML, which automates the whole process of model selection and hyperparameter tuning. In this article, we will look at how we can use H2O AutoML to automate machine learning code.
Installing this package is as easy as installing any other package in Python: just run pip install h2o in your terminal. If you use Google Colab, you can install it directly from a cell by prefixing the command with an exclamation mark: !pip install h2o.
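For reference, here is how the install looks inside a notebook cell, together with a quick version check (the print is just my addition to confirm the install worked; it is not part of the original steps):

!pip install h2o

import h2o
print(h2o.__version__)  # should print the installed H2O version, e.g. 3.30.x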
Automate Machine Learning with H2O: Example
The dataset I will use for this task is an advertising dataset, which has the Sales of the company as the dependent variable and the advertising spend on TV, Radio, and Newspaper as features. You can download this dataset from here. Now let’s import the necessary libraries and have a look at the data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('Advertising.csv')
df.head()

I hope you have installed the h2o package successfully. Now I will simply import the h2o package and start a local H2O cluster:
import h2o
h2o.init()
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.8" 2020-07-14; OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1); OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1, mixed mode, sharing)
  Starting server from /usr/local/lib/python3.6/dist-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp04nu4_h6
  JVM stdout: /tmp/tmp04nu4_h6/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmp04nu4_h6/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.

H2O_cluster_uptime:         02 secs
H2O_cluster_timezone:       Etc/UTC
H2O_data_parsing_timezone:  UTC
H2O_cluster_version:        3.30.0.7
H2O_cluster_version_age:    10 days
H2O_cluster_name:           H2O_from_python_unknownUser_vvwlgf
H2O_cluster_total_nodes:    1
H2O_cluster_free_memory:    3.180 Gb
H2O_cluster_total_cores:    2
H2O_cluster_allowed_cores:  2
H2O_cluster_status:         accepting new members, healthy
H2O_connection_url:         http://127.0.0.1:54321
H2O_connection_proxy:       {"http": null, "https": null}
H2O_internal_security:      False
H2O_API_Extensions:         Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python_version:             3.6.9 final
Now, I will convert our dataset to an H2OFrame, which is similar to a pandas DataFrame but lives on the H2O cluster and has some additional properties:
adver_df = h2o.H2OFrame(df)
adver_df.describe()

Now, I will split the above data into a training set and a test set, and separate the feature columns from the target column Sales:
# Split the data 50/50 into training and test frames
train, test = adver_df.split_frame(ratios=[.50])

# x holds the feature column names, y is the target column
x = train.columns
y = "Sales"
x.remove(y)
Now, I will import the AutoML model provided by H2O to automate our machine learning task:
from h2o.automl import H2OAutoML
aml = H2OAutoML(max_runtime_secs=600,
                seed=1,
                balance_classes=False,
                project_name='Advertising')
%time aml.train(x=x, y=y, training_frame=train)
The above code will run our data through various machine learning models within the fixed time limit of 600 seconds. During these 600 seconds, H2O will keep track of the performance of every model the AutoML run has trained.
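If you would rather cap the number of models than the wall-clock time, H2OAutoML also accepts a max_models argument. Here is a minimal sketch; the value 20 and the project name are illustrative choices of mine, not part of the run above:

from h2o.automl import H2OAutoML

# Stop after 20 models have been trained instead of after a time budget
aml_capped = H2OAutoML(max_models=20,
                       seed=1,
                       project_name='Advertising_capped')
aml_capped.train(x=x, y=y, training_frame=train)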
Now I will look at the leaderboard to see which machine learning model has performed best:
lb = aml.leaderboard
lb.head()

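By default only the top rows are shown; to print the whole leaderboard you can pass the number of rows explicitly (this snippet follows the idiom from the H2O documentation and is not part of the original article):

# Show every model that the AutoML run trained
lb.head(rows=lb.nrows)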
Now, I will choose the best performing model (the leader of the run), which here is a Stacked Ensemble, and look at its metalearner to see how much each base model contributes:
se = aml.leader

# Load the Stacked Ensemble's metalearner model
metalearner = h2o.get_model(se.metalearner()['name'])
metalearner.varimp()

Now, let’s analyze one of the models from our AutoML run on the test set. Note that the model ID below comes from my run; when you run the code, pick an ID from your own leaderboard:
model = h2o.get_model('DeepLearning_grid__1_AutoML_20200731_222821_model_1')
model.model_performance(test)

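As an extra sanity check (this step is my addition and not part of the original walkthrough), you can also generate predictions for the test frame with the same model and inspect them:

# Predict Sales for the held-out test frame
preds = model.predict(test)
preds.head()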
Now, let’s have a look at which features were the most important for predicting our dependent variable:
model.varimp_plot(num_of_features=3)

Here we can clearly see that ‘TV’ is the most important feature in the predictions of Sales. Now, let’s visualize how the predicted Sales depends on TV with a partial dependence plot:
model.partial_plot(train, cols=["TV"], figsize=(5,5))

I hope you liked this article on using H2O AutoML to automate machine learning code. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.