Automate Machine Learning with H2O AutoML

Machine Learning and Artificial Intelligence are among the most searched topics on the Internet by programmers from all backgrounds. The popularity of Machine Learning has driven so much research that we have now arrived at the concept of AutoML, which automates some of the most complex processes in a Machine Learning workflow.

Today we have interfaces that can help automate machine learning code and make our task a little easier, but you still need a grounding in Data Science and Machine Learning to judge whether your task is going in the right direction or not.

H2O AutoML

Among the packages that automate machine learning code, one particularly useful package is H2O AutoML, which automates the whole process of model selection and hyperparameter tuning. In this article, we will look at how we can use H2O AutoML to automate machine learning code.
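To make the idea concrete, here is a minimal, hypothetical sketch in plain Python of the loop that AutoML automates: try several candidate models, score each one, and keep the best. The "models" here are made-up toy functions, not real estimators; real AutoML also tunes each model's hyperparameters.

```python
# Toy sketch of automated model selection: each "model" is just a function
# mapping a feature value to a prediction. AutoML does this with real
# estimators and hyperparameter grids, but the selection logic is the same.

data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # (x, y) pairs

candidates = {
    "double":      lambda x: 2 * x,        # y ~ 2x
    "triple":      lambda x: 3 * x,        # y ~ 3x
    "double_plus": lambda x: 2 * x + 0.5,  # y ~ 2x + 0.5
}

def mse(model):
    # Mean squared error of a candidate over the toy dataset.
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

scores = {name: mse(model) for name, model in candidates.items()}
best = min(scores, key=scores.get)
print(best)  # the candidate with the lowest error
```

In H2O AutoML, this whole search, including stacking the best models into ensembles, happens inside a single `train()` call.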

Also, Read – Machine Learning Projects for Beginners.

Installing this package is as easy as installing any other package in Python. You just need to run – pip install h2o – in your terminal. If you use Google Colab, you can install the package directly in a cell with the pip command – !pip install h2o.

Automate Machine Learning with H2O: Example

The dataset I will use for this task is an advertising dataset, which has the Sales of the Company as the dependent variable and features like Radio, Newspaper, and TV. You can download this dataset from here. Now let’s import the necessary libraries and have a look at the data:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv('Advertising.csv')
df.head()

I hope you have installed the h2o package successfully; now I will simply import the h2o package and initialize it to automate our machine learning code:

import h2o
h2o.init()
Checking whether there is an H2O instance running at http://localhost:54321 ..... not found.
Attempting to start a local H2O server...
  Java Version: openjdk version "11.0.8" 2020-07-14; OpenJDK Runtime Environment (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1); OpenJDK 64-Bit Server VM (build 11.0.8+10-post-Ubuntu-0ubuntu118.04.1, mixed mode, sharing)
  Starting server from /usr/local/lib/python3.6/dist-packages/h2o/backend/bin/h2o.jar
  Ice root: /tmp/tmp04nu4_h6
  JVM stdout: /tmp/tmp04nu4_h6/h2o_unknownUser_started_from_python.out
  JVM stderr: /tmp/tmp04nu4_h6/h2o_unknownUser_started_from_python.err
  Server is running at http://127.0.0.1:54321
Connecting to H2O server at http://127.0.0.1:54321 ... successful.
H2O_cluster_uptime:	02 secs
H2O_cluster_timezone:	Etc/UTC
H2O_data_parsing_timezone:	UTC
H2O_cluster_version:	3.30.0.7
H2O_cluster_version_age:	10 days
H2O_cluster_name:	H2O_from_python_unknownUser_vvwlgf
H2O_cluster_total_nodes:	1
H2O_cluster_free_memory:	3.180 Gb
H2O_cluster_total_cores:	2
H2O_cluster_allowed_cores:	2
H2O_cluster_status:	accepting new members, healthy
H2O_connection_url:	http://127.0.0.1:54321
H2O_connection_proxy:	{"http": null, "https": null}
H2O_internal_security:	False
H2O_API_Extensions:	Amazon S3, XGBoost, Algos, AutoML, Core V3, TargetEncoder, Core V4
Python_version:	3.6.9 final

Now, I will convert our dataset to an H2OFrame, which is like a pandas DataFrame but with some additional properties:

adver_df = h2o.H2OFrame(df)
adver_df.describe()

Now, I will split the above data into a training set and a test set:

train, test = adver_df.split_frame(ratios=[.50])
x = train.columns
y = "Sales"
x.remove(y)
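Note that `split_frame` splits randomly per row, so the two halves are only approximately equal in size. A plain-Python sketch of the same idea, with a stand-in list of row indices instead of an H2OFrame:

```python
import random

# Conceptual 50/50 random split: each row goes to the training set with
# probability 0.5, so the resulting sizes are only roughly equal, just as
# with H2O's split_frame(ratios=[.50]).

random.seed(1)
rows = list(range(200))  # stand-in for 200 dataset rows

train_rows = [r for r in rows if random.random() < 0.5]
test_rows = [r for r in rows if r not in train_rows]

print(len(train_rows), len(test_rows))  # roughly 100 and 100
```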

Now, I will import the AutoML model provided by H2O to automate our machine learning task:

from h2o.automl import H2OAutoML

aml = H2OAutoML(max_runtime_secs=600,
                seed=1,
                balance_classes=False,
                project_name='Advertising')
%time aml.train(x=x, y=y, training_frame=train)

The above code will run our data through various machine learning models within a fixed time limit of 600 seconds. During these 600 seconds, the AutoML model will record the performance of every model it trains.
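Conceptually, the leaderboard that this produces is just the trained models sorted by a metric, with the lowest RMSE first for a regression problem like ours. A toy sketch with made-up model ids and scores:

```python
# Hypothetical (model_id, rmse) results from an AutoML run; the leaderboard
# sorts them so the best (lowest-RMSE) model comes first, as H2O does for
# regression problems.

results = [
    ("GBM_1", 1.42),
    ("StackedEnsemble_1", 1.18),
    ("DeepLearning_1", 1.25),
    ("GLM_1", 1.77),
]

leaderboard = sorted(results, key=lambda row: row[1])
leader = leaderboard[0][0]
print(leader)  # the best-scoring model id
```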

Now I will generate a leaderboard to see which machine learning model performed best among all the candidates:

lb = aml.leaderboard
lb.head()

Now, I will take the best performing model, which is a Stacked Ensemble, and load its metalearner to see how much each base model contributes to the ensemble:

se = aml.leader  # the leading model: a Stacked Ensemble
metalearner = h2o.get_model(se.metalearner()['name'])
metalearner.varimp()
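The intuition behind the metalearner's variable importance: the metalearner combines the base models' predictions, and the magnitude of the weight it assigns to each base model is a rough measure of that model's contribution. A toy sketch with hypothetical coefficients:

```python
# Hypothetical metalearner coefficients over three base models; ranking by
# absolute magnitude gives a rough contribution ordering, which is the idea
# behind inspecting the metalearner's variable importance.

coefficients = {
    "GBM_1": 0.62,
    "DeepLearning_1": 0.30,
    "GLM_1": -0.05,
}

importance = sorted(coefficients.items(),
                    key=lambda item: abs(item[1]), reverse=True)
print(importance[0][0])  # the most influential base model
```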

Now, let’s pick one of the trained models by its model id and evaluate it on the test set:

model = h2o.get_model('DeepLearning_grid__1_AutoML_20200731_222821_model_1')
model.model_performance(test)

Now, let’s have a look at which features this model found most important for predicting our dependent variable:

model.varimp_plot(num_of_features=3)

Here we can clearly see that ‘TV’ is the most important feature in the predictions of Sales. Now, let’s visualize how Sales depends on it with a partial dependence plot:

model.partial_plot(train, cols=["TV"], figsize=(5,5))
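Under the hood, a partial dependence plot varies one feature over a grid of values while keeping the other features at the values observed in the data, then averages the model's predictions at each grid point. A minimal sketch with a made-up linear model standing in for the trained AutoML model:

```python
# Sketch of partial dependence for "TV": for each grid value t, replace TV
# with t in every row, predict, and average the predictions. The model here
# is a hypothetical fitted linear function, not the real AutoML leader.

rows = [
    {"TV": 50.0,  "Radio": 10.0, "Newspaper": 20.0},
    {"TV": 150.0, "Radio": 30.0, "Newspaper": 5.0},
    {"TV": 250.0, "Radio": 20.0, "Newspaper": 40.0},
]

def predict(row):
    # Hypothetical fitted model: Sales ~ 0.05*TV + 0.1*Radio + 3
    return 0.05 * row["TV"] + 0.1 * row["Radio"] + 3.0

def partial_dependence(feature, grid):
    curve = []
    for value in grid:
        preds = [predict({**row, feature: value}) for row in rows]
        curve.append(sum(preds) / len(preds))
    return curve

pd_curve = partial_dependence("TV", [0.0, 100.0, 200.0, 300.0])
print(pd_curve)  # average predicted Sales at each TV value
```

The rising curve mirrors what `partial_plot` shows for TV: higher TV spend corresponds to higher predicted Sales.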

I hope you liked this article on using H2O AutoML to automate machine learning code. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.

Aman Kharwal

I am a programmer from India, and I am here to guide you with Data Science, Machine Learning, Python, and C++ for free. I hope you will learn a lot in your journey towards Coding, Machine Learning and Artificial Intelligence with me.
