Automatic EDA using Python

Python libraries save a lot of time, which is one reason why Python is such a popular programming language for data science and machine learning. In this article, I’m going to take you through a tutorial on Automatic EDA using Python, where we will get the key information and statistics of a dataset in a few lines of code.

What is Automatic EDA?

The work of a Data Scientist begins with exploratory data analysis (EDA). It is the first and one of the most important steps in any data science task. EDA helps a Data Scientist understand the characteristics of the data, from missing values to outliers.
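To make this concrete, here is a minimal sketch of two such checks done by hand with Pandas: counting missing values and flagging outliers. The DataFrame values and the column names (rooms, price) are made up for illustration, and the 1.5 × IQR rule is just one common convention for outliers, not the only one.

```python
import pandas as pd

# Toy values made up for illustration; not taken from any real dataset.
df = pd.DataFrame({
    "rooms": [3, 4, None, 5, 4],
    "price": [200.0, 220.0, 210.0, 5000.0, 215.0],
})

# Count the missing values in each column.
missing_counts = df.isna().sum()

# Flag outliers in one column with the common 1.5 * IQR rule.
q1 = df["price"].quantile(0.25)
q3 = df["price"].quantile(0.75)
iqr = q3 - q1
is_outlier = (df["price"] < q1 - 1.5 * iqr) | (df["price"] > q3 + 1.5 * iqr)

print(missing_counts)
print(df[is_outlier])
```

Doing this for every column by hand gets repetitive quickly, which is the work an automatic EDA tool takes over.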


Python libraries like Pandas, Matplotlib, Seaborn, and even Plotly are used for exploratory data analysis by most machine learning practitioners. But there is another library that can be used for EDA, known as dataprep. It presents the key statistics and information about the data through interactive visualizations and summary statistics.

Automatic EDA using Python

In this section, I will take you through a Data Science tutorial on Automatic EDA using the dataprep library in Python. If you have never used it before, you can easily install it with the pip command: pip install dataprep. Now let’s get started by importing the necessary Python libraries and the dataset:

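import pandas as pd
# Load the housing dataset (housing.csv is expected in the working directory)
data = pd.read_csv('housing.csv')
# Preview the first five rows
data.head()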
[Figure: the first five rows of the housing dataset]

As with any Data Science task, I started this one by looking at the first 5 rows of the data using the Pandas library in Python. Now we need just a single line of code to show an interactive visualization of all the columns along with a statistics summary, which completes the task of automatic EDA using Python.
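Here is a minimal sketch of that single line, assuming the call is the plot function from dataprep.eda applied to the whole DataFrame. The wrapper name show_full_eda is hypothetical and is only there to keep the example self-contained; in a notebook the one-liner is simply plot(data).

```python
import pandas as pd


def show_full_eda(df: pd.DataFrame):
    """Return dataprep's interactive overview of every column in df."""
    # Imported inside the function so that a script still loads
    # when dataprep is not installed (pip install dataprep).
    from dataprep.eda import plot

    # Assumption: plot(df) is the single-line call meant by the text.
    return plot(df)


# In a notebook, with `data` loaded from housing.csv:
# show_full_eda(data)
```

In a Jupyter notebook, the returned object renders itself as the interactive report when it is the last expression in a cell.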

That one step gives an exploratory data analysis of the full dataset. If you want to see the EDA for a specific column only, you can run the same analysis on that column alone, as I do here for the ‘median_house_value’ column of the dataset.
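A minimal sketch of the single-column version, assuming dataprep's plot function is called with the DataFrame and the column name. The helper show_column_eda and its column check are hypothetical additions for illustration; the essential call is plot(data, "median_house_value").

```python
import pandas as pd


def show_column_eda(df: pd.DataFrame, column: str):
    """Return dataprep's interactive report for one column of df."""
    # Fail early with a clear message if the column name is misspelled.
    if column not in df.columns:
        raise KeyError(f"{column!r} is not a column of the DataFrame")

    # Imported here so the check above works even without dataprep installed.
    from dataprep.eda import plot

    # Assumption: passing a column name restricts the report to that column.
    return plot(df, column)


# In a notebook, with `data` loaded from housing.csv:
# show_column_eda(data, "median_house_value")
```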

Summary

Now, if you click on the columns one by one, you will be able to see different kinds of visualizations for exploring each column.

I hope you liked this article on Automatic EDA using the Python programming language. Feel free to ask your questions in the comments section below.

Aman Kharwal