Convert Pandas DataFrame to NumPy Array

One common task when solving a machine learning problem with Python is converting a pandas DataFrame to a NumPy array. A pandas DataFrame is a labelled, tabular data structure that we use to load and work with data, and a NumPy array is an n-dimensional data structure on which we perform mathematical operations. Many machine learning workflows operate on arrays, so it is often useful to convert the data into an array before using an algorithm on it. So, in this article, I will take you through a tutorial on how to convert a pandas DataFrame to a NumPy array in Python.

Convert Pandas DataFrame to NumPy Array

To convert a pandas DataFrame into an array, we first need a dataset stored in a pandas DataFrame. For this task, I will be using a dataset of Apple stock prices. Let’s import the NumPy and pandas libraries in Python, then load the dataset and look at its first five rows:

import numpy as np
import pandas as pd

data = pd.read_csv("AAPL.csv")
print(data.head())
         Date        Open        High         Low       Close   Adj Close     Volume
0  2021-09-15  148.559998  149.440002  146.369995  149.029999  148.812805   83281300
1  2021-09-16  148.440002  148.970001  147.220001  148.789993  148.573151   68034100
2  2021-09-17  148.820007  148.820007  145.759995  146.059998  145.847137  129868800
3  2021-09-20  143.800003  144.839996  141.270004  142.940002  142.731689  123478900
4  2021-09-21  143.929993  144.600006  142.779999  143.429993  143.220963   75834000

For a better understanding, I will be using the first five rows of the data:

data = data.head()

Now here’s how you can convert the DataFrame into an array:

np_array = np.array(data)
print(np_array)
[['2021-09-15' 148.559998 149.440002 146.369995 149.029999 148.812805
  83281300]
 ['2021-09-16' 148.440002 148.970001 147.220001 148.789993 148.573151
  68034100]
 ['2021-09-17' 148.820007 148.820007 145.759995 146.059998 145.847137
  129868800]
 ['2021-09-20' 143.800003 144.839996 141.270004 142.940002 142.731689
  123478900]
 ['2021-09-21' 143.929993 144.600006 142.779999 143.429993 143.220963
  75834000]]
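
Notice that the array above mixes date strings with numbers, so NumPy has to store it with the generic object dtype. Here is a minimal sketch of that behaviour. It assumes a small hand-built DataFrame (called sample here, a hypothetical stand-in for the first two rows of the dataset, not the real AAPL.csv file) and also shows the DataFrame.to_numpy() method, which pandas provides as an alternative to calling np.array() on the DataFrame:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first two rows of the dataset (not the real file)
sample = pd.DataFrame({
    "Date": ["2021-09-15", "2021-09-16"],
    "Open": [148.559998, 148.440002],
    "Close": [149.029999, 148.789993],
    "Volume": [83281300, 68034100],
})

# Both calls produce a 2-D array with one row per DataFrame row
array_from_np = np.array(sample)
array_from_pandas = sample.to_numpy()

print(array_from_pandas.shape)  # (2, 4)
print(array_from_pandas.dtype)  # object, because strings and numbers are mixed
```

An object array can hold any Python value, but it does not support fast numerical operations, which is one reason to select only the numeric columns before converting.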

If you want to select only some of the columns from a DataFrame to convert them into an array, then here’s how you can do it:

np_array = np.array(data[["Open", "Close"]])
print(np_array)
[[148.559998 149.029999]
 [148.440002 148.789993]
 [148.820007 146.059998]
 [143.800003 142.940002]
 [143.929993 143.429993]]
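
When every selected column is numeric, the resulting array has a numeric dtype and can be used directly in calculations. The sketch below illustrates this under the same assumption of a small hand-built DataFrame (sample is a hypothetical stand-in for the dataset). It also uses select_dtypes() as one possible way to keep all numeric columns without naming them one by one:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first two rows of the dataset (not the real file)
sample = pd.DataFrame({
    "Date": ["2021-09-15", "2021-09-16"],
    "Open": [148.559998, 148.440002],
    "Close": [149.029999, 148.789993],
})

# Selecting only float columns yields a float64 array
prices = sample[["Open", "Close"]].to_numpy()
print(prices.dtype)  # float64

# select_dtypes keeps every numeric column and drops the Date strings
numeric = sample.select_dtypes(include="number").to_numpy()
print(numeric.shape)  # (2, 2)

# Numeric arrays support vectorized math, e.g. Close minus Open for each row
change = prices[:, 1] - prices[:, 0]
print(change)
```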

So this is how you can convert a pandas DataFrame into an array in Python.

Summary

To convert a pandas DataFrame into a NumPy array, pass the DataFrame, or a selection of its columns, to np.array(). A pandas DataFrame is a labelled, tabular data structure that we use to work with data, and a NumPy array is an n-dimensional data structure on which we perform mathematical operations. I hope you liked this article on converting a pandas DataFrame to a NumPy array in Python. Feel free to ask valuable questions in the comments section below.

Aman Kharwal
