One common task when solving a machine learning problem in Python is converting a pandas DataFrame to a NumPy array. A pandas DataFrame is a labeled, two-dimensional table that we use to work with data, and a NumPy array is an n-dimensional data structure built for fast mathematical operations. Many machine learning libraries work with arrays, so it is often necessary to convert the data into an array before using an algorithm on it. So, in this article, I will take you through a tutorial on how to convert a pandas DataFrame to a NumPy array in Python.
Convert Pandas DataFrame to NumPy Array
To convert a pandas DataFrame into an array, we first need a dataset stored in a pandas DataFrame. For this task, I will be using a stock price dataset of Apple. Let’s start by importing the NumPy and pandas libraries and loading the dataset:
import numpy as np
import pandas as pd

# Load the Apple stock price data and preview the first five rows
data = pd.read_csv("AAPL.csv")
print(data.head())
         Date        Open        High         Low       Close   Adj Close     Volume
0  2021-09-15  148.559998  149.440002  146.369995  149.029999  148.812805   83281300
1  2021-09-16  148.440002  148.970001  147.220001  148.789993  148.573151   68034100
2  2021-09-17  148.820007  148.820007  145.759995  146.059998  145.847137  129868800
3  2021-09-20  143.800003  144.839996  141.270004  142.940002  142.731689  123478900
4  2021-09-21  143.929993  144.600006  142.779999  143.429993  143.220963   75834000
For a better understanding, I will be using the first five rows of the data:
data = data.head()
Now here’s how you can convert the DataFrame into an array:
np_array = np.array(data)
print(np_array)
[['2021-09-15' 148.559998 149.440002 146.369995 149.029999 148.812805 83281300]
 ['2021-09-16' 148.440002 148.970001 147.220001 148.789993 148.573151 68034100]
 ['2021-09-17' 148.820007 148.820007 145.759995 146.059998 145.847137 129868800]
 ['2021-09-20' 143.800003 144.839996 141.270004 142.940002 142.731689 123478900]
 ['2021-09-21' 143.929993 144.600006 142.779999 143.429993 143.220963 75834000]]
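Notice that the array above mixes date strings with numbers. The sketch below is one way to check what that means for the array's data type, and it also shows `DataFrame.to_numpy()`, the conversion method that pandas itself provides. It is only an illustration: the small DataFrame is a stand-in built by hand from a few of the rows and columns shown above, so that the example runs without the AAPL.csv file, and the names `via_np` and `via_pandas` are mine.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a few rows and columns of AAPL.csv
# (values copied from the output above; an assumption for illustration only)
data = pd.DataFrame({
    "Date": ["2021-09-15", "2021-09-16", "2021-09-17"],
    "Open": [148.559998, 148.440002, 148.820007],
    "Close": [149.029999, 148.789993, 146.059998],
    "Volume": [83281300, 68034100, 129868800],
})

# Two ways to get a NumPy array from the same DataFrame
via_np = np.array(data)
via_pandas = data.to_numpy()

# Strings and numbers share one array, so NumPy falls back to the
# generic "object" dtype instead of a numeric dtype
print(via_np.dtype)
print(via_pandas.dtype)
print(via_np.shape)
```

An object array holds the values, but numerical routines generally expect a numeric dtype, so it usually makes sense to convert only the numeric columns before doing any math on the result.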
If you want to select only some of the columns from a DataFrame to convert them into an array, then here’s how you can do it:
np_array = np.array(data[["Open", "Close"]])
print(np_array)
[[148.559998 149.029999]
 [148.440002 148.789993]
 [148.820007 146.059998]
 [143.800003 142.940002]
 [143.929993 143.429993]]
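Because both selected columns are numeric, the resulting array is ready for mathematical operations. Here is a minimal sketch of that idea, again using a hand-built stand-in for the first five rows instead of the CSV file. The names `prices`, `col_means` and `daily_change` are my own, and `to_numpy()` is used here as an alternative to `np.array()`.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the first five rows of AAPL.csv
# (values copied from the output above; an assumption for illustration only)
data = pd.DataFrame({
    "Date": ["2021-09-15", "2021-09-16", "2021-09-17", "2021-09-20", "2021-09-21"],
    "Open": [148.559998, 148.440002, 148.820007, 143.800003, 143.929993],
    "Close": [149.029999, 148.789993, 146.059998, 142.940002, 143.429993],
})

# Convert only the numeric columns; the result is a float64 array
prices = data[["Open", "Close"]].to_numpy()
print(prices.dtype)
print(prices.shape)

# Column-wise mean of Open and Close
col_means = prices.mean(axis=0)
print(col_means)

# Close minus Open for each row, computed without a Python loop
daily_change = prices[:, 1] - prices[:, 0]
print(daily_change)
```

Column order in the array follows the order of the names in the selection list, so `prices[:, 0]` is Open and `prices[:, 1]` is Close.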
So this is how you can convert a pandas DataFrame into an array in Python.
Summary
A pandas DataFrame is a labeled, two-dimensional table that we use to work with data, and a NumPy array is a data structure built for mathematical operations. Passing a DataFrame, or a selection of its columns, to np.array() converts it into an array. I hope you liked this article on converting a pandas DataFrame to a NumPy array in Python. Feel free to ask your questions in the comments section below.