By Aman Kharwal

NumPy Tutorial for Data Science

NumPy Tutorial

This tutorial covers NumPy in detail. NumPy, short for Numerical Python, provides an efficient interface for storing and operating on dense data buffers.

In some ways, NumPy arrays are like Python’s built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.
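
As a rough illustration of that difference, here is a minimal sketch that doubles every element once with a list comprehension and once with a single array expression. The size of one million and the names `values` and `arr` are assumptions chosen for this example only.

```python
import numpy as np

# Illustrative data: the size is an arbitrary choice for this sketch
n = 1_000_000
values = list(range(n))   # plain Python list
arr = np.arange(n)        # NumPy array with the same contents

# With a list, element-wise doubling needs an explicit loop
doubled_list = [v * 2 for v in values]

# With an array, one expression applies to every element
doubled_arr = arr * 2

# Both approaches produce the same numbers
print(np.array_equal(doubled_arr, doubled_list))

# The array keeps its values in one fixed-size buffer; nbytes reports its size
print(arr.nbytes, "bytes for", arr.size, "elements")
```

Timing the two versions (for example with `timeit`) is a good way to see how the gap widens as `n` grows; the exact numbers depend on the machine.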

NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.

Let's Start with NumPy

import numpy as np
np.array([3.2,4,6,5]) # mixed integers and floats are upcast to floating point
#Output
array([3.2, 4. , 6. , 5. ])
np.array([1,4,2,5,3]) # integer array
#Output
array([1, 4, 2, 5, 3])

Understanding Data Types in Python

np.array([1,2,3,4], dtype="str")
array(['1', '2', '3', '4'], dtype='<U1')
np.array([3,6,2,3], dtype="float32")
array([3., 6., 2., 3.], dtype=float32)
# nested lists result in multidimensional arrays
np.array([range(i,i+3) for i in [2,4,6]])
array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

Creating NumPy Arrays from Scratch

# Create a length-10 integer array filled with zeros
np.zeros(10, dtype="int") 
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
np.zeros((5,6), dtype="float")
array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])
# Create a 3x5 floating-point array filled with 1s
np.ones((3,5), dtype="float")
array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
# Create a 3x5 array filled with 3.14
np.full((3,5), 3.14)
array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)

np.arange(0,20,2)
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0,1,5)
array([0.  , 0.25, 0.5 , 0.75, 1.  ])
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1

np.random.random((3,3))
array([[0.29667042, 0.62754914, 0.15978968],
       [0.03361528, 0.8662517 , 0.03745184],
       [0.06928681, 0.51769191, 0.02500486]])
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1

np.random.normal(0,1,(3,3))
array([[ 0.08187539,  0.23513661, -1.07537781],
       [-1.26478022, -1.79131146,  0.44600483],
       [ 1.20752837, -2.17000272,  0.40891878]])
# Create a 3x3 array of random integers in the interval [0, 10)

np.random.randint(0,10,(3,3))
array([[3, 4, 3],
       [4, 0, 4],
       [5, 8, 3]])
# Create a 3x3 identity matrix

np.eye(3)
array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

NumPy Standard Data Types

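#Return a new array of the given shape and type without initializing its entries.
#The values shown are not random: they are whatever happened to be in that memory.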

np.empty((3,3),dtype="int")
array([[4607182418800017408,                   0,                   0],
       [                  0, 4607182418800017408,                   0],
       [                  0,                   0, 4607182418800017408]])
np.zeros(10,dtype="int16")
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)
#or using the associated NumPy object:

np.zeros(10,dtype=np.int16)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

The Basics of NumPy Arrays

#NumPy Array Attributes
#We'll use NumPy's random number generator, seeded with a fixed value so that the same random arrays are generated each time this code is run.
#Each array has attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array):

np.random.seed(0) # seed for reproducibility
x1 = np.random.randint(10, size=6) # One-dimensional array; same as np.random.randint(0, 10, size=6)
x2 = np.random.randint(10, size=(3,4)) # Two-dimensional array
x3 = np.random.randint(10, size=(3,4,5)) # Three-dimensional array

print("x1 ndim: ",x1.ndim)
print("x1 shape: ",x1.shape)
print("x1 size: ",x1.size) #in total, 6 elements

print("x2 ndim: ",x2.ndim)
print("x2 shape: ",x2.shape)
print("x2 size: ",x2.size) #in total, 12 elements

print("x3 ndim: ",x3.ndim)
print("x3 shape: ",x3.shape)
print("x3 size: ",x3.size) #in total, 60 elements

print("dtype: ",x1.dtype) #the data type of the array
# Other attributes include itemsize, which lists the size (in bytes) of each array element,
# and nbytes, which lists the total size (in bytes) of the array:
print("itemsize:",x1.itemsize,"bytes")
print("nbytes:",x1.nbytes,"bytes")

print("dtype: ",x2.dtype) #the data type of the array
print("itemsize:",x2.itemsize,"bytes")
print("nbytes:",x2.nbytes,"bytes")

print("dtype: ",x3.dtype) #the data type of the array
print("itemsize:",x3.itemsize,"bytes")
print("nbytes:",x3.nbytes,"bytes")

#In general, we expect that nbytes is equal to itemsize times size.
x1 ndim:  1
x1 shape:  (6,)
x1 size:  6
x2 ndim:  2
x2 shape:  (3, 4)
x2 size:  12
x3 ndim:  3
x3 shape:  (3, 4, 5)
x3 size:  60
dtype:  int64
itemsize: 8 bytes
nbytes: 48 bytes
dtype:  int64
itemsize: 8 bytes
nbytes: 96 bytes
dtype:  int64
itemsize: 8 bytes
nbytes: 480 bytes
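
The relationship between nbytes, itemsize, and size can be checked directly. The following is a small sketch that rebuilds the three arrays with the same seed and compares the two quantities; the loop and the name `expected` are only illustrative.

```python
import numpy as np

np.random.seed(0)  # same seed as in the tutorial
x1 = np.random.randint(10, size=6)
x2 = np.random.randint(10, size=(3, 4))
x3 = np.random.randint(10, size=(3, 4, 5))

for name, arr in [("x1", x1), ("x2", x2), ("x3", x3)]:
    # total bytes should equal bytes per element times number of elements
    expected = arr.itemsize * arr.size
    print(name, "nbytes:", arr.nbytes, "itemsize * size:", expected)
```

The itemsize itself depends on the default integer type of the platform, so the absolute byte counts may differ from the output above while the equality still holds.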

Array Indexing: Accessing Single Elements

  • If you are familiar with Python’s standard list indexing, indexing in NumPy will feel quite familiar. In a one-dimensional array, you can access the ith value (counting from zero) by specifying the desired index in square brackets, just as with Python lists:
x1
array([5, 0, 3, 3, 7, 9])
x1[0]
5
x1[4]
7
#To index from the end of the array, you can use negative indices:

x1[-1]
9

In a multidimensional array, you access items using a comma-separated tuple of indices:

x2

array([[3, 5, 2, 4],
       [7, 6, 8, 8],
       [1, 6, 7, 7]])
x2[2,1]
6
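
The same index notation can also be used to assign values. Here is a short sketch, assuming the x2 array shown above (typed in by hand so the block runs on its own). Note that an integer array keeps its type, so a floating-point value assigned into it is silently truncated.

```python
import numpy as np

# x2 as printed above, entered by hand for a self-contained example
x2 = np.array([[3, 5, 2, 4],
               [7, 6, 8, 8],
               [1, 6, 7, 7]])

print(x2[0, 0])    # first row, first column
print(x2[2, -1])   # last row, last column (negative index)

# Assignment uses the same comma-separated indices
x2[0, 0] = 12
print(x2[0, 0])

# The array has an integer dtype, so the fractional part is dropped
x2[0, 1] = 3.99
print(x2[0, 1])
```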

Reshaping of NumPy Arrays

Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape() method. For example, if you want to put the numbers 1 through 9 in a 3×3 grid, you can do the following:

grid = np.arange(1,10,1).reshape(3,3)
print(grid)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
x = np.array([1, 2, 3])
x.shape # x is a vector (3,)
(3,)
# row vector via reshape

x.reshape(1,3).shape
(1, 3)
# row vector via newaxis

x[np.newaxis, :].shape
(1, 3)
x.reshape(1,-1).shape
(1, 3)
# column vector via reshape

x.reshape((3, 1))
array([[1],
       [2],
       [3]])
# column vector via newaxis

x[:, np.newaxis]
array([[1],
       [2],
       [3]])
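
Two details of reshape() are worth trying out: a -1 lets NumPy infer one dimension from the total size, and the new shape must hold exactly as many elements as the original. A minimal sketch, using a 12-element array chosen only for illustration:

```python
import numpy as np

x = np.arange(12)

# -1 asks NumPy to infer that dimension from the total number of elements
print(x.reshape(3, -1).shape)
print(x.reshape(-1, 6).shape)

# The new shape must contain exactly the same number of elements
try:
    x.reshape(5, 3)  # 15 slots for 12 values
except ValueError as err:
    print("reshape failed:", err)
```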

Computation on NumPy Arrays: Universal Functions

Exploring NumPy’s UFuncs

  • Ufuncs exist in two flavors: unary ufuncs, which operate on a single input, and binary ufuncs, which operate on two inputs. We’ll see examples of both these types of functions here.

Array arithmetic

# NumPy's ufuncs feel very natural to use because they make use of Python's native
# arithmetic operators. The standard addition, subtraction, multiplication, and division
# can all be used:

x = np.arange(4)
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2) # floor division
x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0.  0.5 1.  1.5]
x // 2 = [0 0 1 1]
#There is also a unary ufunc for negation, a ** operator for exponentiation, and a %
#operator for modulus:

print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)
-x =  [ 0 -1 -2 -3]
x ** 2 =  [0 1 4 9]
x % 2 =  [0 1 0 1]
# In addition, these can be strung together however you wish, and the standard order
# of operations is respected:

-(0.5*x+1) ** 2
array([-1.  , -2.25, -4.  , -6.25])
# All of these arithmetic operations are simply convenient wrappers around specific
# functions built into NumPy; for example, the + operator is a wrapper for the add
# function:

print(np.add(3,2))

print(np.add(x,2)) #Addition +
print(np.subtract(x,5)) #Subtraction -
print(np.negative(x)) #Unary negation -
print(np.multiply(x,3)) #Multiplication *
print(np.divide(x,2)) #Division /
print(np.floor_divide(x,2)) #Floor division //
print(np.power(x,2)) #Exponentiation **
print(np.mod(x,2)) #Modulus/remainder %

print(np.multiply(x, x))
5
[2 3 4 5]
[-5 -4 -3 -2]
[ 0 -1 -2 -3]
[0 3 6 9]
[0.  0.5 1.  1.5]
[0 0 1 1]
[0 1 4 9]
[0 1 0 1]
[0 1 4 9]

Trigonometric functions

# NumPy provides a large number of useful ufuncs, and some of the most useful for the
# data scientist are the trigonometric functions. We'll start by defining an array of
# angles:

theta = np.linspace(0,np.pi,3)


#Now we can compute some trigonometric functions on these values:
print("theta      =",theta)
print("sin(theta) =",np.sin(theta))
print("cos(theta) =",np.cos(theta))
print("tan(theta) =",np.tan(theta))
theta      = [0.         1.57079633 3.14159265]
sin(theta) = [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) = [ 1.000000e+00  6.123234e-17 -1.000000e+00]
tan(theta) = [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]
x = [-1, 0, 1]

print("x = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))
x =  [-1, 0, 1]
arcsin(x) =  [-1.57079633  0.          1.57079633]
arccos(x) =  [3.14159265 1.57079633 0.        ]
arctan(x) =  [-0.78539816  0.          0.78539816]
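
The values are computed to within machine precision, which is why some entries that should be exactly zero show up as tiny numbers such as 1.2246468e-16. A hedged sketch of how to compare such results with a tolerance instead of exact equality, using np.isclose and np.allclose with their default tolerances:

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)
s = np.sin(theta)

# Tolerance-based comparison treats the tiny rounding error as zero
print(np.isclose(s, [0.0, 1.0, 0.0]))

# allclose reduces the element-wise check to a single True/False
print(np.allclose(np.cos(theta), [1.0, 0.0, -1.0]))
```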

Computation on Arrays: Broadcasting

  • Broadcasting is simply a set of rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) on arrays of different sizes.

Introducing Broadcasting

import numpy as np

a = np.array([0,1,2])
b = np.array([5,5,5])
a+b
array([5, 6, 7])
# We can similarly extend this to arrays of higher dimension. Observe the result when
# we add a one-dimensional array to a two-dimensional array:

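M = np.ones((3,3))
M
array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])
M + a
array([[1., 2., 3.],
       [1., 2., 3.],
       [1., 2., 3.]])
# Here the one-dimensional array a is stretched, or broadcast, across the rows of M
# so that the two shapes match.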

Figure: Visualization of NumPy broadcasting

Rules of Broadcasting

  • Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:
  • Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
  • Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
  • Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
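
The three rules can be written out as a small function that works on shape tuples. This is only a sketch for illustration: `broadcast_shape` is a hypothetical helper, not part of NumPy, and its result is compared with the shape NumPy itself reports through np.broadcast.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    # Rule 1: pad the shorter shape with ones on its left side
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)   # Rule 2: stretch a to match b
        elif dim_b == 1:
            result.append(dim_a)   # Rule 2: stretch b to match a
        else:
            # Rule 3: sizes disagree and neither is 1
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)

print(broadcast_shape((3, 1), (3,)))
print(broadcast_shape((2, 3), (3,)))

# Cross-check against NumPy's own result for the first pair
print(np.broadcast(np.empty((3, 1)), np.empty(3)).shape)
```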

Broadcasting example

# Let's take a look at an example where both arrays need to be broadcast:
a = np.arange(3).reshape((3,1))
b = np.arange(3)
# Again, we'll start by writing out the shape of the arrays:

# a.shape = (3, 1)
# b.shape = (3,)
# Rule 1 says we must pad the shape of b with ones:
# a.shape -> (3, 1)
# b.shape -> (1, 3)
# And rule 2 tells us that we upgrade each of these ones to match the corresponding
# size of the other array:
# a.shape -> (3, 3)
# b.shape -> (3, 3)
# Because the result matches, these shapes are compatible. We can see this here:
a+b
array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
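
For contrast, here is a sketch of a case where Rule 3 applies and the shapes are not compatible. The shapes (3, 2) and (3,) are chosen for illustration; padding gives (3, 2) and (1, 3), stretching gives (3, 2) and (3, 3), and the last dimension still disagrees.

```python
import numpy as np

M = np.ones((3, 2))
a = np.arange(3)

# Shapes (3, 2) and (3,) cannot be reconciled by the rules, so NumPy raises
try:
    M + a
except ValueError as err:
    print("broadcast failed:", err)

# Turning a into a column of shape (3, 1) makes the shapes compatible
print((M + a[:, np.newaxis]).shape)
```

Reshaping the smaller array by hand, as in the last line, is the usual way to say explicitly which dimension should be stretched.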

Comparison Operators as ufuncs

  • NumPy also implements comparison operators such as < (less than) and > (greater than) as element-wise ufuncs. The result of these comparison operators is always an array with a Boolean data type. All six of the standard comparison operations are available.
  • These comparisons are useful when, for example, you wish to count all values greater than a certain value, or remove all outliers that are above some threshold. In NumPy, Boolean masking is often the most efficient way to accomplish these types of tasks.
x = np.array([1,2,3,4,5])

print(x<3)  # less than
print(x>3)  # greater than
print(x<=3) #less than or equal
print(x>=3) #greater than or equal
print(x!=3) #not equal
print(x==3) #equal
[ True  True False False False]
[False False False  True  True]
[ True  True  True False False]
[False False  True  True  True]
[ True  True False  True  True]
[False False  True False False]
rng = np.random.RandomState(seed=0)
x = rng.randint(10, size=(3,4))
print(x)

x<6
[[5 0 3 3]
 [7 9 3 5]
 [2 4 7 6]]
array([[ True,  True,  True,  True],
       [False, False,  True,  True],
       [ True,  True, False, False]])
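
A Boolean array like this can be used for the counting and filtering tasks mentioned above. The following is a minimal sketch using the same seeded array; the name `mask` is just an illustrative choice.

```python
import numpy as np

rng = np.random.RandomState(seed=0)
x = rng.randint(10, size=(3, 4))

mask = x < 6

# Count how many values are less than 6 (True counts as 1)
print(np.count_nonzero(mask))   # 8

# Count per row by summing along axis 1
print(np.sum(mask, axis=1))     # [4 2 2]

# Select only the values where the mask is True (result is one-dimensional)
print(x[mask])                  # [5 0 3 3 3 5 2 4]
```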
