
This tutorial covers NumPy in detail. NumPy, short for Numerical Python, provides an efficient interface for storing and operating on dense data buffers.
In some ways, NumPy arrays are like Python’s built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.
NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.
Let's Start with NumPy
import numpy as np
np.array([3.2,4,6,5])
# Output:
array([3.2, 4. , 6. , 5. ])
# integer array:
np.array([1,4,2,5,3])
array([1, 4, 2, 5, 3])
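The two arrays above end up with different element types because a NumPy array stores a single type for all of its elements. The following is a minimal sketch of that behavior; the variable names `mixed` and `ints` are chosen here for illustration only.

```python
import numpy as np

# One float among integers: every element is upcast to floating point.
mixed = np.array([3.2, 4, 6, 5])

# Integers only: the array keeps an integer type
# (the exact width, such as int64, depends on the platform).
ints = np.array([1, 4, 2, 5, 3])

print(mixed.dtype)                           # float64
print(np.issubdtype(ints.dtype, np.integer)) # True
```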
Understanding Data Types in Python
np.array([1,2,3,4], dtype="str")
array(['1', '2', '3', '4'], dtype='<U1')
np.array([3,6,2,3], dtype="float32")
array([3., 6., 2., 3.], dtype=float32)
# nested lists result in multidimensional arrays
np.array([range(i,i+3) for i in [2,4,6]])
array([[2, 3, 4], [4, 5, 6], [6, 7, 8]])
Creating NumPy Arrays from Scratch
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype="int")
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
np.zeros((5,6), dtype="float")
array([[0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.]])
# Create a 3x5 floating-point array filled with 1s
np.ones((3,5), dtype="float")
array([[1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.]])
# Create a 3x5 array filled with 3.14
np.full((3,5), 3.14)
array([[3.14, 3.14, 3.14, 3.14, 3.14], [3.14, 3.14, 3.14, 3.14, 3.14], [3.14, 3.14, 3.14, 3.14, 3.14]])
# Create an array filled with a linear sequence
# starting at 0, stopping before 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0,20,2)
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
# Create an array of five values evenly spaced between 0 and 1
np.linspace(0,1,5)
array([0. , 0.25, 0.5 , 0.75, 1. ])
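The two sequence functions treat the end value differently, which is easy to miss. The sketch below illustrates this; the variable names are illustrative, and `endpoint=False` is an optional `np.linspace` argument not used in the original examples.

```python
import numpy as np

# np.arange excludes the stop value, like the built-in range().
evens = np.arange(0, 20, 2)
print(evens[-1])    # 18

# np.linspace includes the stop value by default.
points = np.linspace(0, 1, 5)
print(points[-1])   # 1.0

# With endpoint=False, linspace leaves the stop value out as well.
half_open = np.linspace(0, 1, 5, endpoint=False)
print(half_open[-1] < 1)   # True
```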
# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3,3))
array([[0.29667042, 0.62754914, 0.15978968], [0.03361528, 0.8662517 , 0.03745184], [0.06928681, 0.51769191, 0.02500486]])
# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0,1,(3,3))
array([[ 0.08187539, 0.23513661, -1.07537781], [-1.26478022, -1.79131146, 0.44600483], [ 1.20752837, -2.17000272, 0.40891878]])
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0,10,(3,3))
array([[3, 4, 3], [4, 0, 4], [5, 8, 3]])
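The random arrays shown above will differ on every run. If repeatable values are needed, one option is to seed the generator first with `np.random.seed`. This is a minimal sketch; the seed value 42 is arbitrary and chosen only for illustration.

```python
import numpy as np

np.random.seed(42)                         # arbitrary seed, for illustration only
first = np.random.randint(0, 10, (3, 3))

np.random.seed(42)                         # reseeding restarts the same sequence
second = np.random.randint(0, 10, (3, 3))

print(np.array_equal(first, second))       # True

# The interval is half-open: 0 can appear, 10 cannot.
print(first.min() >= 0 and first.max() < 10)   # True
```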
# Create a 3x3 identity matrix
np.eye(3)
array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
NumPy Standard Data Types
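# Return a new array of the given shape and type without initializing its entries.
# The values are not random; they are whatever already happens to be in that memory.
np.empty((3,3),dtype="int")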
array([[4607182418800017408, 0, 0], [ 0, 4607182418800017408, 0], [ 0, 0, 4607182418800017408]])
np.zeros(10,dtype="int16")
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)
# or using the associated NumPy object:
np.zeros(10,dtype=np.int16)
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)
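Both spellings of the data type refer to the same thing, and the choice of type determines how much memory each element uses. A small sketch of this, assuming nothing beyond the calls already shown (the name `small` is illustrative):

```python
import numpy as np

# The string name and the NumPy object describe the same dtype.
print(np.dtype("int16") == np.dtype(np.int16))   # True

small = np.zeros(10, dtype="int16")
print(small.itemsize)   # 2  (bytes per element: 16 bits)
print(small.nbytes)     # 20 (10 elements * 2 bytes)
```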
The Basics of NumPy Arrays
# NumPy Array Attributes
# We'll use NumPy's random number generator, which we will seed with a set value in order
# to ensure that the same random arrays are generated each time this code is run.
# Each array has attributes ndim (the number of dimensions), shape (the size of each
# dimension), and size (the total size of the array):
np.random.seed(0) # seed for reproducibility
x1 = np.random.randint(10, size=6) # One-dimensional array; same as np.random.randint(0, 10, size=6)
x2 = np.random.randint(10, size=(3,4)) # Two-dimensional array
x3 = np.random.randint(10, size=(3,4,5)) # Three-dimensional array
print("x1 ndim: ",x1.ndim)
print("x1 shape: ",x1.shape)
print("x1 size: ",x1.size) # 6 elements in total
print("x2 ndim: ",x2.ndim)
print("x2 shape: ",x2.shape)
print("x2 size: ",x2.size) # 12 elements in total
print("x3 ndim: ",x3.ndim)
print("x3 shape: ",x3.shape)
print("x3 size: ",x3.size) # 60 elements in total
print("dtype: ",x1.dtype) # the data type of the array
# Other attributes include itemsize, which lists the size (in bytes) of each array element,
# and nbytes, which lists the total size (in bytes) of the array:
print("itemsize:",x1.itemsize,"bytes")
print("nbytes:",x1.nbytes,"bytes")
print("dtype: ",x2.dtype) # the data type of the array
print("itemsize:",x2.itemsize,"bytes")
print("nbytes:",x2.nbytes,"bytes")
print("dtype: ",x3.dtype) # the data type of the array
print("itemsize:",x3.itemsize,"bytes")
print("nbytes:",x3.nbytes,"bytes")
# In general, we expect that nbytes is equal to itemsize times size.
x1 ndim: 1
x1 shape: (6,)
x1 size: 6
x2 ndim: 2
x2 shape: (3, 4)
x2 size: 12
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
dtype: int64
itemsize: 8 bytes
nbytes: 48 bytes
dtype: int64
itemsize: 8 bytes
nbytes: 96 bytes
dtype: int64
itemsize: 8 bytes
nbytes: 480 bytes
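The relationship between `nbytes`, `itemsize`, and `size` can be checked directly. The sketch below uses zero-filled arrays with the same shapes as above and an explicit `int64` dtype, an assumption made here so the byte counts do not depend on the platform's default integer.

```python
import numpy as np

# Zero-filled stand-ins with the same shapes as x1, x2 and x3.
arrays = {
    "1-D": np.zeros(6, dtype=np.int64),
    "2-D": np.zeros((3, 4), dtype=np.int64),
    "3-D": np.zeros((3, 4, 5), dtype=np.int64),
}

for name, arr in arrays.items():
    # Total bytes = bytes per element * number of elements.
    assert arr.nbytes == arr.itemsize * arr.size
    print(name, arr.ndim, arr.shape, arr.size, arr.nbytes)
```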
Array Indexing: Accessing Single Elements
- If you are familiar with Python’s standard list indexing, indexing in NumPy will feel quite familiar. In a one-dimensional array, you can access the i-th value (counting from zero) by specifying the desired index in square brackets, just as with Python lists:
x1
array([5, 0, 3, 3, 7, 9])
x1[0]
5
x1[4]
7
# To index from the end of the array, you can use negative indices:
x1[-1]
9
In a multidimensional array, you access items using a comma-separated tuple of indices:
x2
array([[3, 5, 2, 4], [7, 6, 8, 8], [1, 6, 7, 7]])
x2[2,1]
6
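The same index notation can also be used to modify values. The sketch below rebuilds `x1` and `x2` by hand from the outputs shown above, so it does not depend on the random seed. Note one consequence of the fixed element type: a floating-point value assigned into an integer array is silently truncated.

```python
import numpy as np

# Hard-coded copies of the arrays printed above.
x1 = np.array([5, 0, 3, 3, 7, 9])
x2 = np.array([[3, 5, 2, 4], [7, 6, 8, 8], [1, 6, 7, 7]])

x2[0, 0] = 12        # assign to row 0, column 0
print(x2[0, 0])      # 12

x1[0] = 3.14159      # x1 holds integers, so the fraction is dropped
print(x1[0])         # 3
```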
Reshaping of NumPy Arrays
Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape() method. For example, if you want to put the numbers 1 through 9 in a 3×3 grid, you can do the following:
grid = np.arange(1,10,1).reshape(3,3)
print(grid)
[[1 2 3]
 [4 5 6]
 [7 8 9]]
x = np.array([1, 2, 3])
x.shape # x is a vector with shape (3,)
(3,)
# row vector via reshape
x.reshape(1,3).shape
(1, 3)
# row vector via newaxis
x[np.newaxis, :].shape
(1, 3)
x.reshape(1,-1).shape
(1, 3)
# column vector via reshape
x.reshape((3, 1))
array([[1], [2], [3]])
# column vector via newaxis
x[:, np.newaxis]
array([[1], [2], [3]])
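Reshaping only works when the new shape holds exactly as many elements as the original array. The following sketch shows that constraint, along with the `-1` placeholder used above, which asks NumPy to infer one dimension. The array `values` and the flag `failed` are illustrative names.

```python
import numpy as np

values = np.arange(12)               # 12 elements

# -1 lets NumPy work out the missing dimension from the total size.
print(values.reshape(3, -1).shape)   # (3, 4)
print(values.reshape(-1, 6).shape)   # (2, 6)

# 5 * 3 = 15 does not match 12 elements, so reshape raises ValueError.
failed = False
try:
    values.reshape(5, 3)
except ValueError as err:
    failed = True
    print("reshape failed:", err)
```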
Computation on NumPy Arrays: Universal Functions
Exploring NumPy’s UFuncs
- Ufuncs exist in two flavors: unary ufuncs, which operate on a single input, and binary ufuncs, which operate on two inputs. We’ll see examples of both these types of functions here.
Array arithmetic
# NumPy's ufuncs feel very natural to use because they make use of Python's native
# arithmetic operators. The standard addition, subtraction, multiplication, and division
# can all be used:
x = np.arange(4)
print("x =", x)
print("x + 5 =", x + 5)
print("x - 5 =", x - 5)
print("x * 2 =", x * 2)
print("x / 2 =", x / 2)
print("x // 2 =", x // 2) # floor division
x = [0 1 2 3]
x + 5 = [5 6 7 8]
x - 5 = [-5 -4 -3 -2]
x * 2 = [0 2 4 6]
x / 2 = [0. 0.5 1. 1.5]
x // 2 = [0 0 1 1]
# There is also a unary ufunc for negation, a ** operator for exponentiation, and a %
# operator for modulus:
print("-x = ", -x)
print("x ** 2 = ", x ** 2)
print("x % 2 = ", x % 2)
-x = [ 0 -1 -2 -3]
x ** 2 = [0 1 4 9]
x % 2 = [0 1 0 1]
# In addition, these can be strung together however you wish, and the standard order
# of operations is respected:
-(0.5*x+1) ** 2
array([-1. , -2.25, -4. , -6.25])
# All of these arithmetic operations are simply convenient wrappers around specific
# functions built into NumPy; for example, the + operator is a wrapper for the add
# function:
print(np.add(3,2))
print(np.add(x,2)) # Addition +
print(np.subtract(x,5)) # Subtraction -
print(np.negative(x)) # Unary negation -
print(np.multiply(x,3)) # Multiplication *
print(np.divide(x,2)) # Division /
print(np.floor_divide(x,2)) # Floor division //
print(np.power(x,2)) # Exponentiation **
print(np.mod(x,2)) # Modulus/remainder %
print(np.multiply(x, x))
5
[2 3 4 5]
[-5 -4 -3 -2]
[ 0 -1 -2 -3]
[0 3 6 9]
[0. 0.5 1. 1.5]
[0 0 1 1]
[0 1 4 9]
[0 1 0 1]
[0 1 4 9]
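A quick way to convince yourself that the operators and the named functions are the same thing is to compare their results. This sketch also inspects the `nin` attribute of a ufunc, which reports its number of inputs and so distinguishes unary from binary ufuncs.

```python
import numpy as np

x = np.arange(4)

# Each operator gives the same result as its named ufunc.
print(np.array_equal(x + 2, np.add(x, 2)))    # True
print(np.array_equal(-x, np.negative(x)))     # True
print(np.array_equal(x % 2, np.mod(x, 2)))    # True

# The named functions are ufunc objects; nin is the number of inputs.
print(isinstance(np.add, np.ufunc))           # True
print(np.negative.nin, np.add.nin)            # 1 2
```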
Trigonometric functions
# NumPy provides a large number of useful ufuncs, and some of the most useful for the
# data scientist are the trigonometric functions. We'll start by defining an array of
# angles:
theta = np.linspace(0,np.pi,3)
# Now we can compute some trigonometric functions on these values:
print("theta =",theta)
print("sin(theta) =",np.sin(theta))
print("cos(theta) =",np.cos(theta))
print("tan(theta) =",np.tan(theta))
theta = [0. 1.57079633 3.14159265]
sin(theta) = [0.0000000e+00 1.0000000e+00 1.2246468e-16]
cos(theta) = [ 1.000000e+00 6.123234e-17 -1.000000e+00]
tan(theta) = [ 0.00000000e+00 1.63312394e+16 -1.22464680e-16]
x = [-1, 0, 1]
print("x = ", x)
print("arcsin(x) = ", np.arcsin(x))
print("arccos(x) = ", np.arccos(x))
print("arctan(x) = ", np.arctan(x))
x = [-1, 0, 1]
arcsin(x) = [-1.57079633 0. 1.57079633]
arccos(x) = [3.14159265 1.57079633 0. ]
arctan(x) = [-0.78539816 0. 0.78539816]
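The trigonometric values are computed to machine precision, which is why results that should be exactly zero appear as tiny numbers such as 1.2246468e-16. When comparing such results, a tolerance-based check is usually safer than `==`. A minimal sketch using `np.isclose` and `np.allclose` with their default tolerances:

```python
import numpy as np

theta = np.linspace(0, np.pi, 3)

# sin(pi) is mathematically 0 but comes out as a tiny nonzero float.
print(np.sin(theta)[2] == 0)                    # False
print(np.isclose(np.sin(theta)[2], 0))          # True

# Comparing a whole array against the exact values.
print(np.allclose(np.cos(theta), [1, 0, -1]))   # True
```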
Computation on Arrays: Broadcasting
- Broadcasting is simply a set of rules for applying binary ufuncs (addition, subtraction, multiplication, etc.) on arrays of different sizes.
Introducing Broadcasting
import numpy as np
a = np.array([0,1,2])
b = np.array([5,5,5])
a+b
array([5, 6, 7])
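The simplest case of broadcasting is adding a scalar to an array. Conceptually, the scalar is stretched into an array of matching shape before the addition. This sketch compares that with the explicit element-by-element sum above.

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([5, 5, 5])

# Adding the scalar 5 behaves as if it were the array [5, 5, 5].
print(a + 5)                          # [5 6 7]
print(np.array_equal(a + 5, a + b))   # True
```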
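# We can similarly extend this to arrays of higher dimension. Observe the result when
# we add a one-dimensional array to a two-dimensional array:
M = np.ones((3,3))
M
array([[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]])
# Here the one-dimensional array a is stretched across each row of M:
M + a
array([[1., 2., 3.], [1., 2., 3.], [1., 2., 3.]])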
Visualization of NumPy broadcasting

Rules of Broadcasting
- Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:
- Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.
- Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.
- Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
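The three rules can be checked on small arrays of ones, where only the shapes matter. The shapes below are chosen for illustration, and the flag `raised` exists only to record that the incompatible case fails.

```python
import numpy as np

# Rule 1 + Rule 2: (2, 3) with (3,) -> (3,) is padded to (1, 3), then stretched.
print((np.ones((2, 3)) + np.ones(3)).shape)   # (2, 3)

# Both arrays are stretched: (3, 1) with (3,) -> (3, 1) and (1, 3) -> (3, 3).
print((np.ones((3, 1)) + np.ones(3)).shape)   # (3, 3)

# Rule 3: (3, 2) with (3,) -> (3, 2) and (1, 3); last dimensions are 2 and 3,
# neither is 1, so NumPy raises an error.
raised = False
try:
    np.ones((3, 2)) + np.ones(3)
except ValueError as err:
    raised = True
    print("incompatible shapes:", err)
```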
Broadcasting example
# Let's take a look at an example where both arrays need to be broadcast:
a = np.arange(3).reshape((3,1))
b = np.arange(3)
# Again, we'll start by writing out the shape of the arrays:
# a.shape = (3, 1)
# b.shape = (3,)
# Rule 1 says we must pad the shape of b with ones:
# a.shape -> (3, 1)
# b.shape -> (1, 3)
# And rule 2 tells us that we upgrade each of these ones to match the corresponding
# size of the other array:
# a.shape -> (3, 3)
# b.shape -> (3, 3)
# Because the result matches, these shapes are compatible. We can see this here:
a+b
array([[0, 1, 2], [1, 2, 3], [2, 3, 4]])
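One common practical use of broadcasting is centering the columns of a data table. The sketch below assumes a small made-up table `X` with 10 rows and 3 columns. The column means have shape (3,), so subtracting them from the (10, 3) array relies on the rules above.

```python
import numpy as np

# Made-up data for illustration: 10 observations of 3 features.
X = np.arange(30, dtype=float).reshape(10, 3)

col_means = X.mean(axis=0)     # one mean per column, shape (3,)
X_centered = X - col_means     # (10, 3) - (3,) broadcasts across the rows

print(col_means.shape)                            # (3,)
print(X_centered.shape)                           # (10, 3)
print(np.allclose(X_centered.mean(axis=0), 0))    # True
```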
Comparison Operators as ufuncs
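- Comparisons are useful when you want to examine or manipulate values based on some criterion. For example, you might wish to count all values greater than a certain value, or perhaps remove all outliers that are above some threshold. In NumPy, Boolean masking is often the most efficient way to accomplish these types of tasks.
- NumPy implements comparison operators such as < (less than) and > (greater than) as element-wise ufuncs. The result of these comparison operators is always an array with a Boolean data type. All six of the standard comparison operations are available: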
x = np.array([1,2,3,4,5])
print(x<3) # less than
print(x>3) # greater than
print(x<=3) # less than or equal
print(x>=3) # greater than or equal
print(x!=3) # not equal
print(x==3) # equal
[ True True False False False]
[False False False True True]
[ True True True False False]
[False False True True True]
[ True True False True True]
[False False True False False]
rng = np.random.RandomState(seed=0)
x = rng.randint(10, size=(3,4))
print(x)
x<6
[[5 0 3 3]
 [7 9 3 5]
 [2 4 7 6]]
array([[ True, True, True, True], [False, False, True, True], [ True, True, False, False]])
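The Boolean array produced by a comparison can be used for counting and for selecting values. The sketch below works on a hard-coded copy of the array printed above, so it does not depend on the random generator.

```python
import numpy as np

# Hard-coded copy of the 3x4 array shown above.
x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

mask = x < 6

# Count how many values are less than 6 (True counts as 1).
print(np.count_nonzero(mask))   # 8

# Count per row.
print(np.sum(mask, axis=1))     # [4 2 2]

# Use the mask as an index to pull out just those values.
print(x[mask])                  # [5 0 3 3 3 5 2 4]
```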