Matplotlib for Data Science

Matplotlib is a popular data visualization library for Python. It provides functions and tools that let data scientists create many kinds of plots and charts, producing high-quality visualizations that communicate the insights and patterns hidden in the data. If you want to learn Matplotlib for Data Science, this article is for you: I'll take you through a complete guide to Matplotlib for Data Science.

What is Matplotlib?

The name “Matplotlib” is derived from “MATLAB” (a numerical computing environment) and “plotting.” It was developed to bring MATLAB-style plotting to Python. Matplotlib offers a flexible and intuitive interface, making it accessible to users of all skill levels. Whether you need to create simple line plots or complex heatmaps, Matplotlib provides the necessary tools to bring your data to life.

Data science professionals rely on Matplotlib because it offers a vast array of plot types, including line plots, scatter plots, bar charts, histograms, box plots, heat maps, and more. This versatility allows data scientists to visualize different types of data and choose the most appropriate plot for their analysis.

Matplotlib also works directly with other data science libraries such as NumPy, Pandas, and SciPy. Its plotting functions accept NumPy arrays and Pandas Series as input, and the .plot() methods of Pandas use Matplotlib by default. This makes it easy to plot data during exploratory data analysis and to visualize the results of statistical analysis.
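As a small illustration of the NumPy side of this, here is a minimal sketch that passes NumPy arrays straight to plt.plot() without converting them to lists. The sine data is only an example, and the sketch selects the non-interactive Agg backend so it also runs without a display; drop the matplotlib.use("Agg") line if you want a plot window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# NumPy arrays can be passed directly to plt.plot()
x = np.linspace(0, 2 * np.pi, 50)  # 50 evenly spaced values from 0 to 2*pi
y = np.sin(x)

lines = plt.plot(x, y)  # plot() returns a list of Line2D objects
line = lines[0]
plt.close()
```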

To install Matplotlib in your Python virtual environment, run the following command in your terminal or command prompt:

  • pip install matplotlib

A Practical Guide to Matplotlib for Data Science

In this section, I will take you through a practical guide to Matplotlib for Data Science. Let’s start by creating a simple line plot:

# creating a line plot
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.show()

The code above generates a line plot: plt.plot(x, y) draws the values in y against the values in x and connects them with a line, and plt.show() displays the figure. It's a straightforward way to represent data and spot trends or patterns.
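Matplotlib also has an object-oriented interface, in which you create a Figure and an Axes explicitly and call methods on them. The sketch below draws the same line plot that way and, as an assumption for running in a script without a display, saves the figure to an in-memory PNG with savefig() instead of calling plt.show().

```python
import io
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Object-oriented equivalent of plt.plot(x, y)
fig, ax = plt.subplots()
(line,) = ax.plot(x, y)  # ax.plot returns a list holding one Line2D here

# Save to an in-memory buffer; pass a filename such as "plot.png" to write a file
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

In an interactive session you can still call plt.show() after building the plot this way.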

Now let’s see how to add labels and titles to the plot:

# adding labels and title to the plot
plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('My First Plot')
plt.show()

The code above enhances a line plot by adding labels to the x-axis and y-axis and including a title. These additions provide crucial information and context, making the plot more understandable and meaningful.
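If you use the object-oriented interface, the same labels and title are set with Axes methods. Here is a minimal sketch of the equivalent calls (the Agg backend is only there so it runs headless):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots()
ax.plot(x, y)
# Axes-level equivalents of plt.xlabel(), plt.ylabel() and plt.title()
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.set_title("My First Plot")
plt.close(fig)
```

The three setters can also be combined into a single ax.set(xlabel="X-axis", ylabel="Y-axis", title="My First Plot") call.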

Now let’s see how to customize colours, styles, and markers:

# Customizing Colors, Styles, and Markers
plt.plot(x, y, color='red',
         linestyle='--',
         marker='o')
plt.show()

Here, we set the line's colour with the color parameter; 'red' draws the line in red. Next, we set the line style with the linestyle parameter. In this example, we use '--', which draws a dashed line. Other line styles include solid ('-'), dotted (':'), and dash-dot ('-.'). Finally, we set the marker style with the marker parameter. A marker is a symbol drawn at each data point to make it more prominent. In this code, 'o' draws a circular marker. Other marker options include squares ('s'), triangles ('^'), and crosses ('x').
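Colour, line style, and marker can also be combined into a short format string passed as the third argument to plot(). The sketch below draws the red dashed line with circles using the string "r--o", and a second, shifted line (example data) using the keyword form with a dotted style and square markers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

fig, ax = plt.subplots()
# Format-string shorthand: 'r' = red, '--' = dashed line, 'o' = circle marker
(dashed,) = ax.plot(x, y, "r--o")
# Keyword form: dotted line (':') with square markers ('s')
(dotted,) = ax.plot(x, [v - 1 for v in y], color="blue", linestyle=":", marker="s")
plt.close(fig)
```

The keyword form is more verbose but easier to read, and it accepts colours that have no single-letter code.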

Now let’s see how to create subplots:

# subplots

plt.subplot(2, 2, 1)  # First subplot
plt.plot(x, y) # simple line plot

plt.subplot(2, 2, 2)  # Second subplot
plt.scatter(x, y) # scatter plot

categories = ['A', 'B', 'C', 'D']
values = [10, 15, 7, 12]

plt.subplot(2, 1, 2)  # Third subplot
plt.bar(categories, values) # bar plot

plt.show()

To create subplots, we use the subplot() function in Matplotlib. This function takes three arguments: the number of rows, the number of columns, and the index of the subplot we want to activate (counting from 1, left to right and top to bottom). In the first subplot, specified by subplot(2, 2, 1), we create a simple line plot using the plot() function. This subplot is located in the top-left position of a 2×2 grid. In the second subplot, specified by subplot(2, 2, 2), we create a scatter plot using the scatter() function. This subplot is located in the top-right position of the grid. For the third subplot, specified by subplot(2, 1, 2), we create a bar plot using the bar() function. Because this call divides the figure into a 2×1 grid, index 2 is the bottom half, so the bar plot spans the full width below the two upper plots.
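The same layout can be described in one call with plt.subplot_mosaic(), available in Matplotlib 3.3 and later. In this sketch, the panel names "line", "scatter", and "bar" are arbitrary labels chosen for illustration; repeating "bar" across the bottom row makes that Axes span both columns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
categories = ['A', 'B', 'C', 'D']
values = [10, 15, 7, 12]

# Two panels on top, one full-width panel below (same as the subplot() example)
fig, axd = plt.subplot_mosaic([["line", "scatter"],
                               ["bar", "bar"]])
axd["line"].plot(x, y)             # simple line plot
axd["scatter"].scatter(x, y)       # scatter plot
axd["bar"].bar(categories, values) # bar plot spanning both columns
fig.tight_layout()                 # reduce overlap between panels
plt.close(fig)
```

Addressing panels by name avoids keeping track of grid indices when the layout changes.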

Now let’s see how to annotate a graph:

# Adding Annotations
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y)
plt.annotate('Highest Point', xy=(5, 10),
             xytext=(4, 8),
             arrowprops=dict(color='black',
                             arrowstyle='->'))
plt.show()

Annotations are text or arrows that provide additional information about specific points or features in the plot. We add an annotation with the annotate() function, which takes the text of the annotation and the position of the point it refers to. Here, we label the highest point on the plot with the text "Highest Point". The xy parameter takes the coordinates of the point we want to annotate; in this case, the highest point is (5, 10).

Additionally, we set the position of the text with the xytext parameter. By default, xytext is given in the same data coordinates as xy, so (4, 8) places the text to the left of and below the highest point. To connect the label to the point, we pass the arrowprops parameter, a dictionary of arrow properties; Matplotlib then draws an arrow from the text to the annotated point. In this code, color='black' makes the arrow black, and arrowstyle='->' gives it a pointed arrowhead.
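Rather than hard-coding (5, 10), you can find the highest point from the data. The sketch below does that and, as a design choice, places the text with textcoords="offset points", so xytext is read as an offset in points from the annotated point instead of a position in data coordinates. The offset (-40, -20) is an arbitrary example value.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Index of the largest y value, then its (x, y) coordinates
i_max = max(range(len(y)), key=lambda i: y[i])
peak = (x[i_max], y[i_max])

fig, ax = plt.subplots()
ax.plot(x, y)
# The text is anchored 40 points left of and 20 points below the peak,
# independent of the axis scale
ann = ax.annotate("Highest Point", xy=peak,
                  xytext=(-40, -20), textcoords="offset points",
                  arrowprops=dict(arrowstyle="->", color="black"))
plt.close(fig)
```

Offsets in points keep the label at the same visual distance from the point even if the data range or figure size changes.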

Now let’s see how to add a legend to our graph:

# adding legend

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.show()

A legend is a key that explains the meaning of the different elements (lines, markers, etc.) in the plot. The code above creates a plot with two lines, Line 1 and Line 2. It adds labels to each line and creates a legend to explain the meaning of the lines. The legend provides a visual key that helps viewers understand the data represented by each line.
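The legend's position and title can be controlled with keyword arguments. In this minimal sketch, loc pins the legend to the upper-left corner (by default Axes legends use "best", which picks the location that overlaps the data least), and the title "Series" is only an example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]

fig, ax = plt.subplots()
ax.plot(x, y1, label="Line 1")
ax.plot(x, y2, label="Line 2")
# Place the legend in the upper-left corner and give it a title
leg = ax.legend(loc="upper left", title="Series")
# Read back the entries, in the order the lines were plotted
labels = [t.get_text() for t in leg.get_texts()]
plt.close(fig)
```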

Now let’s see how to resize a graph:

# Resizing a graph
plt.figure(figsize=(8, 6))
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.show()

Resizing a graph allows you to adjust its width and height so that it fits well within your display or document. Here, we set the size with plt.figure(figsize=(8, 6)), which must be called before the plotting commands so that they draw on the new figure. The figsize argument takes (width, height) in inches, so this figure is 8 inches wide and 6 inches tall.
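The size in pixels is the size in inches multiplied by the figure's dpi (dots per inch). This sketch sets dpi=100 explicitly (an example value) so that an 8×6 inch figure renders at 800×600 pixels, and then resizes the existing figure with set_size_inches().

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# figsize is (width, height) in inches; dpi is pixels per inch
fig = plt.figure(figsize=(8, 6), dpi=100)
width_in, height_in = fig.get_size_inches()
width_px, height_px = fig.canvas.get_width_height()  # 8*100 by 6*100 pixels

# An existing figure can also be resized after it has been created
fig.set_size_inches(10, 4)
plt.close(fig)
```

When saving, fig.savefig() also accepts a dpi argument, which lets you export a higher-resolution image without changing the figure's size in inches.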

Now let’s see how to customize themes:

# customizing themes
plt.style.use('ggplot')
plt.figure(figsize=(8, 6))
x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]
plt.plot(x, y1, label='Line 1')
plt.plot(x, y2, label='Line 2')
plt.legend()
plt.show()

Themes are predefined sets of styles and colours that give the plot a distinct visual appearance. Here, we set the theme with plt.style.use('ggplot'), which applies the 'ggplot' theme, a look modelled on the ggplot2 plotting package for R. The theme stays in effect for every plot created afterwards in the same session. Below are some other themes you can use to make your plots look beautiful:

  1. ‘classic’
  2. ‘Solarize_Light2’
  3. ‘fast’
  4. ‘tableau-colorblind10’
  5. ‘grayscale’
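  6. ‘seaborn-v0_8-bright’
  7. ‘seaborn-v0_8-pastel’
  8. ‘dark_background’
  9. ‘seaborn-v0_8-dark’
  10. ‘seaborn-v0_8-white’
  11. ‘seaborn-v0_8-ticks’
  12. ‘seaborn-v0_8-whitegrid’

The seaborn-v0_8- names apply to Matplotlib 3.6 and later; older releases use the same names without the v0_8- part.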
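Because plt.style.use() changes the settings for the rest of the session, it can be handy to apply a theme to just one figure. The sketch below lists the installed styles with plt.style.available and uses plt.style.context() to apply 'ggplot' only inside a with-block; checking axes.facecolor is just a convenient way to see that the default settings come back afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt

# Style names installed with your Matplotlib version
available = plt.style.available

default_face = plt.rcParams["axes.facecolor"]

# The style applies only inside the with-block
with plt.style.context("ggplot"):
    ggplot_face = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots(figsize=(8, 6))
    ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], label="Line 1")
    ax.legend()
    plt.close(fig)

# Outside the block the previous settings are restored
restored_face = plt.rcParams["axes.facecolor"]
```

Printing available is a quick way to check which of the themes listed above your installation actually provides.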

So these were some of the most important Matplotlib operations you should know while getting started with Matplotlib for Data Science. You can explore more Matplotlib operations and learn how to plot other types of graphs in the official Matplotlib documentation.


Summary

Matplotlib is a flexible plotting library for Python that works closely with NumPy and Pandas. In this guide, we created line plots, scatter plots, and bar charts; added labels, titles, annotations, and legends; customized colours, line styles, and markers; arranged several plots with subplots; resized figures; and applied built-in themes. I hope you liked this article on a practical guide to Matplotlib for Data Science. Feel free to ask questions in the comments section below.

Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.
