Web scraping is a skill that every data science professional should know. Sometimes the data we need is shown on a website as a table that cannot be downloaded directly. To use that data for a data science task, we need to collect it from the page using web scraping techniques. In this article, I will take you through a tutorial on how to scrape a table from a website using Python.
Scrape Table from a Website using Python
There are many Python libraries and modules that you can use for web scraping. To download the page, I will use the urllib module, which is part of the Python standard library, so you don't need to install anything extra to fetch the HTML. To turn the HTML table into a DataFrame, I will use the read_html function from pandas, which does need to be installed along with an HTML parser such as lxml. Below is how you can use urllib and pandas to scrape a table from a website using the Python programming language:
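import io
import urllib.request

import pandas as pd

url = "https://en.wikipedia.org/wiki/Programming_languages_used_in_most_popular_websites"

# Download the page and decode the raw bytes into text
with urllib.request.urlopen(url) as response:
    html = response.read().decode("utf-8")

# read_html returns a list with one DataFrame per table found on the page
tables = pd.read_html(io.StringIO(html))
data = tables[0]  # the first table; change the index if you need another one
print(data.head())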
In the code above, I am collecting data from a Wikipedia page that contains a table describing the programming languages used by the most popular websites. The data we get back after scraping lists the programming languages and databases used by each website. You can scrape tables from other web pages in the same way, as long as the table is present in the HTML that the server returns; tables that are filled in later by JavaScript will not be found this way.
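A page often contains more than one table, so it helps to see how the right one can be picked out. The following is a minimal sketch that uses only the standard library (html.parser) instead of pandas. The TableParser class, the sample_html string, and the example.com-style rows are all made up for illustration and are not taken from the Wikipedia page; the sketch also assumes tables are not nested inside each other.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> in a document as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per table seen so far
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Made-up HTML with two tables; only the second one is the table we want
sample_html = """
<table><tr><th>Note</th></tr><tr><td>placeholder banner</td></tr></table>
<table>
  <tr><th>Website</th><th>Back-end</th><th>Database</th></tr>
  <tr><td>site-a.example</td><td>Python</td><td>PostgreSQL</td></tr>
  <tr><td>site-b.example</td><td>Java</td><td>MySQL</td></tr>
</table>
"""

parser = TableParser()
parser.feed(sample_html)

# Pick the first table whose header row contains a "Database" column
wanted = next(t for t in parser.tables if t and "Database" in t[0])
for row in wanted:
    print(row)
```

With pandas, a similar selection can be made by indexing into the list that read_html returns, or by passing its match argument so that only tables containing the given text are returned.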
If you want to save this data in a CSV file, below is how you can save it:
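A minimal sketch of this step, assuming the scraped table is held in a pandas DataFrame named data. The file name scraped_table.csv is an assumption, and the small DataFrame here is a made-up stand-in so that the snippet runs on its own:

```python
from pathlib import Path

import pandas as pd

# Stand-in for the scraped DataFrame; these rows are made-up placeholders,
# not values from the Wikipedia table.
data = pd.DataFrame(
    {
        "Website": ["site-a.example", "site-b.example"],
        "Back-end": ["Python", "Java"],
    }
)

# Hypothetical file name; a relative path like this one is resolved
# against the current working directory.
csv_path = Path("scraped_table.csv")

# index=False leaves the DataFrame's row numbers out of the file
data.to_csv(csv_path, index=False)

# Show the full path of the file that was written
print(csv_path.resolve())
```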
After running the above code, you will find the CSV file in your current working directory, which is usually the directory you ran your Python file from.
So this is how we can scrape tables from a website using Python. I hope you liked this article on scraping tables from websites using Python. Feel free to ask your questions in the comments section below.