Web scraping means collecting data from the Internet. As a beginner in data science, you have probably seen CSV files distributed by popular websites such as Kaggle and by government portals. That data is prepared either by collecting and recording it with standard methods or by scraping it from the Internet. In this article, I will take you through web scraping with Python using BeautifulSoup.
I will scrape data from Flipkart and create a CSV file from that data. It’s not as difficult as it seems. Let’s get our hands dirty with web scraping to create a CSV file using Python. I will start by importing the packages we need for this task.
Web Scraping to Create a CSV File
We need two primary packages for this task: BeautifulSoup and urllib. BeautifulSoup can be installed with the pip command pip install bs4. urllib is part of the Python standard library, so it does not need to be installed. Once BeautifulSoup is installed, the next step is to import both packages and fetch the page we want to collect data from:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
my_url="https://www.flipkart.com/search?q=samsung+mobiles&sid=tyy%2C4io&as=on&as-show=on&otracker=AS_QueryStore_HistoryAutoSuggest_0_2&otracker1=AS_QueryStore_HistoryAutoSuggest_0_2&as-pos=0&as-type=HISTORY&as-searchtext=sa"
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
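Some websites may reject requests that arrive with the default Python user agent, and a request without a timeout can hang indefinitely. The following is a minimal sketch of the same download step with an explicit User-Agent header and a timeout. The names USER_AGENT, build_request and fetch_html, as well as the header value and the 10-second timeout, are assumptions made for illustration and are not part of the original code.

```python
from urllib.request import Request, urlopen

# Hypothetical identifier; any descriptive string can be used here.
USER_AGENT = "Mozilla/5.0 (compatible; tutorial-scraper)"


def build_request(url):
    """Return a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url, timeout=10):
    """Download a page and return its raw bytes, closing the connection afterwards."""
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

With these helpers, page_html = fetch_html(my_url) would replace the three uClient lines, and the result can be passed to BeautifulSoup exactly as before.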
Now let’s see how many product containers are present on this page:
containers = page_soup.findAll("div", { "class": "_3O0U0u"})
print(len(containers))
24
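The class names on a live page can change over time. When that happens, findAll returns an empty list and indexing it with containers[0] raises an IndexError. A small guard makes that failure easier to diagnose. This is only a sketch: require_matches is a hypothetical helper, not a BeautifulSoup function.

```python
def require_matches(matches, description):
    """Return the matches unchanged, or raise a readable error if there are none."""
    if len(matches) == 0:
        raise ValueError(
            "No elements found for " + description
            + "; the class name on the page may have changed."
        )
    return matches
```

It could be used as containers = require_matches(page_soup.findAll("div", {"class": "_3O0U0u"}), "div._3O0U0u"). If the error is raised, inspect the page in a browser and update the class name.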
# Pretty-print the HTML of the first container to inspect its structure
print(containers[0].prettify())
<div class="_3O0U0u">
<div data-id="MOBFRZZHMHQVNDFA" style="width:100%">
<div class="_1UoZlX">
<a class="_31qSD5" href="/samsung-galaxy-m01-blue-32-gb/p/itmc068b26305a0d?pid=MOBFRZZHMHQVNDFA&amp;lid=LSTMOBFRZZHMHQVNDFAZXGBO6&amp;marketplace=FLIPKART&amp;srno=s_1_1&amp;otracker=AS_QueryStore_HistoryAutoSuggest_0_2&amp;otracker1=AS_QueryStore_HistoryAutoSuggest_0_2&amp;fm=organic&amp;iid=f9a57085-7ab9-4aba-b59d-5a4cbecd03e9.MOBFRZZHMHQVNDFA.SEARCH&amp;ssid=zu9bg122ao0000001596818422200&amp;qH=0258c7d48242959a" rel="noopener noreferrer" target="_blank">
<div class="_3SQWE6">
<div class="_1OCn9C">
<div>
<div class="_3BTv9X" style="height:200px;width:200px">
<img alt="Samsung Galaxy M01 (Blue, 32 GB)" class="_1Nyybr" src="//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg"/>
</div>
</div>
</div>
<div class="_2lesQu">
<div class="_1O_CiZ">
<span class="_1iHA1p">
<div class="_2kFyHg">
<label>
<input class="_3uUUD5" readonly="" type="checkbox"/>
<div class="_1p7h2j">
</div>
</label>
</div>
</span>
<label class="_10TB-Q">
<span>
Add to Compare
</span>
</label>
</div>
</div>
<div class="_3gDSOa _32A6AP">
<div class="DsQ2eg">
<svg class="_2oLiqr" height="16" viewbox="0 0 20 16" width="16" xmlns="http://www.w3.org/2000/svg">
<path class="_35Y7Yo" d="M8.695 16.682C4.06 12.382 1 9.536 1 6.065 1 3.219 3.178 1 5.95 1c1.566 0 3.069.746 4.05 1.915C10.981 1.745 12.484 1 14.05 1 16.822 1 19 3.22 19 6.065c0 3.471-3.06 6.316-7.695 10.617L10 17.897l-1.305-1.215z" fill="#2874F0" fill-rule="evenodd" opacity=".9" stroke="#FFF">
</path>
</svg>
</div>
</div>
</div>
<div class="_1-2Iqu row">
<div class="col col-7-12">
<div class="_3wU53n">
Samsung Galaxy M01 (Blue, 32 GB)
</div>
<div class="niH0FQ">
<span class="_2_KrJI" id="productRating_LSTMOBFRZZHMHQVNDFAZXGBO6_MOBFRZZHMHQVNDFA_">
<div class="hGSR34">
4.2
<img class="_2lQ_WZ" src="data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSIxMyIgaGVpZ2h0PSIxMiI+PHBhdGggZmlsbD0iI0ZGRiIgZD0iTTYuNSA5LjQzOWwtMy42NzQgMi4yMy45NC00LjI2LTMuMjEtMi44ODMgNC4yNTQtLjQwNEw2LjUuMTEybDEuNjkgNC4wMSA0LjI1NC40MDQtMy4yMSAyLjg4Mi45NCA0LjI2eiIvPjwvc3ZnPg=="/>
</div>
</span>
<span class="_38sUEc">
<span>
<span>
5,040 Ratings
</span>
<span class="_1VpSqZ">
&amp;
</span>
<span>
371 Reviews
</span>
</span>
</span>
</div>
<div class="_3ULzGw">
<ul class="vFw0gD">
<li class="tVe95H">
3 GB RAM | 32 GB ROM | Expandable Upto 512 GB
</li>
<li class="tVe95H">
14.48 cm (5.7 inch) HD+ Display
</li>
<li class="tVe95H">
13MP + 2MP | 5MP Front Camera
</li>
<li class="tVe95H">
4000 mAh Lithium-ion Battery
</li>
<li class="tVe95H">
Qualcomm Snapdragon (SDM439) Octa Core Processor
</li>
<li class="tVe95H">
1 Year Manufacturer Warranty for Phone and 6 Months Warranty for in the Box Accessories
</li>
</ul>
</div>
</div>
<div class="col col-5-12 _2o7WAb">
<div class="_6BWGkk">
<div class="_1uv9Cb">
<div class="_1vC4OE _2rQ-NK">
₹9,899
</div>
</div>
</div>
<div class="_3n6o0t">
<img height="21" src="//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/fa_8b4b59.png"/>
</div>
</div>
</div>
</a>
</div>
</div>
</div>
Now let’s see the name of the first item on the page, which is stored in the alt attribute of its image:
container = containers[0]
print(container.div.img["alt"])
Samsung Galaxy M01 (Blue, 32 GB)
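The product name packs the model, colour and storage into one string. If separate columns are wanted later, the name can be split with a regular expression. The sketch below assumes the "Model (Colour, Storage)" layout seen in this listing. split_product_name is a hypothetical helper, and names without parentheses are returned with the colour and storage left empty.

```python
import re

# Assumes names shaped like "Samsung Galaxy M01 (Blue, 32 GB)".
NAME_PATTERN = re.compile(
    r"^(?P<model>.+?)\s*\((?P<colour>[^,()]+),\s*(?P<storage>[^()]+)\)\s*$"
)


def split_product_name(name):
    """Split a listing name into model, colour and storage where possible."""
    match = NAME_PATTERN.match(name)
    if match is None:
        # No "(colour, storage)" suffix, so keep the whole name as the model.
        return {"model": name.strip(), "colour": None, "storage": None}
    return {key: value.strip() for key, value in match.groupdict().items()}


print(split_product_name("Samsung Galaxy M01 (Blue, 32 GB)"))
```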
The first item on the Flipkart page we scraped is the Samsung Galaxy M01 in blue. Now let’s have a look at the price of this smartphone:
price = container.findAll("div", {"class": "col col-5-12 _2o7WAb"})
print(price[0].text)
₹9,899
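For this product the element contains only the price, but for discounted listings the same element also holds the original price and offer text, all joined without separators (for example "₹30,999₹34,99911% off"). One way to get a usable number is to take the first rupee amount from the text. This is a sketch that assumes the first amount is the current selling price. first_price is a hypothetical helper.

```python
import re

PRICE_PATTERN = re.compile(r"₹([\d,]+)")


def first_price(text):
    """Return the first rupee amount in the text as an integer, or None."""
    match = PRICE_PATTERN.search(text)
    if match is None:
        return None
    # Drop the thousands separators before converting, e.g. "30,999" -> 30999.
    return int(match.group(1).replace(",", ""))


print(first_price("₹9,899"))
```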
Now let’s have a look at its customer ratings:
ratings = container.findAll("div", {"class": "niH0FQ"})
print(ratings[0].text)
4.25,040 Ratings & 371 Reviews
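The .text attribute joins the text of all nested elements, so the average rating (4.2) and the number of ratings (5,040) run together as "4.25,040". The sketch below separates them again. It assumes the average is always printed with exactly one decimal place, which holds for every row in this listing but is not guaranteed. parse_rating_text is a hypothetical helper.

```python
import re

# Assumes text shaped like "<average><count> Ratings & <count> Reviews",
# where the average has exactly one decimal digit (e.g. "4.2").
RATING_PATTERN = re.compile(
    r"^\s*(\d\.\d)([\d,]+)\s*Ratings\s*&\s*([\d,]+)\s*Reviews\s*$"
)


def parse_rating_text(text):
    """Split fused rating text into (average, ratings_count, reviews_count)."""
    match = RATING_PATTERN.match(text)
    if match is None:
        return None
    average = float(match.group(1))
    ratings_count = int(match.group(2).replace(",", ""))
    reviews_count = int(match.group(3).replace(",", ""))
    return average, ratings_count, reviews_count


print(parse_rating_text("4.25,040 Ratings & 371 Reviews"))  # (4.2, 5040, 371)
```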
Now let’s create a CSV file and store all the mobile phones with their name, price and ratings:
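filename = "products.csv"
# UTF-8 so that the rupee sign (₹) can be written on any platform
f = open(filename, "w", encoding="utf-8")
# No spaces around the commas, so the column names stay clean
headers = "Product_Name,Pricing,Ratings\n"
f.write(headers)
29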
Now let’s loop over all the containers, print the name, price and ratings of each product, and write them as rows to the CSV file:
import csv

# csv.writer quotes fields that contain commas, such as "₹9,899"
writer = csv.writer(f, lineterminator="\n")
for container in containers:
    product_name = container.div.img["alt"]
    price_container = container.findAll("div", {"class": "col col-5-12 _2o7WAb"})
    price = price_container[0].text.strip()
    rating_container = container.findAll("div", {"class": "niH0FQ"})
    rating = rating_container[0].text
    print("Product_Name:" + product_name)
    print("Price: " + price)
    print("Ratings:" + rating)
    writer.writerow([product_name, price, rating])
# Close the file so that all rows are flushed to disk
f.close()
Product_Name:Samsung Galaxy M01 (Blue, 32 GB)
Price: ₹9,899
Ratings:4.25,040 Ratings & 371 Reviews
Product_Name:Samsung Galaxy A71 (Haze Crush Silver, 128 GB)
Price: ₹30,999₹34,99911% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.41,311 Ratings & 140 Reviews
Product_Name:Samsung Galaxy A31 (Prism Crush Black, 128 GB)
Price: ₹20,999₹23,99912% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.32,905 Ratings & 272 Reviews
Product_Name:Samsung Galaxy M11 (Violet, 32 GB)
Price: ₹11,899₹12,6996% off
Ratings:4.24,469 Ratings & 381 Reviews
Product_Name:Samsung Galaxy A31 (Prism Crush Blue, 128 GB)
Price: ₹20,999₹23,99912% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.32,905 Ratings & 272 Reviews
Product_Name:Samsung Galaxy A31 (Prism Crush White, 128 GB)
Price: ₹20,999₹23,99912% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.32,905 Ratings & 272 Reviews
Product_Name:Samsung Galaxy A21s (White, 64 GB)
Price: ₹15,999₹17,99911% offUpto ₹13,150 Off on Exchange
Ratings:4.23,257 Ratings & 301 Reviews
Product_Name:Samsung Galaxy A21s (Black, 64 GB)
Price: ₹15,999₹17,99911% offUpto ₹13,150 Off on Exchange
Ratings:4.23,257 Ratings & 301 Reviews
Product_Name:Samsung Galaxy A21s (Blue, 64 GB)
Price: ₹15,999₹17,99911% offUpto ₹13,150 Off on Exchange
Ratings:4.23,257 Ratings & 301 Reviews
Product_Name:Samsung Galaxy M11 (Violet, 64 GB)
Price: ₹13,970₹14,2792% off
Ratings:4.22,817 Ratings & 231 Reviews
Product_Name:Samsung Galaxy M31 (Space Black, 64 GB)
Price: ₹18,922No Cost EMI
Ratings:4.4351 Ratings & 30 Reviews
Product_Name:Samsung Galaxy M11 (Metallic Blue, 64 GB)
Price: ₹13,988₹14,8505% off
Ratings:4.22,817 Ratings & 231 Reviews
Product_Name:Samsung Galaxy M11 (Metallic Blue, 32 GB)
Price: ₹11,925
Ratings:4.24,469 Ratings & 381 Reviews
Product_Name:Samsung Galaxy A71 (Prism Crush Black, 128 GB)
Price: ₹30,999₹34,99911% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.41,311 Ratings & 140 Reviews
Product_Name:Samsung Galaxy A51 (Prism Crush Black, 128 GB)
Price: ₹23,999₹25,9997% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.44,117 Ratings & 470 Reviews
Product_Name:Samsung Galaxy A51 (Prism Crush White, 128 GB)
Price: ₹27,299₹28,9995% off
Ratings:4.4517 Ratings & 33 Reviews
Product_Name:Samsung Guru FM Plus SM-B110E/D
Price: ₹1,662No Cost EMI
Ratings:4.348,448 Ratings & 5,198 Reviews
Product_Name:Samsung Galaxy A51 (Prism Crush Blue, 128 GB)
Price: ₹23,999₹25,9997% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.44,117 Ratings & 470 Reviews
Product_Name:Samsung Galaxy M01 (Black, 32 GB)
Price: ₹9,779₹9,9992% off
Ratings:4.25,040 Ratings & 371 Reviews
Product_Name:Samsung Galaxy M11 (Black, 32 GB)
Price: ₹11,865
Ratings:4.24,469 Ratings & 381 Reviews
Product_Name:Samsung Galaxy M30S (Black, 128 GB)
Price: ₹17,736₹17,9961% offNo Cost EMI
Ratings:4.33,317 Ratings & 295 Reviews
Product_Name:Samsung Galaxy M40 (Seawater Blue, 128 GB)
Price: ₹21,490No Cost EMI
Ratings:4.2691 Ratings & 50 Reviews
Product_Name:Samsung Galaxy S10 Lite (Prism Blue, 128 GB)
Price: ₹42,999₹43,9992% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.59,675 Ratings & 2,410 Reviews
Product_Name:Samsung Galaxy A21s (Blue, 64 GB)
Price: ₹17,499₹19,99912% offUpto ₹13,150 Off on Exchange
Ratings:4.31,268 Ratings & 118 Reviews
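Product names, prices and rating text all contain commas, so a row built by joining the fields with commas would split into too many columns when the file is read back. The csv module from the standard library quotes such fields automatically. The sketch below writes two rows copied from the output above to an in-memory buffer and reads them back. The buffer stands in for the real file and is an assumption made so the example runs without touching the disk.

```python
import csv
import io

# Two rows copied from the scraped output; every field contains a comma.
rows = [
    ["Samsung Galaxy M01 (Blue, 32 GB)", "₹9,899", "4.25,040 Ratings & 371 Reviews"],
    ["Samsung Galaxy M11 (Black, 32 GB)", "₹11,865", "4.24,469 Ratings & 381 Reviews"],
]

# Stand-in for open("products.csv", "w", encoding="utf-8", newline="")
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Product_Name", "Pricing", "Ratings"])
writer.writerows(rows)

# Read the data back as dictionaries keyed by the header row.
buffer.seek(0)
records = list(csv.DictReader(buffer))
print(records[0]["Pricing"])  # ₹9,899
```

Each record keeps its three fields intact despite the embedded commas, which makes the file safe to load into a spreadsheet or a data frame later.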
I hope you liked this article on web scraping with Python to scrape Flipkart. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.
Could you please help further by creating a data frame for these products? And also, kindly create visualizations? thanks.
Hi, when you run the code, it will automatically save a CSV file in your folder.
Hello, could you please help by sharing info about how to store those details scraped into csv
it will be saved into the same directory where your python file is
amazing work …kudos to your effort in sharing them
thanks
Dear Aman,
Firstly, thank you for providing such immense material for various Python and ML based projects.
I am very new to ML and trying to learn through your materials.
I am trying to execute this program of “Web Scraping to Create CSV”, but when I run the code I get this error:
“print(soup.prettify(containers[0]))
IndexError: list index out of range”
could you please help in rectifying this.
Thank you.
Check this notebook if you still get errors then contact me on LinkedIn
Thanks for this tutorial!
I had to search by inspecting to find the references of the up-to-date classes, because some have been modified since the tutorial was uploaded. It’s a perfect exercise to get straight into the bath, it’s perfect. thanks again!
Thanks for the feedback. The modules and libraries keep updating, so whenever anyone gets errors, go through the official documentation to see the updates.