Web Scraping to Create CSV

Web scraping means collecting data from websites programmatically. As a beginner in data science, you have probably seen CSV files distributed by popular websites such as Kaggle and by government websites. Such data is prepared either by collecting and recording it with standard methods or by scraping it from the Internet. In this article, I will take you through web scraping with Python using BeautifulSoup.

I will scrape data from Flipkart and create a CSV file from that data. It’s not as difficult as it seems. Let’s get our hands dirty with web scraping to create a CSV file using Python. I will start by importing the packages we need for this task.

Web Scraping to Create a CSV File

We need two primary packages for this task: BeautifulSoup and urllib. BeautifulSoup can be installed with the pip command pip install bs4. urllib is part of the Python standard library, so it does not need to be installed separately. Next, let’s import these packages and fetch the page we want to collect data from:

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

my_url = "https://www.flipkart.com/search?q=samsung+mobiles&sid=tyy%2C4io&as=on&as-show=on&otracker=AS_QueryStore_HistoryAutoSuggest_0_2&otracker1=AS_QueryStore_HistoryAutoSuggest_0_2&as-pos=0&as-type=HISTORY&as-searchtext=sa"
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
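Some sites reject requests that carry urllib's default User-Agent, and a response is easier to manage inside a with block that closes it automatically. The sketch below is one possible variant of the fetch step, not the code used in this article; the helper names build_request and fetch_html and the User-Agent string are assumptions made for illustration.

```python
from urllib.request import Request, urlopen

# Hypothetical User-Agent value; any browser-like string could be used here.
USER_AGENT = "Mozilla/5.0 (compatible; csv-tutorial-scraper)"


def build_request(url):
    """Return a Request that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_html(url, timeout=10):
    """Download the page and return its raw bytes.

    The with block closes the connection even if read() raises.
    """
    with urlopen(build_request(url), timeout=timeout) as response:
        return response.read()
```

With these helpers, page_html = fetch_html(my_url) could stand in for the uReq, read and close calls.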

Now let’s see how many product containers are present on this page:

containers = page_soup.findAll("div", {"class": "_3O0U0u"})
print(len(containers))

24

Let’s print the HTML of the first container to see how it is structured:

print(containers[0].prettify())
<div class="_3O0U0u"> <div data-id="MOBFRZZHMHQVNDFA" style="width:100%"> <div class="_1UoZlX"> <a class="_31qSD5" href="/samsung-galaxy-m01-blue-32-gb/p/itmc068b26305a0d?pid=MOBFRZZHMHQVNDFA&lid=LSTMOBFRZZHMHQVNDFAZXGBO6&marketplace=FLIPKART&srno=s_1_1&otracker=AS_QueryStore_HistoryAutoSuggest_0_2&otracker1=AS_QueryStore_HistoryAutoSuggest_0_2&fm=organic&iid=f9a57085-7ab9-4aba-b59d-5a4cbecd03e9.MOBFRZZHMHQVNDFA.SEARCH&ssid=zu9bg122ao0000001596818422200&qH=0258c7d48242959a" rel="noopener noreferrer" target="_blank"> <div class="_3SQWE6"> <div class="_1OCn9C"> <div> <div class="_3BTv9X" style="height:200px;width:200px"> <img alt="Samsung Galaxy M01 (Blue, 32 GB)" class="_1Nyybr" src="//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/placeholder_9951d0.svg"/> </div> </div> </div> <div class="_2lesQu"> <div class="_1O_CiZ"> <span class="_1iHA1p"> <div class="_2kFyHg"> <label> <input class="_3uUUD5" readonly="" type="checkbox"/> <div class="_1p7h2j"> </div> </label> </div> </span> <label class="_10TB-Q"> <span> Add to Compare </span> </label> </div> </div> <div class="_3gDSOa _32A6AP"> <div class="DsQ2eg"> <svg class="_2oLiqr" height="16" viewbox="0 0 20 16" width="16" xmlns="http://www.w3.org/2000/svg"> <path class="_35Y7Yo" d="M8.695 16.682C4.06 12.382 1 9.536 1 6.065 1 3.219 3.178 1 5.95 1c1.566 0 3.069.746 4.05 1.915C10.981 1.745 12.484 1 14.05 1 16.822 1 19 3.22 19 6.065c0 3.471-3.06 6.316-7.695 10.617L10 17.897l-1.305-1.215z" fill="#2874F0" fill-rule="evenodd" opacity=".9" stroke="#FFF"> </path> </svg> </div> </div> </div> <div class="_1-2Iqu row"> <div class="col col-7-12"> <div class="_3wU53n"> Samsung Galaxy M01 (Blue, 32 GB) </div> <div class="niH0FQ"> <span class="_2_KrJI" id="productRating_LSTMOBFRZZHMHQVNDFAZXGBO6_MOBFRZZHMHQVNDFA_"> <div class="hGSR34"> 4.2 <img class="_2lQ_WZ" src=""/> </div> </span> <span class="_38sUEc"> <span> <span> 5,040 Ratings </span> <span class="_1VpSqZ"> & </span> <span> 371 Reviews </span> </span> </span> </div> <div 
class="_3ULzGw"> <ul class="vFw0gD"> <li class="tVe95H"> 3 GB RAM | 32 GB ROM | Expandable Upto 512 GB </li> <li class="tVe95H"> 14.48 cm (5.7 inch) HD+ Display </li> <li class="tVe95H"> 13MP + 2MP | 5MP Front Camera </li> <li class="tVe95H"> 4000 mAh Lithium-ion Battery </li> <li class="tVe95H"> Qualcomm Snapdragon (SDM439) Octa Core Processor </li> <li class="tVe95H"> 1 Year Manufacturer Warranty for Phone and 6 Months Warranty for in the Box Accessories </li> </ul> </div> </div> <div class="col col-5-12 _2o7WAb"> <div class="_6BWGkk"> <div class="_1uv9Cb"> <div class="_1vC4OE _2rQ-NK"> ₹9,899 </div> </div> </div> <div class="_3n6o0t"> <img height="21" src="//img1a.flixcart.com/www/linchpin/fk-cp-zion/img/fa_8b4b59.png"/> </div> </div> </div> </a> </div> </div> </div>

Now let’s see the first item present in the page:

container = containers[0]
print(container.div.img["alt"])

Samsung Galaxy M01 (Blue, 32 GB)
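Class names such as _3O0U0u are generated by the site and can change at any time, and container.div.img["alt"] raises an error when a container has no image. A minimal sketch of a more defensive lookup follows, run on a small hand-written HTML fragment rather than the live page; the helper name first_alt_text and the fragment itself are assumptions for illustration.

```python
from bs4 import BeautifulSoup

# Hand-written fragment that mimics the structure of two product containers;
# the second one deliberately has no <img> element.
SAMPLE_HTML = """
<div class="_3O0U0u">
  <div><img alt="Samsung Galaxy M01 (Blue, 32 GB)" src="placeholder.svg"/></div>
</div>
<div class="_3O0U0u">
  <div>No image here</div>
</div>
"""


def first_alt_text(container):
    """Return the alt text of the first <img> in the container, or None."""
    img = container.find("img")
    if img is None:
        return None
    return img.get("alt")


page = BeautifulSoup(SAMPLE_HTML, "html.parser")
containers = page.find_all("div", class_="_3O0U0u")
names = [first_alt_text(c) for c in containers]
print(names)
```

Returning None for a missing element lets the caller decide whether to skip the product or record an empty value.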

So the first item on the Flipkart page we scraped is the Samsung Galaxy M01 smartphone in blue. Now let’s have a look at the price of this smartphone:

price = container.findAll("div", {"class": "col col-5-12 _2o7WAb"})
print(price[0].text)

₹9,899
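The text of this block is a string, and for other products on the same page it also carries the original price, the discount and offer text (for example ₹30,999₹34,99911% off...). Here is a minimal sketch of turning such a string into a number, assuming the selling price is always the first ₹ amount in the text; the function name parse_price is hypothetical.

```python
import re


def parse_price(text):
    """Return the first rupee amount in the text as an int, or None."""
    match = re.search(r"₹\s*(\d[\d,]*)", text)
    if match is None:
        return None
    # Drop the thousands separators before converting to an integer.
    return int(match.group(1).replace(",", ""))


print(parse_price("₹9,899"))
print(parse_price("₹30,999₹34,99911% offNo Cost EMIUpto ₹13,150 Off on Exchange"))
```

Storing the price as a number rather than as the raw text makes later sorting and filtering straightforward.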

Now let’s have a look at its ratings from its customers:

ratings = container.findAll("div", {"class": "niH0FQ"})
print(ratings[0].text)

4.25,040 Ratings & 371 Reviews
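The output runs the average rating and the number of ratings together (4.2 followed by 5,040 Ratings) because .text concatenates the text of all nested elements. Below is a sketch of splitting the string back into its parts, assuming the average always has exactly one decimal place, as in the outputs shown in this article; parse_ratings is a hypothetical helper name.

```python
import re

# Average rating (one decimal place), then the ratings count, then the reviews count.
RATING_PATTERN = re.compile(
    r"^\s*(\d\.\d)\s*(\d[\d,]*)\s*Ratings\s*&\s*(\d[\d,]*)\s*Reviews"
)


def parse_ratings(text):
    """Return (average, ratings_count, reviews_count), or None if the text does not match."""
    match = RATING_PATTERN.search(text)
    if match is None:
        return None
    average = float(match.group(1))
    ratings_count = int(match.group(2).replace(",", ""))
    reviews_count = int(match.group(3).replace(",", ""))
    return average, ratings_count, reviews_count


print(parse_ratings("4.25,040 Ratings & 371 Reviews"))
```

If the site ever shows a whole-number average such as 4, the pattern would not match and the function would return None, so the assumption is worth checking against real data.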

Now let’s create a CSV file and store all the mobile phones with their name, price and ratings:

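filename = "products.csv"
# Explicit UTF-8 so the ₹ symbol is written the same way on every platform
f = open(filename, "w", encoding="utf-8", newline="")
headers = "Product_Name, Pricing, Ratings \n"
f.write(headers)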

32
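Product names such as Samsung Galaxy M01 (Blue, 32 GB) and prices such as ₹9,899 contain commas, so joining fields with a plain comma would split them across columns. Here is a minimal sketch of how Python's standard csv module quotes such fields, written to an in-memory buffer instead of products.csv; the row values are copied from the outputs above.

```python
import csv
import io

buffer = io.StringIO()  # in-memory stand-in for the CSV file
writer = csv.writer(buffer)
writer.writerow(["Product_Name", "Pricing", "Ratings"])
writer.writerow(
    ["Samsung Galaxy M01 (Blue, 32 GB)", "₹9,899", "4.25,040 Ratings & 371 Reviews"]
)

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back yields the original three fields per row.
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows[1])
```

Fields that contain a comma are wrapped in double quotes on output, which is why the row still has three columns when it is read back.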

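Now let’s loop over all the containers, print the name, price and ratings of each product, and write each record to the CSV file:

import csv

writer = csv.writer(f)  # quotes fields that contain commas, such as "(Blue, 32 GB)"
for container in containers:
    product_name = container.div.img["alt"]
    price_container = container.findAll("div", {"class": "col col-5-12 _2o7WAb"})
    price = price_container[0].text.strip()
    rating_container = container.findAll("div", {"class": "niH0FQ"})
    rating = rating_container[0].text
    print("Product_Name:" + product_name)
    print("Price: " + price)
    print("Ratings:" + rating)
    writer.writerow([product_name, price, rating])  # one row per product
f.close()  # flush the rows to products.csv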
Product_Name:Samsung Galaxy M01 (Blue, 32 GB)
Price: ₹9,899
Ratings:4.25,040 Ratings & 371 Reviews
Product_Name:Samsung Galaxy A71 (Haze Crush Silver, 128 GB)
Price: ₹30,999₹34,99911% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.41,311 Ratings & 140 Reviews
Product_Name:Samsung Galaxy A31 (Prism Crush Black, 128 GB)
Price: ₹20,999₹23,99912% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.32,905 Ratings & 272 Reviews
Product_Name:Samsung Galaxy M11 (Violet, 32 GB)
Price: ₹11,899₹12,6996% off
Ratings:4.24,469 Ratings & 381 Reviews
Product_Name:Samsung Galaxy A31 (Prism Crush Blue, 128 GB)
Price: ₹20,999₹23,99912% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.32,905 Ratings & 272 Reviews
Product_Name:Samsung Galaxy A31 (Prism Crush White, 128 GB)
Price: ₹20,999₹23,99912% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.32,905 Ratings & 272 Reviews
Product_Name:Samsung Galaxy A21s (White, 64 GB)
Price: ₹15,999₹17,99911% offUpto ₹13,150 Off on Exchange
Ratings:4.23,257 Ratings & 301 Reviews
Product_Name:Samsung Galaxy A21s (Black, 64 GB)
Price: ₹15,999₹17,99911% offUpto ₹13,150 Off on Exchange
Ratings:4.23,257 Ratings & 301 Reviews
Product_Name:Samsung Galaxy A21s (Blue, 64 GB)
Price: ₹15,999₹17,99911% offUpto ₹13,150 Off on Exchange
Ratings:4.23,257 Ratings & 301 Reviews
Product_Name:Samsung Galaxy M11 (Violet, 64 GB)
Price: ₹13,970₹14,2792% off
Ratings:4.22,817 Ratings & 231 Reviews
Product_Name:Samsung Galaxy M31 (Space Black, 64 GB)
Price: ₹18,922No Cost EMI
Ratings:4.4351 Ratings & 30 Reviews
Product_Name:Samsung Galaxy M11 (Metallic Blue, 64 GB)
Price: ₹13,988₹14,8505% off
Ratings:4.22,817 Ratings & 231 Reviews
Product_Name:Samsung Galaxy M11 (Metallic Blue, 32 GB)
Price: ₹11,925
Ratings:4.24,469 Ratings & 381 Reviews
Product_Name:Samsung Galaxy A71 (Prism Crush Black, 128 GB)
Price: ₹30,999₹34,99911% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.41,311 Ratings & 140 Reviews
Product_Name:Samsung Galaxy A51 (Prism Crush Black, 128 GB)
Price: ₹23,999₹25,9997% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.44,117 Ratings & 470 Reviews
Product_Name:Samsung Galaxy A51 (Prism Crush White, 128 GB)
Price: ₹27,299₹28,9995% off
Ratings:4.4517 Ratings & 33 Reviews
Product_Name:Samsung Guru FM Plus SM-B110E/D
Price: ₹1,662No Cost EMI
Ratings:4.348,448 Ratings & 5,198 Reviews
Product_Name:Samsung Galaxy A51 (Prism Crush Blue, 128 GB)
Price: ₹23,999₹25,9997% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.44,117 Ratings & 470 Reviews
Product_Name:Samsung Galaxy M01 (Black, 32 GB)
Price: ₹9,779₹9,9992% off
Ratings:4.25,040 Ratings & 371 Reviews
Product_Name:Samsung Galaxy M11 (Black, 32 GB)
Price: ₹11,865
Ratings:4.24,469 Ratings & 381 Reviews
Product_Name:Samsung Galaxy M30S (Black, 128 GB)
Price: ₹17,736₹17,9961% offNo Cost EMI
Ratings:4.33,317 Ratings & 295 Reviews
Product_Name:Samsung Galaxy M40 (Seawater Blue, 128 GB)
Price: ₹21,490No Cost EMI
Ratings:4.2691 Ratings & 50 Reviews
Product_Name:Samsung Galaxy S10 Lite (Prism Blue, 128 GB)
Price: ₹42,999₹43,9992% offNo Cost EMIUpto ₹13,150 Off on Exchange
Ratings:4.59,675 Ratings & 2,410 Reviews
Product_Name:Samsung Galaxy A21s (Blue, 64 GB)
Price: ₹17,499₹19,99912% offUpto ₹13,150 Off on Exchange
Ratings:4.31,268 Ratings & 118 Reviews

I hope you liked this article on scraping Flipkart with Python to create a CSV file. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.
