Useful Python Scripts

There are some scripts and snippets of code that we reuse across many different tasks. For example, whenever I work on a sentiment analysis task, I use the same script to clean the text column. This doesn’t mean that you should never use different logic on a new task; it simply makes sense to reuse a script on the same kind of problem where it has already given you the desired results. So if you are looking for Python scripts that you can reuse while solving real-world problems, this article is for you. In this article, I will take you through five useful Python scripts that you can use in your projects.

5 Useful Python Scripts

Below are all the useful Python scripts that you will learn about in this article:

  1. removing duplicates 
  2. text cleaning
  3. web scraping
  4. converting image to an array
  5. annotating graphs

Now let’s go through all these useful Python scripts one by one.

Removing Duplicates

Suppose you have a list of names that contains duplicates. In most problems, you should remove the duplicates before doing anything else with the data. Here is a Python function that removes duplicate values from any list while keeping the first occurrence of each value:

def remove(items):
    # collect each value the first time it appears, preserving order
    unique = []
    for item in items:
        if item not in unique:
            unique.append(item)
    return unique

a = ["Aman", "Akanksha", "Aman", "Shiwangi", "Sajid"]
print(remove(a))
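
As an alternative, here is a minimal sketch that relies on dictionary keys being unique and keeping insertion order (Python 3.7+). The function name `remove_duplicates` is only an illustrative choice, and this version assumes that every item in the list is hashable (strings, numbers, tuples), which the loop-based function above does not require:

```python
def remove_duplicates(items):
    # dict keys are unique and remember insertion order,
    # so the first occurrence of each value is kept
    return list(dict.fromkeys(items))

names = ["Aman", "Akanksha", "Aman", "Shiwangi", "Sajid"]
unique_names = remove_duplicates(names)
print(unique_names)
```

On long lists this avoids checking every new item against the whole result list, so it is usually the faster option.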

Cleaning Text

Data cleaning is one of the most important steps in a data science task. Text that expresses opinions is usually informal and full of language errors. Here is a Python function that you can apply to the text column of your data:

import re
import string

import nltk
from nltk.corpus import stopwords

# download the stop-word list (needed once per machine)
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
stopword = set(stopwords.words('english'))

def clean(text):
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # links
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)  # punctuation
    text = re.sub(r'\n', '', text)                     # line breaks
    text = re.sub(r'\w*\d\w*', '', text)               # words containing digits
    # drop stop words, then reduce every remaining word to its stem
    text = [word for word in text.split(' ') if word not in stopword]
    text = " ".join(text)
    text = [stemmer.stem(word) for word in text.split(' ')]
    text = " ".join(text)
    return text
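
To see what the regular expressions do on their own, here is a small sketch that applies only the regex steps, without stop-word removal or stemming, so it runs without NLTK. The function name `strip_noise` and the sample sentence are made up for illustration:

```python
import re
import string

def strip_noise(text):
    # same regex steps as in clean(), minus stop words and stemming
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)                # text in square brackets
    text = re.sub(r'https?://\S+|www\.\S+', '', text)  # links
    text = re.sub(r'<.*?>+', '', text)                 # HTML tags
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)                     # line breaks
    text = re.sub(r'\w*\d\w*', '', text)               # words containing digits
    return text

sample = "Great phone!!! <b>Loved</b> it [ad] https://example.com"
cleaned = strip_noise(sample)
# split() also discards the extra spaces left behind by the removals
print(cleaned.split())
```

The removals can leave extra spaces in the string, which is why the example prints the words after calling split().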

Scraping Tables from a Website

If you want to collect data from a table on a webpage, this Python script is for you. It reads the first table with the class "wikitable" on the page and stores its rows in a CSV file:

import csv
from urllib.request import urlopen

import pandas as pd
from bs4 import BeautifulSoup

# fetch the page and parse it
html = urlopen("https://bit.ly/3jpMFRW")
soup = BeautifulSoup(html, "html.parser")

# take the first table with the class "wikitable" and collect its rows
table = soup.find_all("table", {"class": "wikitable"})[0]
rows = table.find_all("tr")

# write every row (header and data cells) to a CSV file
with open("Dataset.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for row in rows:
        cells = row.find_all(["td", "th"])
        writer.writerow([cell.get_text(strip=True) for cell in cells])

# load the CSV file to check the result
data = pd.read_csv("Dataset.csv")
print(data.head())
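
Cell text scraped from HTML often carries stray spaces and line breaks. Here is a minimal offline sketch of the CSV-writing step that strips them before writing. The rows in `raw_rows` are invented sample data standing in for scraped cells, `clean_row` is a hypothetical helper, and the output goes to an in-memory buffer instead of a file so nothing is downloaded or written to disk:

```python
import csv
import io

# invented sample data that mimics scraped table cells
raw_rows = [
    ["Rank\n", "Name\n", "Score\n"],
    ["1\n", " Alice \n", "91\n"],
    ["2\n", "Bob\n", "85\n"],
]

def clean_row(cells):
    # remove leading and trailing whitespace from every cell
    return [cell.strip() for cell in cells]

buffer = io.StringIO()
writer = csv.writer(buffer)
for cells in raw_rows:
    writer.writerow(clean_row(cells))

csv_text = buffer.getvalue()
print(csv_text)
```

Without the cleanup, the csv module would wrap every cell that contains a line break in quotes, which makes the file harder to read.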

Converting Image to an Array

To analyze the features of an image, you first need to convert it into an array of pixel values. Below is how you can convert any image into an array using Python:

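# in recent Keras releases these helpers are also available from keras.utils
from keras.preprocessing.image import load_img, img_to_array

# load the image file
img = load_img("image.png")
# convert it to an array of shape (height, width, channels)
data = img_to_array(img)
print(data.shape)
print(data)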

Annotation of Graphs

When plotting data on charts, annotations make it clear which label each data point belongs to. Here is a simple example of how you can annotate a scatter plot using Python:

import matplotlib.pyplot as plt

x = [3, 5, 7, 5, 4]
y = [5, 3, 4, 5, 2]
labels = ["Jan", "Feb", "Mar", "April", "May"]

plt.scatter(x, y)

# place each label slightly to the right of its point
for i, label in enumerate(labels):
    plt.annotate(label, (x[i] + 0.10, y[i]), fontsize=10)
plt.show()

Summary

So these were some of the Python scripts that you can reuse while solving real-world problems. Most of them are related to data science, since that is one of the most common uses of Python today. I hope you liked this article on five useful Python scripts. Feel free to ask your questions in the comments section below.

Aman Kharwal

Data Strategist at Statso. My aim is to decode data science for the real world in the most simple words.
