Data Science Project on Area and Population

Data Science Beginner Project on Area and Population.

In this project we’ll use the size of points to indicate the area and populations of California cities. We would like a legend that specifies the scale of the sizes of the points, and we’ll accomplish this by plotting some labeled data with no entries.

You can download the dataset required for this project from here.

import pandas as pd
cities = pd.read_csv("california_cities.csv")
print(cities.head())
# extracting the data we ar interested in
latitude, longitude = cities["latd"], cities["longd"]
population, area = cities["population_total"], cities["area_total_km2"]
# to scatter the points, using size and color but without label
import numpy as np
import matplotlib.pyplot as plt
import seaborn
seaborn.set()
plt.scatter(longitude, latitude, label=None, c=np.log10(population),
            cmap='viridis', s=area, linewidth=0, alpha=0.5)
plt.axis(aspect='equal')
plt.xlabel('Longitude')
plt.ylabel('Longitude')
plt.colorbar(label='log$_{10}$(population)')
plt.clim(3, 7)
# now we will craete a legend, we will plot empty lists with the desired size and label
for area in [100, 300, 500]:
    plt.scatter([], [], c='k', alpha=0.3, s=area, label=str(area) + 'km$^2$')
plt.legend(scatterpoints=1, frameon=False, labelspacing=1, title='City Areas')
plt.title("Area and Population of California Cities")
plt.show()
Default image
Aman Kharwal

I am a programmer from India, and I am here to guide you with Data Science, Machine Learning, Python, and C++ for free. I hope you will learn a lot in your journey towards Coding, Machine Learning and Artificial Intelligence with me.

3 Comments

  1. Small correction – Both axes are labeled Longitude. X-Axis must be labeled Latitude

Leave a Reply