Data Science Project on President Heights

If you are a beginner in data science, this project is a good one to work through: you will learn a lot about handling data that comes from a CSV file or a similar format.

The data, the heights of US presidents, is available in the file heights.csv, which is a simple comma-separated list of labels and values:

import numpy as np
import pandas as pd
data = pd.read_csv("heights.csv")
print(data.head())
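
Before going further, it can help to check that the file loaded as expected. The following is a minimal sketch, not the article’s own code: it reads a tiny in-memory stand-in for heights.csv and inspects its shape, column types, and missing values. The rows and the order and name columns are placeholders; only the height(cm) column name comes from the code in this article.

```python
import io

import pandas as pd

# Illustrative stand-in for heights.csv. The rows below and the
# "order" and "name" columns are assumptions, not the real file.
sample_csv = io.StringIO(
    "order,name,height(cm)\n"
    "1,Person A,189\n"
    "2,Person B,170\n"
    "3,Person C,189\n"
)

data = pd.read_csv(sample_csv)

# Fail early if the column used later in the project is missing.
if "height(cm)" not in data.columns:
    raise KeyError("expected a 'height(cm)' column")

print(data.shape)         # (number of rows, number of columns)
print(data.dtypes)        # the height column should be numeric
print(data.isna().sum())  # missing values per column
```

With the real file, replacing sample_csv with "heights.csv" should run the same checks.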

Now that Pandas has read the file, we can extract the height column as a NumPy array (note that the heights are measured in centimeters):

height = np.array(data["height(cm)"])
print(height)

Now that we have this data array, we can compute a variety of summary statistics:

print("Mean of heights =", height.mean())
print("Standard Deviation of height =", height.std())
print("Minimum height =", height.min())
print("Maximum height =", height.max())
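
One detail worth knowing about the standard deviation: NumPy’s std() divides by n by default (the population form), while pandas divides by n - 1 (the sample form), so the two libraries can report slightly different numbers for the same data. A small sketch, using illustrative values rather than the contents of heights.csv:

```python
import numpy as np
import pandas as pd

# Illustrative values only, not the contents of heights.csv.
height = np.array([189, 170, 189, 163, 183])

population_std = height.std()         # NumPy default: ddof=0, divides by n
sample_std = height.std(ddof=1)       # divides by n - 1
pandas_std = pd.Series(height).std()  # pandas default: ddof=1

print("Population std (NumPy default) =", population_std)
print("Sample std (NumPy, ddof=1)     =", sample_std)
print("Sample std (pandas default)    =", pandas_std)
```

The std row reported by data.describe() is also the n - 1 version, so it will not exactly match height.std() unless ddof=1 is passed.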

Note that in each case, the aggregation operation reduced the entire array to a single summarizing value, which gives us information about the distribution of values. We may also wish to compute quantiles:

print("25th percentile =", np.percentile(height, 25))
print("Median =", np.median(height))
print("75th percentile =", np.percentile(height, 75))
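
The three quantiles can also be computed in a single call by passing a list of percentiles, and the 50th percentile is the same value as the median. The sketch below uses illustrative values rather than the real dataset and also derives the interquartile range, a common measure of spread that is not part of the original code:

```python
import numpy as np

# Illustrative values only, not the contents of heights.csv.
height = np.array([189, 170, 189, 163, 183])

# One call returns all requested percentiles, in the order given.
q25, q50, q75 = np.percentile(height, [25, 50, 75])

# Interquartile range: the width of the middle 50% of the data.
iqr = q75 - q25

print("25th percentile =", q25)
print("Median =", q50)
print("75th percentile =", q75)
print("Interquartile range =", iqr)
```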

We see that the median height of US presidents is 182 cm, or just shy of six feet. Of course, sometimes it’s more useful to see a visual representation of this data, which we can accomplish using tools in Matplotlib:

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

plt.hist(height)
plt.title("Height Distribution of Presidents of USA")
plt.xlabel("height(cm)")
plt.ylabel("Number")
plt.show()
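
To see the numbers behind the bars, the same binning can be reproduced without plotting. With its default settings, plt.hist() splits the range between the minimum and maximum into 10 equal-width bins, which np.histogram() also does when given bins=10. A minimal sketch on illustrative values, not the real dataset:

```python
import numpy as np

# Illustrative values only, not the contents of heights.csv.
height = np.array([189, 170, 189, 163, 183])

# counts[i] is the number of values between bin_edges[i] and bin_edges[i + 1].
counts, bin_edges = np.histogram(height, bins=10)

for count, left, right in zip(counts, bin_edges[:-1], bin_edges[1:]):
    print(f"{left:6.1f} to {right:6.1f} cm: {count}")
```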

These aggregates are some of the fundamental pieces of exploratory data analysis, which we’ll explore in more depth in later projects.

Aman Kharwal

Data Strategist at Statso. My aim is to decode data science for the real world in the most simple words.

15 Comments

Joe Datrin

September 5, 2020 / 7:00 pm

what is the use of sns.set() sir?
- Aman Kharwal
  
  September 5, 2020 / 7:10 pm
  
  sns.set() (short for seaborn.set()) applies seaborn’s default styling on top of Matplotlib. You don’t need to prepare your data for any seaborn function to use these styles: just call seaborn.set() once, and your Matplotlib plots will be drawn with seaborn’s look.
Tark

November 3, 2020 / 4:58 pm

For beginners, just add the necessary imports for pandas and NumPy for clarity.
- Aman Kharwal
  
  November 3, 2020 / 6:13 pm
  
  Sure, thanks for your valuable feedback. Keep visiting us.
hoshang14

November 27, 2020 / 3:46 pm

Where can we get the dataset? I did not find any link to download the data.
- Aman Kharwal
  
  November 27, 2020 / 4:13 pm
  
  A link to the dataset heights.csv is available in the second paragraph. You can download the dataset from here.
Lalit Thakur

February 4, 2021 / 11:07 pm

data.describe() can be used as well
- Aman Kharwal
  
  February 5, 2021 / 12:19 am
  
  yes
basheer

February 5, 2021 / 10:44 pm

thank you……!
- Aman Kharwal
  
  February 6, 2021 / 12:01 am
  
  Most welcome
Dan

February 10, 2021 / 7:38 pm

Excellent project for beginners. Thanks!
- Aman Kharwal
  
  February 10, 2021 / 8:35 pm
  
  Thanks, keep visiting
Mfundo

April 18, 2021 / 3:30 pm

Thank you for your work Aman. Extremely helpful !
- Aman Kharwal
  
  April 18, 2021 / 3:38 pm
  
  Keep visiting 🌺
