YouTube is the world’s most popular and widely used video platform today. In this article, I’m going to introduce you to a data science project on analyzing YouTube trending videos with the Python programming language.
Data Science Project on YouTube Trending Videos Analysis with Python
The dataset I will use to analyze YouTube trending videos was collected over 205 days. For each of those days, it contains data on the videos that were trending that day. In total, it contains data on over 40,000 trending videos.
We will analyze the data to gain insight into trending YouTube videos and to see what they have in common. This information can also help people who want to increase the popularity of their own videos on YouTube.
Now let’s start the YouTube trending videos analysis by importing the necessary Python libraries:
Now let’s read the data and set a few configuration options to improve the visualization plots:
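The original import and loading code is not shown. The sketch below is one possible setup using only pandas. It assumes a hypothetical file name, `USvideos.csv`, and assumes the CSV has the columns this article uses (`title`, `views`, `likes`, `dislikes`, `comment_count`, `description`). Plot styling, for example with matplotlib or seaborn, is left out. The small in-memory CSV exists only so the sketch runs without the real file.

```python
import io

import pandas as pd

# Show every column when printing wide DataFrames (a display option only).
pd.set_option("display.max_columns", None)

# Hypothetical file name: replace it with the path to your copy of the dataset.
DATA_PATH = "USvideos.csv"

# Columns this walkthrough relies on (assumed to exist in the CSV).
REQUIRED_COLUMNS = ["title", "views", "likes", "dislikes", "comment_count", "description"]


def load_trending(path_or_buffer):
    """Read the trending-videos CSV and check that the expected columns exist."""
    df = pd.read_csv(path_or_buffer)
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Tiny made-up sample so the sketch runs without the real file.
# The empty last field on the second row is read as a null (NaN) description.
sample_csv = io.StringIO(
    "title,views,likes,dislikes,comment_count,description\n"
    "FIRST sample title,1000,50,5,8,Sample description\n"
    "Second sample title,2000,90,4,12,\n"
)
df = load_trending(sample_csv)
```

With the real file, the call would be `df = load_trending(DATA_PATH)`.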
The description column has some null values. To clean the data and get rid of them, I’m going to replace each null value in the description column with an empty string:
df["description"] = df["description"].fillna(value="")
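The rows with missing descriptions can be listed before the replacement, and the result can be checked afterwards. The sketch below uses a small hypothetical DataFrame in place of the real dataset. `isna()` treats both `NaN` and `None` as null.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the trending-videos DataFrame.
df = pd.DataFrame({
    "title": ["Video A", "Video B", "Video C"],
    "description": ["Some text", np.nan, None],
})

# Rows whose description is missing (NaN and None both count as null).
null_rows = df[df["description"].isna()]
print(null_rows)

# Replace the nulls with empty strings so string methods work on every row.
df["description"] = df["description"].fillna(value="")
print(df["description"].isna().sum())  # no nulls remain
```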
Now let’s take a look at some statistical information about the numeric columns in our dataset:
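In pandas, this kind of summary usually comes from `DataFrame.describe()`. On a mixed DataFrame it summarizes only the numeric columns by default. The sketch uses made-up numbers. The `50%` row is the median that the observations below compare with the mean.

```python
import pandas as pd

# Hypothetical sample; the real dataset has tens of thousands of rows.
df = pd.DataFrame({
    "title": ["a", "b", "c", "d"],  # text column, skipped by describe()
    "views": [100, 200, 300, 10_000],
    "likes": [10, 20, 30, 900],
    "dislikes": [1, 2, 3, 40],
    "comment_count": [5, 6, 7, 300],
})

# count, mean, std, min, quartiles (25%, 50%, 75%) and max per numeric column.
stats = df.describe()
print(stats)

mean_views = stats.loc["mean", "views"]
median_views = stats.loc["50%", "views"]  # the 50% quantile is the median
```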
Observations from the above table:
- The average number of views of a trending video is 2,360,784. The median number of views is 681,861, which means that half of the trending videos have fewer views than this and the other half have more.
- The average number of likes for a trending video is 74,266, while the average number of dislikes is 3,711.
- The average number of comments is 8,446 while the median is 1,856.
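In every case above the mean is far above the median. For views it is roughly 3.5 times the median. This pattern points to a right-skewed distribution, in which a few extremely popular videos pull the average up while the median stays near typical values. The toy numbers below are hypothetical and only illustrate the effect.

```python
import statistics

# Hypothetical view counts: most videos are modest, one is a viral outlier.
views = [50_000, 80_000, 120_000, 200_000, 5_000_000]

mean_views = statistics.mean(views)      # pulled up by the outlier
median_views = statistics.median(views)  # middle value, unaffected by it

# Mean well above median indicates right (positive) skew.
is_right_skewed = mean_views > median_views
print(mean_views, median_views, is_right_skewed)
```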
Now we want to see how many trending video titles contain at least one word written entirely in uppercase (e.g., WHAT). To do this, we’ll add a new column to the dataset whose value is True if the video title contains at least one all-caps word, and False otherwise:
We can see that 44% of trending video titles contain at least one word in all caps. We will use this new column later to analyze the correlation between the variables.
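A minimal sketch of such a column follows. It assumes that "uppercase word" means a whitespace-separated token for which Python's `str.isupper()` returns True. Under that definition, one-letter words such as "A" or "I" also count, and the article does not say how its version treats them. The function name `contains_capitalized_word` and the column name `contains_capitalized` are hypothetical, and the titles are made up.

```python
import pandas as pd


def contains_capitalized_word(title: str) -> bool:
    """Return True if any whitespace-separated word in the title is all caps."""
    # str.isupper() needs at least one cased character and no lowercase ones,
    # so "WHAT" counts but "2018" (no cased characters) does not.
    return any(word.isupper() for word in title.split())


# Hypothetical titles standing in for the dataset's "title" column.
df = pd.DataFrame({"title": [
    "WHAT happened today",
    "A quiet morning",        # "A" is a one-letter all-caps word, so it counts
    "Top 10 moments 2018",
    "Official trailer",
]})

df["contains_capitalized"] = df["title"].apply(contains_capitalized_word)

# The mean of a boolean column is the share of True values.
share_capitalized = df["contains_capitalized"].mean()
print(f"{share_capitalized:.0%} of titles contain an all-caps word")
```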
Let’s add another column to our dataset with the length of each video title, then plot a histogram of title lengths to get an idea of how long trending video titles are:
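The title-length column is a one-liner with the pandas `.str.len()` accessor. To keep the sketch free of plotting dependencies, it computes the histogram bin counts with `numpy.histogram` instead of drawing them. A plotting call such as matplotlib's `hist` draws its bars from counts and edges of this kind. The titles and bin edges are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"title": [
    "Short",
    "A title of medium length here",
    "Another reasonably sized title for a video",
    "An unusually long title that keeps going with extra words and details",
]})

# Number of characters in each title (spaces and punctuation included).
df["title_length"] = df["title"].str.len()

# Bin counts and edges for a histogram of title lengths.
counts, edges = np.histogram(df["title_length"], bins=[0, 20, 40, 60, 80])
print(df["title_length"].tolist(), counts.tolist())
```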
We can see that the distribution of title lengths looks roughly like a normal distribution, with most videos having titles around 30 to 60 characters long. Now, let’s draw a scatter plot to analyze the relationship between title length and number of views:
Looking at the scatter plot, there appears to be no clear relationship between title length and number of views. However, we do notice something interesting: videos with 100,000,000 or more views have titles roughly 33 to 55 characters long.
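A scatter plot can be complemented with a correlation coefficient. The sketch below is an addition and is not taken from the article. It computes Pearson's coefficient and Spearman's rank coefficient on hypothetical data. Spearman is computed as the Pearson correlation of the ranks, which makes it less sensitive to the huge outliers typical of view counts. Computing it this way also avoids `Series.corr(method="spearman")`, which relies on SciPy.

```python
import pandas as pd

# Hypothetical title lengths and view counts.
df = pd.DataFrame({
    "title_length": [25, 40, 48, 33, 60, 52],
    "views": [900_000, 150_000, 2_500_000, 400_000, 80_000, 1_200_000],
})

# Pearson: strength of a linear relationship, sensitive to outliers.
pearson = df["title_length"].corr(df["views"])

# Spearman: Pearson correlation of the ranks (ties get average ranks).
spearman = df["title_length"].rank().corr(df["views"].rank())
print(round(pearson, 3), round(spearman, 3))
```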
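An observation like the 33 to 55 character range can be checked directly by filtering the most-viewed videos and looking at their title lengths. The data below is hypothetical. The 100,000,000 threshold is the one mentioned above.

```python
import pandas as pd

# Hypothetical titles (repeated letters give easy-to-check lengths) and views.
df = pd.DataFrame({
    "title": ["a" * 35, "b" * 50, "c" * 80, "d" * 20],
    "views": [150_000_000, 120_000_000, 3_000_000, 500_000],
})
df["title_length"] = df["title"].str.len()

# Keep only videos with at least 100,000,000 views.
top = df[df["views"] >= 100_000_000]

# Range of title lengths among the most-viewed videos.
min_len = top["title_length"].min()
max_len = top["title_length"].max()
print(min_len, max_len)
```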
YouTube Trending Videos Analysis: Correlation
Now let’s see how the variables in the dataset correlate with each other. For example, we would like to know whether views and likes are positively correlated, that is, whether videos with more views also tend to have more likes:
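A correlation table like this one can be produced with `DataFrame.corr()`, which computes Pearson correlations by default. Selecting the numeric columns explicitly keeps text columns such as titles out of the calculation. The numbers below are hypothetical. A heat map can then be drawn from `corr`, for example with seaborn's `heatmap(corr, annot=True)`, which is not shown here.

```python
import pandas as pd

df = pd.DataFrame({
    "views": [1_000, 5_000, 20_000, 100_000],
    "likes": [50, 260, 900, 5_200],
    "dislikes": [5, 10, 60, 150],
    "comment_count": [8, 40, 100, 700],
    "title": ["a", "b", "c", "d"],  # non-numeric: excluded below
})

numeric_cols = ["views", "likes", "dislikes", "comment_count"]
corr = df[numeric_cols].corr()  # Pearson correlation matrix
print(corr.round(2))

views_likes = corr.loc["views", "likes"]  # close to 1 for this sample
```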
The correlation map and correlation table above indicate that views and likes are strongly positively correlated.
Let’s see which words are used most often in trending video titles. To do this, we’ll draw a word cloud of the titles, a visualization of the most common words in which the more frequent a word is, the larger its font size:
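A word cloud sizes each word by its frequency, so the core step is counting words. The sketch below does this with `collections.Counter` on hypothetical titles. It uses a tiny illustrative stop-word list, where a real analysis would use a fuller one. Rendering is left out. The third-party `wordcloud` package, for example, provides `WordCloud.generate_from_frequencies` for turning such counts into an image.

```python
import re
from collections import Counter

# Hypothetical titles; in the article they come from df["title"].
titles = [
    "Official Trailer | New Movie",
    "New Music Video - Official",
    "How to cook pasta",
    "The official music video",
]

# Tiny illustrative stop-word list (an assumption, not a standard list).
STOPWORDS = {"the", "to", "a", "of", "and", "how"}

words = []
for title in titles:
    # Lowercase, then treat runs of letters, digits or apostrophes as words.
    for word in re.findall(r"[a-z0-9']+", title.lower()):
        if word not in STOPWORDS:
            words.append(word)

# Word frequencies: a word cloud draws more frequent words in larger fonts.
frequencies = Counter(words)
top_words = frequencies.most_common(3)
print(top_words)
```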
I hope you liked this article on a data science project on YouTube trending videos analysis with the Python programming language. Feel free to ask your valuable questions in the comments section below.