The major streaming services, Netflix, Prime Video, Hulu, and Disney+, compete fiercely with one another. As a data scientist, working out which of them is the best makes for an interesting exercise. In this article, I’m going to introduce you to a data science project on the best streaming service analysis with Python.
Best Streaming Service Analysis
To decide which streaming service is the best, I will use the ratings of the shows available on the four major platforms: Netflix, Prime Video, Hulu, and Disney+.
The dataset that I will use for this analysis contains a comprehensive list of the TV shows available on the four platforms we are comparing.
I am using this dataset to find the best streaming service, but as a beginner you can also use it for tasks such as:
- Analyzing the streaming platforms
- Analyzing the IMDb and Rotten Tomatoes ratings of all the shows
- Analyzing the target age group of most of the TV shows
Best Streaming Service Analysis with Python
Now let’s get started with the task of Best Streaming service analysis with Python. I will start this task by importing all the necessary libraries and the dataset:
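As a minimal sketch of this loading step: the file name tv_shows.csv and the column layout shown here are my assumptions, not something the dataset description confirms, so a tiny made-up sample stands in for the real file and the snippet runs on its own. The plotting code later in the article also expects plotly.express to be imported as px.

```python
import io

import pandas as pd

# Hypothetical three-row stand-in for the real file so the snippet is runnable.
# With the actual dataset you would call pd.read_csv("tv_shows.csv") instead
# (the file name and the column names are assumptions).
sample_csv = io.StringIO(
    "Title,Year,Age,IMDb,Rotten Tomatoes,Netflix,Hulu,Prime Video,Disney+\n"
    "Show A,2008,18+,9.5,96%,1,0,0,0\n"
    "Show B,2016,16+,8.8,93%,1,1,0,0\n"
    "Show C,2019,7+,,,0,0,0,1\n"
)
tv_shows = pd.read_csv(sample_csv)

# Quick look at the size, the column types, and the missing values.
print(tv_shows.shape)
print(tv_shows.dtypes)
print(tv_shows.isna().sum())
```

Checking the missing values right after loading is useful because the rating columns need cleaning before they can be plotted.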
Since we are only analyzing the data, we don’t need machine learning algorithms here. Most of the work can be done by visualizing and analyzing the ratings of shows on the streaming platforms.
Let’s prepare the dataset so that we can easily analyze the data. I will start preparing the data by dropping the duplicate values based on the title of the shows:
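A small sketch of the de-duplication step, using a made-up DataFrame and assuming the title column is named Title:

```python
import pandas as pd

# Hypothetical sample in which "Show A" appears twice.
tv_shows = pd.DataFrame({
    "Title": ["Show A", "Show B", "Show A"],
    "IMDb": [9.5, 8.8, 9.5],
})

# Keep the first row for each title and discard the later repeats.
tv_shows = tv_shows.drop_duplicates(subset="Title", keep="first")
tv_shows = tv_shows.reset_index(drop=True)
print(tv_shows)
```

Note that de-duplicating on the title alone would also merge two different shows that happen to share a name, so it may be worth checking a few of the dropped rows by hand.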
Next, I will fill the null values in the data with zeroes and then convert the affected columns to integer data types:
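One way this cleaning step could look. I am assuming here that the IMDb column holds floats on a 0 to 10 scale and that the Rotten Tomatoes column holds strings such as "96%"; multiplying IMDb by 10 is my own choice, made so that both ratings end up as integers on a 0 to 100 scale.

```python
import pandas as pd

# Hypothetical sample with one missing value in each rating column.
tv_shows = pd.DataFrame({
    "Title": ["Show A", "Show B", "Show C"],
    "IMDb": [9.5, 8.8, None],
    "Rotten Tomatoes": ["96%", None, "70%"],
})

# Rotten Tomatoes: missing -> "0%", strip the percent sign, cast to int.
rt = tv_shows["Rotten Tomatoes"].fillna("0%").str.rstrip("%")
tv_shows["Rotten Tomatoes"] = rt.astype(int)

# IMDb: missing -> 0, rescale 0-10 to 0-100 (an assumption), cast to int.
imdb = tv_shows["IMDb"].fillna(0) * 10
tv_shows["IMDb"] = imdb.round().astype(int)

print(tv_shows)
```

Because a zero now means "no rating" rather than "a very bad show", later steps should filter the zeroes out before comparing platforms.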
Visualizing the data will be easier if the 1s and 0s in the columns named Netflix, Hulu, Disney+, and Prime Video are reshaped into a single categorical column. Keep in mind that the same show may be available on more than one platform:
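A sketch of this reshaping with pandas.melt, on made-up data. The names tv_shows_long and Present are hypothetical; StreamingOn is the column name used by the scatter plot later in the article.

```python
import pandas as pd

# Hypothetical sample: "Show B" is available on two platforms.
tv_shows = pd.DataFrame({
    "Title": ["Show A", "Show B", "Show C"],
    "Netflix": [1, 1, 0],
    "Hulu": [0, 1, 0],
    "Prime Video": [0, 0, 0],
    "Disney+": [0, 0, 1],
})
platforms = ["Netflix", "Hulu", "Prime Video", "Disney+"]

# Wide -> long: one row per (title, platform) pair.
tv_shows_long = pd.melt(
    tv_shows,
    id_vars=["Title"],
    value_vars=platforms,
    var_name="StreamingOn",
    value_name="Present",
)

# Keep only the pairs where the show is actually available.
tv_shows_long = tv_shows_long[tv_shows_long["Present"] == 1]
tv_shows_long = tv_shows_long.drop(columns="Present")
print(tv_shows_long)
```

A show that streams on several platforms gets one row per platform, which is what lets it count towards each of them in the plots.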
Now I will merge this data with the data we started with, dropping some unwanted columns along the way:
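A sketch of the merge on made-up data. Which columns count as unwanted is not stated, so dropping the four platform flag columns is my assumption, and the name tv_shows_combined is hypothetical.

```python
import pandas as pd

# Hypothetical original data (wide format, ratings already cleaned).
tv_shows = pd.DataFrame({
    "Title": ["Show A", "Show B"],
    "IMDb": [95, 88],
    "Rotten Tomatoes": [96, 93],
    "Netflix": [1, 1],
    "Hulu": [0, 1],
    "Prime Video": [0, 0],
    "Disney+": [0, 0],
})

# Hypothetical long-format data: one row per (title, platform) pair.
tv_shows_long = pd.DataFrame({
    "Title": ["Show A", "Show B", "Show B"],
    "StreamingOn": ["Netflix", "Netflix", "Hulu"],
})

# Attach the ratings to every (title, platform) pair.
tv_shows_combined = tv_shows_long.merge(tv_shows, on="Title", how="inner")

# The platform flags are redundant once StreamingOn exists.
tv_shows_combined = tv_shows_combined.drop(
    columns=["Netflix", "Hulu", "Prime Video", "Disney+"]
)
print(tv_shows_combined)
```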
Now let’s plot the data where the ratings are more than 1 to see the number of TV shows available on each platform:
Final Step: Finding Best Streaming Service
Now let’s visualize the data to find the best streaming service based on ratings. I will first use violin charts to gauge the content ratings and the freshness of each streaming platform:
Now let’s use a scatter plot of IMDb against Rotten Tomatoes ratings to see which streaming platform has the best ratings on both rating sites:
px.scatter(tv_shows_both_ratings, x='IMDb', y='Rotten Tomatoes', color='StreamingOn')
From the violin charts we can observe that:
- Hulu, Netflix, and Prime Video all have a substantial amount of content. As the quantity of content increases, quality decreases for all three.
- Prime Video is denser in the top half of the IMDb ratings and also performs well on freshness.
- Disney+, despite being new, has also been very successful in this area.
From the scatter plot, it is quite clear that Prime Video performs very well in the fourth quadrant. The bar plot also shows that Prime Video has a large quantity of content. So, looking at all the streaming platforms, we can conclude that Prime Video is better in both quality and quantity.
I hope you liked this article on a data science project on best streaming service analysis with the Python programming language. Feel free to ask your valuable questions in the comments section below.