# Best Streaming Service Analysis with Python

Data Science Project on Best Streaming Service Analysis with Python

There is a lot of competition among the major streaming services such as Netflix, Prime Video, Hulu, and Disney+. As a data scientist, finding out which of them is the best can be an interesting task. In this article, I’m going to introduce you to a data science project on the best streaming service analysis with Python.

## Best Streaming Service Analysis

To find out which streaming service is the best, I will use the ratings of the shows on the major platforms: Netflix, Prime Video, Hulu, and Disney+.


The dataset that I will use for this task contains a comprehensive list of the TV shows available on the four platforms we are comparing.

I am using this dataset to find the best streaming service, but as a beginner you can also use it for tasks such as:

1. Analyzing the streaming platforms
2. Analyzing the IMDb and Rotten Tomatoes ratings of the shows
3. Analyzing the target age group of most TV shows

## Best Streaming Service Analysis with Python

Now let’s get started with the task of best streaming service analysis with Python. I will start by importing the necessary libraries and the dataset:
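A minimal sketch of the loading step, assuming the data is a CSV file with one row per show and columns such as `Title`, `IMDb`, `Rotten Tomatoes`, `Netflix`, `Hulu`, `Prime Video` and `Disney+`. The file name `tv_shows.csv` and the helper `load_tv_shows` are hypothetical placeholders; as the author notes in the comments, pass the complete path if the CSV is not in the same directory as your script.

```python
import pandas as pd


def load_tv_shows(source="tv_shows.csv") -> pd.DataFrame:
    """Hypothetical helper: read the TV-show table from a path or file-like object.

    The default file name is an assumed placeholder, not a confirmed dataset name.
    """
    df = pd.read_csv(source)
    # Drop a leftover index column such as 'Unnamed: 0', if the export has one
    df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
    return df
```

With the file in place, `tv_shows = load_tv_shows("tv_shows.csv")` gives the `tv_shows` DataFrame used in the rest of the article.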

Since we are only analyzing the data, we don’t need machine learning algorithms here. Most of the work can be done by visualizing and analyzing the ratings of the shows on each streaming platform.

## Data Preparation

Let’s prepare the dataset so that it is easy to analyze. I will start by dropping duplicate rows based on the title of the shows:

```python
# Keep the first row for each title and drop any later duplicates
tv_shows.drop_duplicates(subset='Title', keep='first', inplace=True)
```

Next, I will fill the null values in the data with zeroes and then convert the columns to integer data types:
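A minimal sketch of this cleaning step, under stated assumptions: ratings are stored as strings such as `8.7/10` (IMDb) and `100/100` or `96%` (Rotten Tomatoes). The `100/100` form is the one in the traceback in the comments below, and it is why calling `pd.to_numeric` directly on the column fails. The sketch keeps only the number before `/` or `%`, fills missing values with 0 and converts to `int`. Multiplying IMDb by 10 so that both scores share a 0 to 100 scale is my own choice, not something stated in the article, and `clean_ratings` is a hypothetical name.

```python
import pandas as pd

RATING_COLUMNS = ["IMDb", "Rotten Tomatoes"]  # assumed column names


def clean_ratings(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn rating strings into integer scores (0 = missing)."""
    out = df.copy()
    for col in RATING_COLUMNS:
        # '8.7/10' -> '8.7', '100/100' -> '100', '96%' -> '96'
        numerator = out[col].astype(str).str.split("/").str[0].str.rstrip("%")
        # Anything that still is not a number (e.g. a missing value) becomes NaN, then 0
        out[col] = pd.to_numeric(numerator, errors="coerce").fillna(0)
    # Assumption: rescale IMDb (0-10) to 0-100 so the integer conversion keeps one decimal
    out["IMDb"] = (out["IMDb"] * 10).round().astype(int)
    out["Rotten Tomatoes"] = out["Rotten Tomatoes"].round().astype(int)
    return out
```

Working on a copy (`df.copy()`) keeps the raw table intact, which makes it easy to re-run the step after checking the parsed values.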

Visualizing the data will be easier if we convert the 1s and 0s in the Netflix, Hulu, Disney+, and Prime Video columns into a categorical format. Note that the same show may be available on more than one platform:
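One way to do this is `DataFrame.melt`, sketched below under the assumption that each platform column holds 1 when the show is available there and 0 otherwise, and that the Disney+ column is literally named `Disney+`. The result has one row per (show, platform) pair in a `StreamingOn` column (the name used by the scatter plot later), so a show on two platforms appears twice. The helper name `platforms_long` is hypothetical.

```python
import pandas as pd

PLATFORMS = ["Netflix", "Hulu", "Prime Video", "Disney+"]  # assumed column names


def platforms_long(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per (Title, platform) where the show is available."""
    long = df.melt(
        id_vars=["Title"],
        value_vars=PLATFORMS,
        var_name="StreamingOn",
        value_name="Present",
    )
    # Keep only the platforms that actually carry the show, then drop the 0/1 flag
    long = long[long["Present"] == 1].drop(columns="Present")
    return long.reset_index(drop=True)
```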

Now I will merge this data with the original dataset and drop some unwanted columns:
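A minimal sketch of the merge, assuming the long table described above (columns `Title` and `StreamingOn`) and the show table share the `Title` key, which is unique after the de-duplication step. The article does not list which columns are unwanted, so this sketch only drops the per-platform 0/1 flags, which `StreamingOn` now makes redundant; the helper name is hypothetical.

```python
import pandas as pd

PLATFORMS = ["Netflix", "Hulu", "Prime Video", "Disney+"]  # assumed column names


def combine_with_details(long: pd.DataFrame, shows: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: attach ratings and other details to each (Title, platform) row."""
    combined = long.merge(shows, on="Title", how="inner")
    # The per-platform flags duplicate the information now held in 'StreamingOn'
    return combined.drop(columns=[c for c in PLATFORMS if c in combined.columns])
```

An inner join keeps only titles present in both tables; because `Title` is unique in the show table, each (show, platform) row receives exactly one set of ratings.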

Now let’s plot the shows whose ratings are more than 1 to see how many TV shows are available on each platform:
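A sketch of the filter and the count, assuming the combined table has integer `IMDb` and `Rotten Tomatoes` columns in which 0 marks a missing rating. The threshold of 1 follows the sentence above and is applied to both scores; treating the filtered result as the `tv_shows_both_ratings` table used in the scatter plot is my assumption. Both helper names are hypothetical.

```python
import pandas as pd


def with_both_ratings(combined: pd.DataFrame, min_score: int = 1) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose IMDb and Rotten Tomatoes scores both exceed min_score."""
    # Parentheses matter: '&' binds more tightly than '>' in Python
    mask = (combined["IMDb"] > min_score) & (combined["Rotten Tomatoes"] > min_score)
    return combined[mask]


def shows_per_platform(df: pd.DataFrame) -> pd.Series:
    """Number of rated titles on each platform, largest first."""
    return df.groupby("StreamingOn")["Title"].count().sort_values(ascending=False)
```

Calling `.plot(kind="bar")` on the result of `shows_per_platform` then draws the bar chart (this needs matplotlib installed).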

## Final Step: Finding Best Streaming Service

Now let’s visualize the data to find the best streaming service based on the ratings. I will first use violin plots to gauge the content ratings (IMDb) and the freshness (Rotten Tomatoes) of each streaming platform:

Now let’s use a scatter plot of IMDb ratings against Rotten Tomatoes scores to see which streaming platform has the best ratings on both rating sites:

```python
import plotly.express as px
fig = px.scatter(tv_shows_both_ratings, x='IMDb',
                 y='Rotten Tomatoes', color='StreamingOn')
fig.show()
```
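The conclusion below reads the scatter plot by quadrant. To put numbers on that reading, a sketch like the following counts each platform's shows per quadrant. The midpoints (50 on both axes, which assumes both scores are on a 0 to 100 scale, e.g. IMDb multiplied by 10) and the helper name are hypothetical, and the numbering is the usual mathematical one: I = upper right, II = upper left, III = lower left, IV = lower right (higher IMDb, lower Rotten Tomatoes).

```python
import numpy as np
import pandas as pd


def quadrant_counts(df: pd.DataFrame, x_mid: float = 50, y_mid: float = 50) -> pd.DataFrame:
    """Hypothetical helper: cross-tabulate platforms against scatter-plot quadrants."""
    right = df["IMDb"] >= x_mid              # x axis: IMDb
    upper = df["Rotten Tomatoes"] >= y_mid   # y axis: Rotten Tomatoes
    labels = np.select(
        [right & upper, ~right & upper, ~right & ~upper],
        ["I", "II", "III"],
        default="IV",  # remaining case: right & ~upper
    )
    quadrant = pd.Series(labels, index=df.index, name="Quadrant")
    # Rows: platforms; columns: quadrants; cells: number of shows (0 where none)
    return pd.crosstab(df["StreamingOn"], quadrant)
```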

### Conclusion:

From the violin plots we can observe that:

1. Hulu, Netflix, and Prime Video all have a large amount of content, and for all three, quality decreases as the amount of content increases.
2. Prime Video is denser in the top half of the IMDb ratings and also performs well on freshness (Rotten Tomatoes).
3. Despite being new, Disney+ has also been very successful here.

The scatter plot makes it quite obvious that Amazon Prime Video performs very well in the fourth quadrant. The bar plot also shows that Prime Video has a large quantity of content. So, looking at all the streaming platforms, we can conclude that Prime Video is the best in both quality and quantity.

I hope you liked this article on the best streaming service analysis with Python. Feel free to ask your valuable questions in the comments section below.

##### Aman Kharwal

Coder with the ♥️ of a Writer || Data Scientist | Solopreneur | Founder


1. #### Chenjie.zhong

I think the rating data is not separate for each platform. For example, when a show is available on several platforms, its single rating score is shared by all of them, so both platforms get the same rating value even if the show performs much better on platform A than on platform B. Therefore, the given data offers no way to infer reliably which platform performs best. The dataset would need extra columns, such as the CLICK RATE of each video on each platform, so that we could normalize the rates and multiply them by the given rating score to get a more reliable metric.

• #### Aman Kharwal

Sure, you can do that.

2. #### Chin

Can you please tell me how to read the CSV file? The code for reading the CSV file isn’t working. I’m using PyCharm.

• #### Aman Kharwal

Make sure that the CSV file is in the same directory as your Python file, or enter the complete path to your CSV file.

• #### Chin

Thanks for the quick reply. I had actually used the file path, and when an error showed up I thought it was because of the path. But now the exact same error shows up even with the .py and .csv files in the same directory and your exact code.

This is the error:

```
Traceback (most recent call last):
  File "pandas\_libs\lib.pyx", line 2305, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "100/100"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
    tv_shows['Rotten Tomatoes'] = pd.to_numeric(tv_shows['Rotten Tomatoes'])
  File "C:\Users\Chinmay\PycharmProjects\pythonProject\venv\lib\site-packages\pandas\core\tools\numeric.py", line 183, in to_numeric
    values, _ = lib.maybe_convert_numeric(
  File "pandas\_libs\lib.pyx", line 2347, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "100/100" at position 0

Process finished with exit code 1
```

Thank you very much.