Most recommendation systems use content-based filtering, collaborative filtering, or a combination of the two to show users recommendations that improve their experience. Content-based filtering generates recommendations from a user’s own behaviour. In this article, I will walk you through what content-based filtering is in machine learning and how to implement it using Python.
What is a Recommendation System?
A recommendation system generates personalized recommendations by learning a user’s preferences from data such as their history and the time at which they view or read content. Many applications are built on recommendation systems. The most common categories are:
- Online Shopping (Amazon, Zomato, etc.)
- Audio (Songs, Audiobooks, Podcasts, etc.)
- Video Recommendations (YouTube, Netflix, Amazon Prime, etc.)
There are two main types of recommendation systems:
- Collaborative Filtering
- Content-Based Filtering
Collaborative filtering uses the behaviour of other users whose interests are similar to yours and recommends items based on their activity. A content-based recommendation system instead recommends items based on your own behaviour. In the section below, I’ll walk you through how content-based filtering in machine learning works in detail, and then we’ll see how to implement it using Python.
Content-Based Filtering
A recommendation system based on content-based filtering provides recommendations to the user by analyzing the description of the content that has been rated by the user. In this method, the algorithm is trained to understand the context of the content and find similarities in other content to recommend the same class of content to a particular user.
Let’s understand the process of content-based filtering by looking at all the steps that are involved in this method for generating recommendations for the user:
- It begins by identifying the keywords to understand the context of the content. In this step, it avoids unnecessary words such as stop words.
- Then it looks for the same kind of context in other content. To measure how similar two pieces of content are, the content-based method uses cosine similarity.
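- It compares items rather than users: every candidate item is scored by its similarity to the items the user has already rated or viewed. (Analyzing the correlation between users is what collaborative filtering does.)
- Finally, it generates recommendations by ranking the candidates, for example by an average of those similarity scores weighted by the ratings the user gave.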
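The keyword, similarity, and ranking steps above can be sketched in a few lines of plain Python. This is only an illustration, not a production method: the stop-word list is a tiny hypothetical one, the item descriptions are invented, and simple word counts stand in for a proper weighting scheme such as TF-IDF.

```python
import math
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real systems use a much larger one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "his", "her", "with", "is", "on"}

def keywords(text):
    """Lowercase the text, split it into words, and drop stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty description is similar to nothing
    return dot / (norm_a * norm_b)

def recommend(liked_title, descriptions, top_n=2):
    """Rank every other item by similarity to the item the user liked."""
    target = keywords(descriptions[liked_title])
    scores = {
        title: cosine_similarity(target, keywords(text))
        for title, text in descriptions.items()
        if title != liked_title
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical item descriptions, invented for illustration only.
descriptions = {
    "Space Quest": "a crew of astronauts explores a distant planet",
    "Star Voyage": "astronauts travel to a distant planet in a damaged ship",
    "Baking Days": "a baker opens a small pastry shop in Paris",
}

print(recommend("Space Quest", descriptions, top_n=1))  # ['Star Voyage']
```

The two space films share the keywords "astronauts", "distant" and "planet", so their cosine similarity is well above zero, while the baking film shares none and scores 0.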
I hope you now understand how content-based filtering works. In the section below, I will walk you through how to implement it using the Python programming language.
Content-Based Filtering with Python
I hope you now understand what recommendation systems are and how the content-based method is used to generate recommendations for a user. Now let’s see how to implement the content-based method with Python. For this task, I will be using the dataset provided by MovieLens to create a movie recommendation system using content-based filtering.
Let’s start this task by importing the necessary Python libraries and the dataset:
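One way to do this is sketched below. It assumes pandas is installed and that the dataset has been saved locally as movies_metadata.csv; that file name and the helper name load_movies are my own assumptions for illustration, so point the path at your own copy.

```python
import pandas as pd

def load_movies(path="movies_metadata.csv"):
    """Read the movie metadata CSV into a DataFrame.

    The default file name is an assumption; pass the path to your own copy.
    """
    # low_memory=False lets pandas infer each column's type from the whole
    # file at once, which avoids mixed-type warnings on wide CSV files.
    return pd.read_csv(path, low_memory=False)

# Example usage (requires the CSV file on disk):
# movies = load_movies()
# print(movies.head())
```

Printing the first five rows of the loaded data with `head()` gives output like the following: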
   adult                              belongs_to_collection    budget  ...  video  vote_average  vote_count
0  False  {'id': 10194, 'name': 'Toy Story Collection', ...  30000000  ...  False           7.7      5415.0
1  False                                                NaN  65000000  ...  False           6.9      2413.0
2  False  {'id': 119050, 'name': 'Grumpy Old Men Collect...         0  ...  False           6.5        92.0
3  False                                                NaN  16000000  ...  False           6.1        34.0
4  False  {'id': 96871, 'name': 'Father of the Bride Col...         0  ...  False           5.7       173.0
Now, I’m going to implement all of the steps I talked about in the content-based filtering process mentioned above using Python. Here I will prepare the data first, then select the columns that we will use to understand the context of the content, then we will remove the stop words and finally, we will find the cosine similarities to generate recommendations:
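A minimal sketch of these steps follows, assuming scikit-learn and pandas are installed. The choice of the `overview` column as the text to analyze, the use of TF-IDF weighting, and the helper name build_similarity are my assumptions for illustration; the small DataFrame is invented sample data standing in for the real dataset.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

def build_similarity(movies, text_column="overview"):
    """Return (pairwise similarity matrix, title -> row position lookup)."""
    # Prepare the data: use a clean 0..n-1 index and replace missing
    # descriptions with empty strings.
    movies = movies.reset_index(drop=True)
    text = movies[text_column].fillna("")

    # TF-IDF turns each description into a weighted keyword vector;
    # stop_words="english" drops common English stop words.
    tfidf = TfidfVectorizer(stop_words="english")
    tfidf_matrix = tfidf.fit_transform(text)

    # TfidfVectorizer L2-normalizes each row by default, so the plain dot
    # product computed by linear_kernel equals the cosine similarity.
    similarity = linear_kernel(tfidf_matrix, tfidf_matrix)

    # Map each title to its row position. Titles are not guaranteed to be
    # unique, so a duplicated title maps to several positions.
    indices = pd.Series(movies.index, index=movies["title"])
    return similarity, indices

# Tiny hypothetical sample standing in for the real dataset.
sample = pd.DataFrame({
    "title": ["Space Quest", "Star Voyage", "Baking Days"],
    "overview": [
        "Astronauts explore a distant planet.",
        "Astronauts travel to a distant planet in a damaged ship.",
        None,  # a missing description becomes an empty string
    ],
})
similarity, indices = build_similarity(sample)
print(indices)
```

One caveat on scale: a dense similarity matrix for all 45,466 movies has about 2.07 billion entries, which is roughly 16.5 GB as 64-bit floats, so on a machine with less memory it may be necessary to work on a subset or compute one row of similarities at a time. Printing the title lookup for the full dataset produces a Series like this: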
title
Toy Story                          0
Jumanji                            1
Grumpier Old Men                   2
Waiting to Exhale                  3
Father of the Bride Part II        4
                               ...
Subdue                         45461
Century of Birthing            45462
Betrayal                       45463
Satan Triumphant               45464
Queerama                       45465
Length: 45466, dtype: int64
Now let’s create a function and have a look at how the recommendation system is working:
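Here is a sketch of what such a function could look like, assuming NumPy and pandas are installed. The function name, its parameters, and the hand-written similarity matrix are hypothetical and exist only so the example runs on its own; on real data the matrix and the title lookup would come from the previous step.

```python
import numpy as np
import pandas as pd

def content_based_recommendation(title, similarity, indices, titles, top_n=10):
    """Return the top_n titles most similar to `title`.

    similarity: square array of pairwise item similarities
    indices:    Series mapping title -> row position
    titles:     Series of titles in row order
    """
    position = indices[title]
    # A duplicated title returns several positions; use the first one.
    if isinstance(position, pd.Series):
        position = position.iloc[0]
    scores = np.asarray(similarity[position], dtype=float)
    # Sort positions from most to least similar, then drop the query itself.
    order = np.argsort(-scores, kind="stable")
    order = order[order != position][:top_n]
    return titles.iloc[order]

# Hypothetical toy data: three items and a hand-written similarity matrix.
titles = pd.Series(["Space Quest", "Star Voyage", "Baking Days"], name="title")
indices = pd.Series(titles.index, index=titles)
similarity = np.array([
    [1.0, 0.6, 0.0],
    [0.6, 1.0, 0.1],
    [0.0, 0.1, 1.0],
])
result = content_based_recommendation("Star Voyage", similarity, indices, titles, top_n=2)
print(result.tolist())  # ['Space Quest', 'Baking Days']
```

Excluding the query title is a design choice: every item has a similarity of 1 with itself, so it would otherwise always come first. The output below shows recommendations of this kind from the full dataset: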
23530    Andy Hardy Meets Debutante
21422    A Family Affair
26304    You're Only Young Once
10301    The 40 Year Old Virgin
29369    Andy Hardy's Private Secretary
23843    Andy Hardy's Blonde Trouble
15348    Toy Story 3
43427    Andy Kaufman Plays Carnegie Hall
38476    Superstar: The Life and Times of Andy Warhol
42721    Andy Peters: Exclamation Mark Question Point
8327     The Champ
28128    The Mayor of Casterbridge
21359    Andy Hardy's Double Life
32086    Brother's Keeper
Name: title, dtype: object
I hope you liked this article on what the content-based method is in machine learning and how to implement it using Python. Feel free to ask your questions in the comments section below.