Book Recommendation System

Recommendation systems are widely used today to recommend products to users based on their interests. A recommendation system is one of the most effective tools for increasing profits, because it helps retain users in a highly competitive market. In this article, I’ll walk you through how to build a book recommendation system with Machine Learning using the Python programming language.

Book Recommendation System

Online book reading and selling websites like Kindle and Goodreads compete against each other on many factors. One of those important factors is their book recommendation system. A book recommendation system is designed to recommend books of interest to the buyer.

The purpose of a book recommendation system is to predict a buyer’s interests and recommend books accordingly. A book recommendation system can take into account many parameters, such as book content and book quality, by filtering user reviews. In the section below, I will introduce you to a machine learning project on the book recommendation system using Python.

Book Recommendation System with Machine Learning

In this section, I will take you through how to build a Book recommendation system with Machine Learning using Python. I will start this task by importing the necessary Python libraries and the dataset:
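The loading code itself is not shown here, so the following is only a minimal sketch of this step. It assumes the data is a CSV file with the columns listed in the null check further down; the file name `books.csv` is an assumption, and a two-row in-memory sample with made-up values stands in for the file so that the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the real file. With the actual data you
# would call pd.read_csv("books.csv"), using whatever path your copy has.
sample_csv = io.StringIO(
    "bookID,title,authors,average_rating,language_code,ratings_count\n"
    "1,Book A,Author One,4.57,eng,2095690\n"
    "2,Book B,Author Two,3.59,eng,12\n"
)

df = pd.read_csv(sample_csv)

# A first look at the rows and at the type pandas inferred for each column.
print(df.head())
print(df.dtypes)
```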

[Figure: book recommendation system dataset]

Data Exploration:

The dataset contains information about the books, their authors, and other relevant details. Now that we know what our data looks like, let’s go ahead and find all the null values present in our data:

df.isnull().sum()
bookID                0
title                 0
authors               0
average_rating        0
isbn                  0
isbn13                0
language_code         0
  num_pages           0
ratings_count         0
text_reviews_count    0
publication_date      0
publisher             0
dtype: int64
df.describe()
[Figure: description of the dataset]

From the results above, we can see that all ratings fall between 0 and 5. The summary also tells us more about the other columns, such as the mean of the average ratings, which may help in the next steps. We also checked the data type of each column, and the null check confirmed that there are no missing values in our data. Now let’s move further:

[Figure: top 10 books]

The results above show the top 10 books in our data. The maximum rating in the data is 5.0, yet no book in the result above has a rating of 5.0. That is because we filtered the books by their number of ratings, so every book shown has been rated by a decent number of readers. A book with only one or two ratings can easily have an average of 5.0, and we want to avoid such books, which is why we used this type of filtering.
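A rough sketch of this kind of filtering is given here, on made-up rows. The cut-off `min_ratings` is a hypothetical value, since the threshold actually used is not stated, and the plotting step is left out.

```python
import pandas as pd

# Made-up sample rows; only the column names follow the dataset.
df = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "average_rating": [5.00, 4.57, 4.10, 3.80],
    "ratings_count": [2, 2_000_000, 1_500_000, 900],
})

# Hypothetical cut-off: the threshold used for the chart is not given.
min_ratings = 1_000_000

# Keep only widely rated books, then rank those by average rating.
popular = df[df["ratings_count"] > min_ratings]
top_books = popular.sort_values("average_rating", ascending=False).head(10)
print(top_books[["title", "average_rating", "ratings_count"]])
```

Book A has a perfect 5.0 but only two ratings, so it never reaches the ranking.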

Let’s go ahead and take a look at some of the top authors in our data. We’ll rank them according to the number of books they’ve written as long as those books are present in the data:

[Figure: top 10 authors]

From the above chart, Stephen King and P.G. Wodehouse have the most books in the data. Both authors have 40 books in our dataset followed by Rumiko Takahashi and Orson Scott Card.
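Counting books per author can be sketched as follows, assuming each row of the data is one book and the `authors` column holds the author name as a single string. The rows are made up for illustration.

```python
import pandas as pd

# Made-up sample: one row per book.
df = pd.DataFrame({
    "title": ["T1", "T2", "T3", "T4", "T5"],
    "authors": ["Author One", "Author Two", "Author One",
                "Author Three", "Author One"],
})

# value_counts() counts the rows per author and sorts them in descending
# order, so head(10) gives the ten authors with the most books.
top_authors = df["authors"].value_counts().head(10)
print(top_authors)
```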

Next, we’ll take a look at which books have been rated the most. Our data has the average rating column and also the number of times each book has been rated (ratings_count). We will use the latter to find the most rated books in our data:

[Figure: most rated books]

We can see that Twilight has been rated more times than any other book, and these counts are all in the millions. Twilight has been rated over 4 million times, followed by The Hobbit or There and Back Again and The Catcher in the Rye, each of which has been rated over 2 million times.
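One way to get such a ranking, sketched on made-up rows, is to sort by `ratings_count`. This is not necessarily the exact code behind the chart.

```python
import pandas as pd

# Made-up sample rows.
df = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "ratings_count": [120, 4_500_000, 2_300_000],
})

# nlargest returns the rows with the highest ratings_count, already sorted.
most_rated = df.nlargest(10, "ratings_count")[["title", "ratings_count"]]
print(most_rated)
```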

Let’s try to find a relation between the average rating and the number of ratings. We are doing this to see how we can use these columns in our recommendation. We will also check how the average rating is distributed against the number of pages in a book, the language of the book, and the number of text reviews:

[Figure: average ratings of books]
[Figure: correlation]
[Figure: correlation]

After comparing the average rating with the other columns, we will keep the language and the number of ratings for our recommendation system. The other columns did not show a pattern that looked useful, so we can omit them.
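Besides plots, a quick numeric check is a correlation table. The sketch below is an alternative to the charts, not the code used for them, and it runs on made-up numbers, so the values it prints say nothing about the real data. The `  num_pages` column name is copied as it appears in the null-check output, leading spaces included.

```python
import pandas as pd

# Made-up numbers; only the column names follow the dataset.
df = pd.DataFrame({
    "average_rating": [3.5, 4.0, 4.2, 3.9, 4.5],
    "ratings_count": [100, 2500, 40000, 800, 120000],
    "text_reviews_count": [10, 200, 3000, 90, 7000],
    "  num_pages": [320, 150, 410, 275, 600],
})

# Pearson correlation of every other numeric column with the average rating.
corr_with_rating = df.corr()["average_rating"].drop("average_rating")
print(corr_with_rating)
```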

I will make a copy of our original data so that we are safe in case we mess anything up:

df2 = df.copy()

Data Preparation:

We are now going to create a new column called ‘rating_between’. We will divide our average rating column into different categories such as rating between 0 and 1, 1 and 2, etc. This will work as one of the features that we will build into our model so that it can make better predictions:
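The code for this column is not shown, so here is one possible sketch using `pd.cut`. The bucket labels are hypothetical names chosen for illustration, and how values on a boundary (exactly 1.0, 2.0, and so on) should be assigned is an assumption.

```python
import pandas as pd

# Made-up ratings, including boundary values.
df2 = pd.DataFrame({"average_rating": [0.0, 0.8, 1.0, 3.94, 4.57, 5.0]})

bins = [0, 1, 2, 3, 4, 5]
# Hypothetical label names.
labels = ["between 0 and 1", "between 1 and 2", "between 2 and 3",
          "between 3 and 4", "between 4 and 5"]

# Each bucket is (lower, upper], so a rating of exactly 1.0 falls into
# "between 0 and 1". include_lowest=True also puts a rating of exactly 0
# into the first bucket instead of leaving it unassigned.
df2["rating_between"] = pd.cut(
    df2["average_rating"], bins=bins, labels=labels, include_lowest=True
)
print(df2)
```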

We are now going to create two new DataFrames: one holding the different values of the rating_between column that we have just created, and one for the language codes. In the first, a book gets a value of 1 in the column for the group its rating falls into, say between 4 and 5, and 0 in all the others.

We apply the same approach to the language code column, giving each language its own column: a value of 1 is assigned if the book is written in that language, e.g., English, and 0 if it is not:

rating_df = pd.get_dummies(df2['rating_between'])
language_df = pd.get_dummies(df2['language_code'])
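To see what these calls produce, here is a small sketch on made-up rows. One detail worth knowing: recent pandas versions return True/False columns from `get_dummies` by default, so the sketch passes `dtype=int` to get the 1 and 0 values described above. Whether this matters for you depends on your pandas version.

```python
import pandas as pd

# Made-up rows; the rating_between labels are hypothetical.
df2 = pd.DataFrame({
    "rating_between": ["between 3 and 4", "between 4 and 5", "between 4 and 5"],
    "language_code": ["eng", "spa", "eng"],
})

# One column per distinct value; dtype=int forces 0/1 instead of booleans.
rating_df = pd.get_dummies(df2["rating_between"], dtype=int)
language_df = pd.get_dummies(df2["language_code"], dtype=int)
print(language_df)
```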

We are now going to concatenate these two DataFrames into one and name it features. This DataFrame is the set of features that we provide to the book recommendation system. It contains the values of rating_df and language_df, along with the average rating and the number of ratings:

features = pd.concat([rating_df, 
                      language_df, 
                      df2['average_rating'], 
                      df2['ratings_count']], axis=1)

Book Recommendation System: Final Step

Now that our features are ready, we will use the Min-Max scaler to rescale them. Min-Max scaling maps each feature to the range 0 to 1 using that column’s minimum and maximum, so a column with very large values, such as ratings_count, does not dominate the distance calculation:
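A minimal sketch of the scaling step with scikit-learn’s `MinMaxScaler`, on a toy feature matrix. The numbers and the variable names are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy feature matrix; columns are [is_english, average_rating, ratings_count].
features = np.array([
    [1, 3.0, 100],
    [0, 4.0, 2_000_000],
    [1, 5.0, 1_000_050],
], dtype=float)

# Each column is rescaled independently: (x - column_min) / (column_max - column_min).
scaler = MinMaxScaler()
scaled = scaler.fit_transform(features)
print(scaled)
```

After scaling, the ratings_count column spans 0 to 1 like the others instead of running into the millions.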

We have scaled the features, and we can now use the KNN algorithm to build our book recommendation system:
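The model code is not shown, so this is a sketch of a nearest-neighbour lookup with scikit-learn’s `NearestNeighbors`, on a toy matrix of four "books". The values of `n_neighbors` and `algorithm` are assumptions, as are the variable names.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Toy scaled feature matrix: rows 0 and 1 are close, rows 2 and 3 are close.
features = np.array([
    [0.0, 0.0],
    [0.1, 0.0],
    [0.9, 1.0],
    [1.0, 1.0],
])

# Assumed settings; pick n_neighbors to match how many titles you want back.
model = NearestNeighbors(n_neighbors=2, algorithm="ball_tree")
model.fit(features)

# For every row: distances to, and row indices of, its nearest neighbours.
# The query points are the fitted points, so each row's first neighbour
# is the row itself at distance 0.
dist, idlist = model.kneighbors(features)
print(idlist)
```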

We have built a machine learning model for recommending books; now we need a Python function to query it. When this function is called, we pass it the name of a book, and the model finds the books whose features are closest to it. We store the recommended book names in a list and return them at the end:
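A sketch of such a function follows, built on a tiny made-up table so it runs on its own. The function name `recommend_books`, the two feature columns, and `n_neighbors=2` are all assumptions for illustration.

```python
import pandas as pd
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler

# Made-up sample with a default 0..n-1 index, so row labels equal positions.
df2 = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C", "Book D"],
    "average_rating": [4.5, 4.4, 3.0, 3.1],
    "ratings_count": [2_000_000, 1_900_000, 500, 800],
})

features = MinMaxScaler().fit_transform(df2[["average_rating", "ratings_count"]])
model = NearestNeighbors(n_neighbors=2).fit(features)
_, idlist = model.kneighbors(features)


def recommend_books(book_name):
    """Return the titles of the nearest neighbours of `book_name`."""
    matches = df2.index[df2["title"] == book_name]
    if len(matches) == 0:
        return []  # unknown title: nothing to recommend
    book_pos = matches[0]
    # idlist[book_pos] holds the row indices of the closest books.
    return [df2.loc[i, "title"] for i in idlist[book_pos]]


print(recommend_books("Book A"))
```

Note that the queried title comes back as its own nearest neighbour, so in this sketch it is the first entry of the returned list.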

['Harry Potter and the Half-Blood Prince (Harry Potter  #6)',
 'Harry Potter and the Order of the Phoenix (Harry Potter  #5)',
 'The Fellowship of the Ring (The Lord of the Rings  #1)',
 'Harry Potter and the Chamber of Secrets (Harry Potter  #2)',
 'Harry Potter and the Prisoner of Azkaban (Harry Potter  #3)',
 'The Lightning Thief (Percy Jackson and the Olympians  #1)']

With this, we come to the end of a machine learning project on the book recommendation system. As we can see, our model shows a pretty decent result. Hope you liked this article on Book Recommendation System With Machine Learning using Python. Please feel free to ask your valuable questions in the comments section below.

Aman Kharwal

Data Strategist at Statso. My aim is to decode data science for the real world in the most simple words.
