Book Recommendation System with Python

A Book Recommendation System is a data-driven application that suggests books to users based on their preferences, reading history, and behaviour. It applies data science and machine learning techniques to personalize those suggestions and improve the reading experience. In this article, I’ll take you through the task of building a Book Recommendation System using Python.

Book Recommendation System: Process We Can Follow

Building a book recommendation system requires a combination of data processing, machine learning expertise, and a deep understanding of user preferences. Below are the steps you can follow to build a Book Recommendation System:

  1. Collect a comprehensive dataset of books, including information like titles, authors, genres, summaries, and user ratings. Additionally, collect user data, such as reading history, reviews, and ratings.
  2. Clean and preprocess the collected data.
  3. Conduct exploratory data analysis (EDA) to identify popular genres, highly-rated books, and user reading patterns.
  4. Choose recommendation algorithms that suit the dataset and user requirements.
  5. Train the selected recommendation model using the preprocessed data to build a model that can predict user preferences and generate personalized book recommendations.

So, the process begins with collecting a dataset based on book information. I found an ideal dataset for this task. You can download the dataset from here.

Book Recommendation System with Python

Now, let’s get started with the task of building a Book recommendation system by importing the necessary Python libraries and the dataset:

import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
import plotly.express as px
import plotly.graph_objects as go

data = pd.read_csv("books_data.csv")
print(data.head())
   bookID                                              title  \
0       1  Harry Potter and the Half-Blood Prince (Harry ...   
1       2  Harry Potter and the Order of the Phoenix (Har...   
2       4  Harry Potter and the Chamber of Secrets (Harry...   
3       5  Harry Potter and the Prisoner of Azkaban (Harr...   
4       8  Harry Potter Boxed Set  Books 1-5 (Harry Potte...   

                      authors average_rating  
0  J.K. Rowling/Mary GrandPré           4.57  
1  J.K. Rowling/Mary GrandPré           4.49  
2                J.K. Rowling           4.42  
3  J.K. Rowling/Mary GrandPré           4.56  
4  J.K. Rowling/Mary GrandPré           4.78  

Now, let’s have a look at the column information:

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11127 entries, 0 to 11126
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   bookID          11127 non-null  int64 
 1   title           11127 non-null  object
 2   authors         11127 non-null  object
 3   average_rating  11127 non-null  object
 4   title_length    11127 non-null  int64 
dtypes: int64(2), object(3)
memory usage: 434.8+ KB

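Let’s see the distribution of average ratings of all the books. The column is still stored as text at this point, so it is converted to numbers on the fly for plotting:

fig = px.histogram(data, x=pd.to_numeric(data['average_rating'], errors='coerce'),
                   nbins=30, 
                   title='Distribution of Average Ratings')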
fig.update_xaxes(title_text='Average Rating')
fig.update_yaxes(title_text='Frequency')
fig.show()
[Figure: Distribution of Average Ratings]

Now, let’s have a look at the ten authors with the most books in the dataset:

top_authors = data['authors'].value_counts().head(10)
fig = px.bar(top_authors, x=top_authors.values, y=top_authors.index, orientation='h',
             labels={'x': 'Number of Books', 'y': 'Author'},
             title='Number of Books per Author')
fig.show()
[Figure: Number of Books per Author]
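
One thing to keep in mind here: the authors column joins co-authors with "/" (for example "J.K. Rowling/Mary GrandPré" in the preview above), so value_counts() treats every combination of names as a separate author. Below is a minimal sketch of counting books per individual name instead. The small `toy` frame is made up for illustration and only stands in for the real dataset:

```python
import pandas as pd

# Toy stand-in for the real dataset (illustrative rows only)
toy = pd.DataFrame({
    "title": ["Book A", "Book B", "Book C"],
    "authors": ["J.K. Rowling/Mary GrandPré", "J.K. Rowling", "James Joyce"],
})

# Split on "/" so each co-author gets a row of their own, then count
individual_counts = (
    toy["authors"]
    .str.split("/")
    .explode()
    .str.strip()
    .value_counts()
)
print(individual_counts)
```

Whether to count by combination or by individual name depends on what you want the chart to say, so treat this as an option rather than a correction.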

The average rating column is stored as an object data type in the dataset. Let’s convert it into a numeric type:

# Convert 'average_rating' to a numeric data type
data['average_rating'] = pd.to_numeric(data['average_rating'], 
                                       errors='coerce')
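
With errors='coerce', any value that cannot be parsed as a number is silently replaced with NaN, so it is worth checking how many rows were affected. Here is a small sketch of that check on a made-up series (the "not a rating" entry is a hypothetical malformed value, not one taken from the dataset):

```python
import pandas as pd

# Toy ratings column; "not a rating" stands in for a malformed row
ratings = pd.Series(["4.57", "4.49", "not a rating", "3.90"])

numeric = pd.to_numeric(ratings, errors="coerce")

# Rows that could not be parsed are NaN after the conversion
bad_rows = ratings[numeric.isna()]
print(bad_rows)

# Aggregations such as mean() skip NaN by default
print(numeric.mean())
```

On the real data, the same pattern would be applied to data['average_rating'] before and after the conversion.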

To consider book content for recommendations, we’ll use the book titles and authors. Let’s combine these features into a single text feature:

# Create a new column 'book_content' by combining 'title' and 'authors'
data['book_content'] = data['title'] + ' ' + data['authors']
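
In this dataset both columns are fully populated (see the non-null counts above), so the concatenation is safe. On other data, a missing author would turn the whole combined value into NaN, which TfidfVectorizer cannot process. A minimal sketch of a more defensive version, using a made-up two-row frame:

```python
import pandas as pd

# Toy frame with one missing author (illustrative only)
toy = pd.DataFrame({
    "title": ["Dubliners", "Ulysses"],
    "authors": ["James Joyce", None],
})

# Plain concatenation propagates the missing value
naive = toy["title"] + " " + toy["authors"]

# Filling missing text first keeps the title usable on its own
safe = (toy["title"].fillna("") + " " + toy["authors"].fillna("")).str.strip()

print(naive)
print(safe)
```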

Now, we will transform the text-based features into numerical vectors using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization:

tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(data['book_content'])

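TF-IDF weights each term by how often it appears in a book’s text and how rare it is across all books, which converts the text data into a numerical representation suitable for recommendation algorithms.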

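Now, we’ll use a simple content-based recommendation system algorithm based on the cosine similarity between books. TfidfVectorizer normalizes each row to unit length by default, so the plain dot product computed by linear_kernel is equal to the cosine similarity:

# Compute the cosine similarity between books
# (dot product of L2-normalized TF-IDF rows)
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)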
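
Two practical notes on this step, shown as a sketch on made-up strings. First, linear_kernel and cosine_similarity give the same result on TF-IDF rows, because the rows are L2-normalized by default. Second, the full matrix for 11,127 books has 11,127 × 11,127 entries, which is roughly 1 GB assuming 8-byte floats, so on a memory-constrained machine you may prefer to compute only the row for the book you are querying:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Illustrative strings, not rows from the dataset
docs = [
    "Dubliners James Joyce",
    "The Portable James Joyce",
    "White Noise Text Criticism",
]
matrix = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Full pairwise matrices: identical for unit-length rows
full_linear = linear_kernel(matrix, matrix)
full_cosine = cosine_similarity(matrix, matrix)
print(np.allclose(full_linear, full_cosine))

# Similarities for one book only: n_books values instead of n_books x n_books
idx = 0
one_row = linear_kernel(matrix[idx:idx + 1], matrix).ravel()
print(one_row)
```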

Now, let’s define a function to recommend books based on user preferences:

def recommend_books(book_title, cosine_sim=cosine_sim):
    # Find the rows whose title matches exactly; fail clearly if there are none
    matches = data.index[data['title'] == book_title]
    if len(matches) == 0:
        raise ValueError(f"Book title not found: {book_title!r}")
    idx = matches[0]

    # Get the cosine similarity scores for all books with this book
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Drop the input book itself instead of assuming it sorts first
    sim_scores = [score for score in sim_scores if score[0] != idx]

    # Sort the books based on the similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the top 10 most similar books
    sim_scores = sim_scores[:10]

    # Get the book indices
    book_indices = [i[0] for i in sim_scores]

    # Return the top 10 recommended books
    return data['title'].iloc[book_indices]

This function will take a book title as input and recommend books with high cosine similarity. Now, let’s test the recommendation system by providing a book title and getting recommendations:
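
An exact title match is strict: a difference in letter case is enough for the lookup to find nothing. Below is a sketch of a more forgiving variant. The name `recommend_books_safe`, its parameters, and the toy data are all hypothetical and only illustrate the idea; it matches titles case-insensitively and returns an empty result for unknown titles:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy data standing in for the real dataset
toy = pd.DataFrame({
    "title": ["Dubliners", "The Portable James Joyce", "Ulysses", "White Noise"],
    "authors": ["James Joyce", "James Joyce", "James Joyce", "Don DeLillo"],
})
toy["book_content"] = toy["title"] + " " + toy["authors"]
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy["book_content"])
toy_sim = linear_kernel(toy_matrix, toy_matrix)


def recommend_books_safe(book_title, frame=toy, sim=toy_sim, top_n=10):
    """Hypothetical variant: case-insensitive lookup, empty result if not found."""
    matches = frame.index[frame["title"].str.lower() == book_title.lower()]
    if len(matches) == 0:
        return pd.Series(dtype="object", name="title")

    # Convert the index label into a row position in the similarity matrix
    pos = frame.index.get_loc(matches[0])

    # Sort positions by descending similarity, then drop the query book itself
    order = np.argsort(-sim[pos], kind="stable")
    order = order[order != pos][:top_n]
    return frame["title"].iloc[order]


print(recommend_books_safe("dubliners"))
print(recommend_books_safe("No Such Book"))
```

Returning an empty Series rather than raising an error is a design choice that suits interactive use; in a pipeline, an exception may be the better signal.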

book_title = "Dubliners: Text  Criticism  and Notes"
recommended_books = recommend_books(book_title)
print(recommended_books)
6191      CliffsNotes on Joyce's Dubliners (Cliffs Notes)
2988                                            Dubliners
2987                             The Portable James Joyce
3981                      White Noise: Text and Criticism
7704               The Quiet American: Text and Criticism
2871                          Sam Walton: Made In America
6188                                            Dubliners
2788                                    Dumpy's Valentine
796     Great Expectations: Authoritative Text  Backgr...
8199    Middlemarch: An Authoritative Text  Background...
Name: title, dtype: object
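
In the output above, "Dubliners" appears twice (different editions share a title), and a few titles look only loosely related. A sketch of one way to inspect and tidy such results is to keep the similarity score next to each title, drop repeated titles, and filter out weak matches. The toy rows and the `min_similarity` cutoff are assumptions chosen for illustration:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy rows (illustrative only), including a repeated title
toy = pd.DataFrame({
    "title": [
        "Dubliners: Text Criticism and Notes",
        "Dubliners",
        "Dubliners",
        "White Noise: Text and Criticism",
        "Sam Walton: Made In America",
    ],
    "authors": ["James Joyce", "James Joyce", "James Joyce", "Don DeLillo", "Sam Walton"],
})
content = toy["title"] + " " + toy["authors"]
matrix = TfidfVectorizer(stop_words="english").fit_transform(content)

# Similarity of every book to the query book (row 0)
query_idx = 0
scores = linear_kernel(matrix[query_idx:query_idx + 1], matrix).ravel()

min_similarity = 0.05  # arbitrary cutoff for this sketch

ranked = (
    toy.assign(similarity=scores)
    .drop(index=query_idx)                    # remove the query book
    .sort_values("similarity", ascending=False)
    .drop_duplicates(subset="title")          # keep one row per title
)
ranked = ranked[ranked["similarity"] > min_similarity]
print(ranked[["title", "similarity"]])
```

Seeing the scores makes it easier to judge where useful recommendations end and incidental word overlap begins.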

Summary

So this is how we can build a Book Recommendation System using Python: combine each book’s title and authors into a single text feature, vectorize it with TF-IDF, and recommend the books with the highest cosine similarity to the one the user picked. I hope you liked this article on building a book recommendation system using Python. Feel free to ask valuable questions in the comments section below.

Aman Kharwal