A Book Recommendation System is a data-driven application designed to suggest books to users based on their preferences, reading history, and behaviour. It employs various data science and machine learning techniques to provide personalized book recommendations, enhancing the reading experience for users. If you want to learn how to build a Book Recommendation System, this article is for you. In this article, I’ll take you through the task of building a Book Recommendation System using Python.
Book Recommendation System: Process We Can Follow
Building a book recommendation system requires a combination of data processing, machine learning expertise, and a deep understanding of user preferences. Below are the steps you can follow to build a Book Recommendation System:
- Collect a comprehensive dataset of books, including information like titles, authors, genres, summaries, and user ratings. Additionally, collect user data, such as reading history, reviews, and ratings.
- Clean and preprocess the collected data.
- Conduct exploratory data analysis (EDA) to identify popular genres, highly rated books, and user reading patterns.
- Choose recommendation algorithms that suit the dataset and user requirements.
- Train the selected model on the preprocessed data so that it can predict user preferences and generate personalized book recommendations.
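The cleaning step in the list above can be sketched as a small pandas function. This is a minimal sketch that assumes the column names used later in this article (title, authors, average_rating). The function name clean_books and its rules are illustrative choices, not a fixed recipe: it drops unparseable ratings, strips whitespace, and removes duplicate title/author pairs. The tiny DataFrame is made-up toy data.

```python
import pandas as pd


def clean_books(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: coerce ratings, drop bad rows and duplicates."""
    out = df.copy()
    # Ratings that cannot be parsed as numbers become NaN
    out["average_rating"] = pd.to_numeric(out["average_rating"], errors="coerce")
    # Drop rows without a usable title, author, or rating
    out = out.dropna(subset=["title", "authors", "average_rating"])
    # Normalise stray whitespace in the text columns
    out["title"] = out["title"].str.strip()
    out["authors"] = out["authors"].str.strip()
    # Keep a single row per (title, authors) pair
    out = out.drop_duplicates(subset=["title", "authors"])
    # Reset to a clean 0..n-1 index so index labels match row positions
    return out.reset_index(drop=True)


# Hypothetical toy data: a whitespace duplicate and an unparseable rating
raw = pd.DataFrame({
    "title": ["Book A", "Book A ", "Book B", "Book C"],
    "authors": ["Author X", "Author X", "Author Y", "Author Z"],
    "average_rating": ["4.10", "4.10", "n/a", "3.50"],
})
cleaned = clean_books(raw)
print(cleaned)
```

Resetting the index matters later in the article, where row positions in the similarity matrix are mapped back to rows of the DataFrame.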
So, the process begins with collecting a dataset based on book information. I found an ideal dataset for this task. You can download the dataset from here.
Book Recommendation System with Python
Now, let’s get started with the task of building a Book recommendation system by importing the necessary Python libraries and the dataset:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel
import plotly.express as px
import plotly.graph_objects as go

data = pd.read_csv("books_data.csv")
print(data.head())
   bookID                                              title  \
0       1  Harry Potter and the Half-Blood Prince (Harry ...
1       2  Harry Potter and the Order of the Phoenix (Har...
2       4  Harry Potter and the Chamber of Secrets (Harry...
3       5  Harry Potter and the Prisoner of Azkaban (Harr...
4       8   Harry Potter Boxed Set Books 1-5 (Harry Potte...

                      authors average_rating
0  J.K. Rowling/Mary GrandPré           4.57
1  J.K. Rowling/Mary GrandPré           4.49
2                J.K. Rowling           4.42
3  J.K. Rowling/Mary GrandPré           4.56
4  J.K. Rowling/Mary GrandPré           4.78
Now, let’s have a look at the column information:
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11127 entries, 0 to 11126
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype
---  ------          --------------  -----
 0   bookID          11127 non-null  int64
 1   title           11127 non-null  object
 2   authors         11127 non-null  object
 3   average_rating  11127 non-null  object
 4   title_length    11127 non-null  int64
dtypes: int64(2), object(3)
memory usage: 434.8+ KB
Let’s see the distribution of average ratings of all the books:
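# average_rating is still stored as text at this point, so convert it for plotting
ratings = pd.to_numeric(data['average_rating'], errors='coerce')
fig = px.histogram(x=ratings, nbins=30, title='Distribution of Average Ratings')
fig.update_xaxes(title_text='Average Rating')
fig.update_yaxes(title_text='Frequency')
fig.show()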

Now, let’s have a look at the ten authors with the most books in the dataset:
top_authors = data['authors'].value_counts().head(10)
fig = px.bar(top_authors, x=top_authors.values, y=top_authors.index,
             orientation='h',
             labels={'x': 'Number of Books', 'y': 'Author'},
             title='Number of Books per Author')
fig.show()
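Note that value_counts() counts each exact authors string. The dataset stores co-authors in a single "/"-separated string, for example "J.K. Rowling/Mary GrandPré" in the head() output above. As a result, rows credited to "J.K. Rowling" and rows credited to "J.K. Rowling/Mary GrandPré" are counted as different authors. The following is a minimal sketch of counting per individual author instead, shown on made-up toy rows. The resulting Series could be passed to the same px.bar call.

```python
import pandas as pd

# Hypothetical toy rows in the same "Author/Co-author" format as the dataset
books = pd.DataFrame({
    "authors": [
        "J.K. Rowling/Mary GrandPré",
        "J.K. Rowling",
        "J.K. Rowling/Mary GrandPré",
        "James Joyce",
    ]
})

# Split each credit string on "/" and give every individual author their own row
author_counts = (
    books["authors"]
    .str.split("/")
    .explode()
    .str.strip()
    .value_counts()
)
print(author_counts.head(10))
```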

As the info() output shows, the average_rating column is stored with the object (string) dtype. Let’s convert it to a numeric type:
# Convert 'average_rating' to a numeric data type
data['average_rating'] = pd.to_numeric(data['average_rating'], errors='coerce')
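With errors='coerce', any value that cannot be parsed as a number becomes NaN instead of raising an error. It is therefore worth checking how many rows were affected before moving on. Here is a small sketch on a made-up Series; the malformed "n/a" entry is hypothetical.

```python
import pandas as pd

# Hypothetical ratings column stored as strings, with one malformed entry
ratings = pd.Series(["4.57", "4.49", "n/a", "3.90"], name="average_rating")

numeric = pd.to_numeric(ratings, errors="coerce")

# How many values failed to parse, and what they looked like originally
print(numeric.isna().sum())
bad_values = ratings[numeric.isna()]
print(bad_values)

# Aggregations such as mean() skip NaN by default
print(numeric.mean())
```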
To base the recommendations on book content, we’ll use the book titles and authors. Let’s combine these two columns into a single text feature:
# Create a new column 'book_content' by combining 'title' and 'authors'
data['book_content'] = data['title'] + ' ' + data['authors']
Now, we will transform the text-based features into numerical vectors using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization:
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(data['book_content'])
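This turns each book’s text into a sparse vector with one weight per vocabulary term. A term gets more weight when it appears often in that book’s text and less weight when it appears in many books across the dataset. This numerical representation is what the similarity computation below works on.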
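Now, we’ll use a simple content-based recommendation algorithm that scores each pair of books by the cosine similarity of their TF-IDF vectors. Because TfidfVectorizer L2-normalizes each row by default, the plain dot product computed by linear_kernel is exactly the cosine similarity: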
# Compute the cosine similarity between books
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
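Two properties of this step are easy to check. The following minimal sketch uses made-up documents. First, on L2-normalised TF-IDF rows, linear_kernel and scikit-learn’s cosine_similarity give the same matrix. Second, the full result is a dense n × n array. For the 11,127 books in this dataset, that is about 123.8 million float64 values, roughly 990 MB.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical documents standing in for the book_content column
docs = [
    "Dubliners James Joyce",
    "The Portable James Joyce James Joyce",
    "White Noise Text and Criticism Don DeLillo",
    "Middlemarch George Eliot",
]

tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)

# Dot products of unit-length rows equal their cosine similarities
dot = linear_kernel(tfidf, tfidf)
cos = cosine_similarity(tfidf, tfidf)
print(np.allclose(dot, cos))

# Memory needed for a dense n x n float64 similarity matrix
n_books = 11127
print(f"{n_books ** 2 * 8 / 1e6:.0f} MB")
```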
Now, let’s define a function that recommends books similar to a given title:
def recommend_books(book_title, cosine_sim=cosine_sim):
    # Get the index of the book that matches the title
    # (data has a default RangeIndex, so index labels equal row positions)
    matches = data.index[data['title'] == book_title]
    if len(matches) == 0:
        raise ValueError(f"Book not found: {book_title!r}")
    idx = matches[0]
    # Get the cosine similarity scores for all books with this book
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Sort the books based on the similarity scores, highest first
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Get the top 10 most similar books, explicitly excluding the input book
    # (another book with identical content could also score 1.0 and sort first)
    sim_scores = [s for s in sim_scores if s[0] != idx][:10]
    # Get the book indices
    book_indices = [i[0] for i in sim_scores]
    # Return the top 10 recommended books
    return data['title'].iloc[book_indices]
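Precomputing cosine_sim makes every lookup a simple row read, but it keeps the whole n × n matrix in memory. As an alternative, here is a sketch that scores only the requested book against all others at query time, which needs O(n) memory per call. The function name recommend_books_lazy, its top_n parameter, and the toy DataFrame are hypothetical and are not part of the article’s code.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Hypothetical stand-in for the article's books DataFrame
books = pd.DataFrame({
    "title": ["Dubliners", "The Portable James Joyce", "Ulysses", "Emma", "Persuasion"],
    "authors": ["James Joyce", "James Joyce", "James Joyce", "Jane Austen", "Jane Austen"],
})
books["book_content"] = books["title"] + " " + books["authors"]
tfidf = TfidfVectorizer(stop_words="english").fit_transform(books["book_content"])


def recommend_books_lazy(book_title, df=books, matrix=tfidf, top_n=10):
    """Like recommend_books, but computes one 1 x n row instead of an n x n matrix."""
    positions = np.flatnonzero(df["title"].to_numpy() == book_title)
    if positions.size == 0:
        raise ValueError(f"Book not found: {book_title!r}")
    idx = positions[0]
    # Similarity of the chosen book to every book (rows are L2-normalised)
    scores = linear_kernel(matrix[idx], matrix).ravel()
    # Highest scores first; a stable sort keeps ties in row order
    order = np.argsort(-scores, kind="stable")
    # Drop the input book itself, then keep the top_n results
    order = order[order != idx][:top_n]
    return df["title"].iloc[order]


print(recommend_books_lazy("Dubliners", top_n=2))
```

The trade-off is speed versus memory. The dense matrix answers repeated queries instantly, while the per-query version does one sparse matrix product per call.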
This function takes a book title as input and returns the ten other books whose TF-IDF vectors have the highest cosine similarity to it. Now, let’s test the recommendation system by providing a book title and getting recommendations:
book_title = "Dubliners: Text Criticism and Notes"
recommended_books = recommend_books(book_title)
print(recommended_books)
6191     CliffsNotes on Joyce's Dubliners (Cliffs Notes)
2988                                           Dubliners
2987                            The Portable James Joyce
3981                     White Noise: Text and Criticism
7704              The Quiet American: Text and Criticism
2871                         Sam Walton: Made In America
6188                                           Dubliners
2788                                   Dumpy's Valentine
796     Great Expectations: Authoritative Text Backgr...
8199    Middlemarch: An Authoritative Text Background...
Name: title, dtype: object
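recommend_books only works with an exact title match, so a small typo or a difference in capitalisation fails the lookup. One way to make the input more forgiving is to map an approximate query to the closest known title with difflib from Python’s standard library. This is a minimal sketch: the function name find_title and the short title list are hypothetical, and in practice the titles would come from data['title']. The cutoff of 0.6 is difflib’s default.

```python
import difflib

# Hypothetical subset of titles; in the article these would come from data['title']
titles = [
    "Dubliners: Text Criticism and Notes",
    "Dubliners",
    "The Portable James Joyce",
    "White Noise: Text and Criticism",
]


def find_title(query, titles=titles, cutoff=0.6):
    """Return the closest matching title, or None if nothing is similar enough."""
    # Compare lower-cased strings so capitalisation does not affect the score
    lowered = {t.lower(): t for t in titles}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


print(find_title("dubliners text criticism and notes"))
```

The returned title, when not None, can then be passed to recommend_books.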
Summary
So this is how we can build a content-based Book Recommendation System using Python. We combine each book’s title and authors into one text feature, turn that text into TF-IDF vectors, and recommend the books whose vectors have the highest cosine similarity to a given title. I hope you liked this article on building a book recommendation system using Python. Feel free to ask questions in the comments section below.