Movie Recommendation System with Machine Learning

Recommendation systems are among the most popular applications of data science. They predict the rating or preference that a user would give to an item.

Almost every major company has applied them in some form: Amazon uses them to suggest products to customers, YouTube uses them to decide which video to autoplay next, and Facebook uses them to recommend pages to like and people to follow.

Let’s Build Our Own Recommendation System

In this data science project, you will see how to build basic versions of both a simple and a content-based recommendation system.

While these models come nowhere close to the industry standard in complexity, quality, or accuracy, they will help you get started on building more complex models that produce better results.

Download the data sets you need to build this movie recommendation model from here:

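import pandas as pd
import numpy as np

# Load both TMDB 5000 tables, then preview the movies and report each shape
credits = pd.read_csv("tmdb_5000_credits.csv")
movies = pd.read_csv("tmdb_5000_movies.csv")
print(movies.head())
print("Credits:", credits.shape)
print("Movies Dataframe:", movies.shape)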

[5 rows x 20 columns]
Credits: (4803, 4)
Movies Dataframe: (4803, 20)

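# Rename the credits key so both tables share an 'id' column, then join on it
credits_column_renamed = credits.rename(columns={"movie_id": "id"})
movies_merge = movies.merge(credits_column_renamed, on='id')
# Drop columns not needed here; 'original_title' is kept for title lookups
movies_cleaned = movies_merge.drop(columns=['homepage', 'title_x', 'title_y', 'status', 'production_countries'])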
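
If you are wondering where the title_x and title_y columns come from, here is a minimal sketch on two made-up tables (movies_toy and credits_toy are hypothetical stand-ins, not the real TMDB data). When both tables carry a non-key column with the same name, pandas keeps both and adds the _x and _y suffixes:

```python
import pandas as pd

# Toy stand-ins for the two TMDB tables (values are made up for illustration).
movies_toy = pd.DataFrame({"id": [1, 2], "title": ["A", "B"], "budget": [10, 20]})
credits_toy = pd.DataFrame({"movie_id": [1, 2], "title": ["A", "B"], "cast": ["x", "y"]})

# Align the key name, then join on it.
merged_toy = movies_toy.merge(credits_toy.rename(columns={"movie_id": "id"}), on="id")

# Columns present in both frames (other than the key) get _x / _y suffixes.
print(list(merged_toy.columns))
# ['id', 'title_x', 'budget', 'title_y', 'cast']
```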

Content-Based Recommendation System

Now let’s make recommendations based on the movies’ plot summaries, which are given in the overview column. If a user gives us a movie title, our goal is to recommend movies with similar plot summaries.

from sklearn.feature_extraction.text import TfidfVectorizer

# TF-IDF over word 1- to 3-grams, ignoring English stop words and terms found in fewer than 3 overviews
tfv = TfidfVectorizer(min_df=3, max_features=None,
                      strip_accents='unicode', analyzer='word', token_pattern=r'\w{1,}',
                      ngram_range=(1, 3),
                      stop_words='english')
# Fitting the TF-IDF on the 'overview' text (missing overviews become empty strings)
tfv_matrix = tfv.fit_transform(movies_cleaned['overview'].fillna(''))

<4803x10417 sparse matrix of type '<class 'numpy.float64'>'
	with 127220 stored elements in Compressed Sparse Row format>
(4803, 10417)
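
To get a feel for what this matrix holds, here is a small sketch on three made-up overviews (the sentences and the toy_ names are assumptions for illustration, and the vectorizer uses simpler settings than the one above). Each row is one document, each column is one term, and rows that share terms end up with a non-zero dot product:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three made-up plot summaries, for illustration only.
docs = [
    "a space crew explores a distant planet",
    "a crew fights aliens on a distant planet",
    "two friends open a bakery in paris",
]

toy_tfv = TfidfVectorizer(stop_words="english")
toy_matrix = toy_tfv.fit_transform(docs)

# One row per document, one column per term kept in the vocabulary.
print(toy_matrix.shape)
print(sorted(toy_tfv.vocabulary_))

# Dot products between rows: documents sharing terms score above zero.
toy_sim = (toy_matrix @ toy_matrix.T).toarray()
print(toy_sim.round(2))
```

The first two summaries share words such as "crew" and "planet", so their score is positive, while the bakery summary shares nothing with either and scores zero.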

from sklearn.metrics.pairwise import sigmoid_kernel

# Compute the sigmoid kernel
sig = sigmoid_kernel(tfv_matrix, tfv_matrix)

array([0.76163447, 0.76159416, 0.76159416, …, 0.76159416, 0.76159416, 0.76159416])
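
The repeated value 0.76159416 in this output is no accident. With its default settings, scikit-learn's sigmoid_kernel computes tanh(gamma * <x, y> + coef0), where gamma is 1 / n_features and coef0 is 1. Two overviews with no terms in common have a dot product of zero, so their score is tanh(1). The sketch below checks this on a small made-up matrix (X is an assumption for illustration, not the real TF-IDF matrix):

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

# Made-up feature rows: rows 0 and 1 share no non-zero column.
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0, 0.0]])

# Library call with default gamma (1 / n_features) and coef0 (1).
K = sigmoid_kernel(X, X)

# The same computation written out by hand.
gamma = 1.0 / X.shape[1]
K_manual = np.tanh(gamma * (X @ X.T) + 1.0)

print(np.allclose(K, K_manual))  # True
print(K[0, 1], np.tanh(1.0))     # both equal tanh(1), about 0.7616
```

Because gamma shrinks with the number of features, the scores for a 10417-column matrix all sit very close to tanh(1). Only the ranking matters for the recommendations, so this is fine for our purposes.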

Reverse mapping of indices and movie titles

# Reverse mapping of indices and movie titles
indices = pd.Series(movies_cleaned.index, index=movies_cleaned['original_title']).drop_duplicates()
print(sorted(list(enumerate(sig[indices['Newlyweds']])), key=lambda x: x[1], reverse=True))
def give_recomendations(title, sig=sig):
    # Get the index corresponding to original_title
    idx = indices[title]

    # Get the pairwise similarity scores
    sig_scores = list(enumerate(sig[idx]))

    # Sort the movies
    sig_scores = sorted(sig_scores, key=lambda x: x[1], reverse=True)

    # Scores of the 10 most similar movies
    sig_scores = sig_scores[1:11]

    # Movie indices
    movie_indices = [i[0] for i in sig_scores]

    # Top 10 most similar movies
    return movies_cleaned['original_title'].iloc[movie_indices]
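
One thing to watch out for: the function above raises a KeyError if the title is not in the data, and it assumes the queried movie always sorts first. Below is a minimal sketch of a more defensive variant on a made-up three-movie catalogue. The name recommend_safe, the toy_ variables, and the titles and overviews are all hypothetical; this is one possible approach, not part of the original code:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import sigmoid_kernel

# Made-up miniature catalogue, for illustration only.
toy = pd.DataFrame({
    "original_title": ["Wizard One", "Wizard Two", "London Case"],
    "overview": [
        "a wizard school student fights a dark lord",
        "a young wizard learns magic at school",
        "a detective hunts a killer in london",
    ],
})
toy_sig = sigmoid_kernel(TfidfVectorizer(stop_words="english").fit_transform(toy["overview"]))
toy_indices = pd.Series(toy.index, index=toy["original_title"])

def recommend_safe(title, n=10):
    """Hypothetical variant: returns an empty Series for unknown titles."""
    if title not in toy_indices.index:
        return pd.Series(dtype=object)
    idx = toy_indices[title]
    scores = sorted(enumerate(toy_sig[idx]), key=lambda x: x[1], reverse=True)
    # Drop the query movie itself by index rather than assuming it sorts first.
    top = [i for i, _ in scores if i != idx][:n]
    return toy["original_title"].iloc[top]

print(recommend_safe("Wizard One", n=1).tolist())  # ['Wizard Two']
print(recommend_safe("Unknown Movie").empty)       # True
```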

Testing our content-based recommendation system with the seminal film Spy Kids



1341 Obitaemyy Ostrov
634 The Matrix
3604 Apollo 18
2130 The American
775 Supernova
529 Tears of the Sun
151 Beowulf
311 The Adventures of Pluto Nash
847 Semi-Pro
942 The Book of Life
Name: original_title, dtype: object
