Recommendation systems are among the most common applications of Data Science, and many popular websites rely on them. As the name suggests, a news recommendation system recommends news articles based on the news a user is already reading. In this article, I will take you through how to create a News Recommendation System using Python.
How does a News Recommendation System Work?
When you visit a website, it often recommends content similar to what you are already watching or reading. Recommending items based on the content a user is already consuming is a technique known as content-based filtering.
Many popular news websites use content-based recommendation systems, which measure how similar the article you are reading is to the other articles on the site and recommend the most similar ones.
I hope you now understand how a news recommendation system works. In the section below, I will take you through how to build a News Recommendation System using the Python programming language.
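To make the idea concrete, here is a minimal sketch of content-based filtering in plain Python. It is a toy illustration only: it scores titles by word overlap (Jaccard similarity), which is simpler than the TF-IDF approach used later in this article. The function names and the sample titles are made up for the example.

```python
def jaccard(a, b):
    """Share of distinct lowercase words that two titles have in common."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def most_similar(current, candidates, top_n=2):
    """Rank candidate titles by word overlap with the title being read."""
    ranked = sorted(candidates, key=lambda title: jaccard(current, title), reverse=True)
    return ranked[:top_n]


# Hypothetical titles, made up for illustration only
articles = [
    "Walmart unveils Black Friday deals",
    "Ten habits that hurt your sleep",
    "Best Black Friday deals on tablets",
]
print(most_similar("Walmart Black Friday deals on iPads", articles))
```

The two shopping titles rank first because they share the most words with the title being read, while the unrelated health title shares none.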
News Recommendation System using Python
The dataset I am using to build a News Recommendation System is from Microsoft. As the data needed a lot of cleaning and preparation, I downloaded the data and prepared it for building a content-based recommendation system. You can download the dataset here (please download the data in CSV format).
Now let’s start by importing the necessary Python libraries and the dataset we need to build a News Recommendation System:
import numpy as np
import pandas as pd
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity
import plotly.express as px
import plotly.graph_objects as go

data = pd.read_csv("News.csv")
print(data.head())
       ID News Category                                              Title  \
0  N88753     lifestyle  The Brands Queen Elizabeth, Prince Charles, an...
1  N45436          news    Walmart Slashes Prices on Last-Generation iPads
2  N23144        health                      50 Worst Habits For Belly Fat
3  N86255        health  Dispose of unwanted prescription drugs during ...
4  N93187          news  The Cost of Trump's Aid Freeze in the Trenches...

                                             Summary
0  Shop the notebooks, jackets, and more that the...
1  Apple's new iPad releases bring big deals on l...
2  These seemingly harmless habits are holding yo...
3                                                NaN
4  Lt. Ivan Molchanets peeked over a parapet of s...
Let’s have a look at the news categories in this dataset:
# Types of News Categories
categories = data["News Category"].value_counts()
label = categories.index
counts = categories.values
figure = px.bar(data, x=label, y=counts, title="Types of News Categories")
figure.show()
There are two ways to build a recommendation system using this dataset:
- If we use the News Category column as the feature for finding similarities, the recommendations may not hold the user’s attention for long. Suppose a user is reading about a cricket match and is recommended news about other sports such as wrestling, hockey, or football. Those articles share a category but may have little to do with what the user is actually reading.
- The other way is to use the title or the summary as the feature for finding similarities. This should give more relevant recommendations, because they are based on the text of the article the user is already reading.
So we can use the title or the summary of the news article to find similarities with other news articles. Here I will use the title column. If you wish to use the summary column, first drop the rows with null values, as the summary column contains more than 5000 null values.
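If you go with the summary column, the cleaning step could look like the sketch below. It uses a tiny made-up DataFrame in place of News.csv, and the variable names are only illustrative. Resetting the index after dropping rows is my own suggestion: the later steps treat row labels as positions in the similarity matrix, so the two should stay aligned.

```python
import numpy as np
import pandas as pd

# Toy stand-in for News.csv with the same column names (values are made up)
toy = pd.DataFrame({
    "Title": ["A", "B", "C"],
    "Summary": ["first summary", np.nan, "third summary"],
})

# Count the missing summaries before dropping anything
print(toy["Summary"].isnull().sum())

# Keep only rows that have a summary, then renumber the rows so that
# row labels match row positions again
cleaned = toy.dropna(subset=["Summary"]).reset_index(drop=True)
feature = cleaned["Summary"].tolist()
print(cleaned)
```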
Below is how we can find similarities between the news articles: we convert the text of the title column into numerical TF-IDF vectors and then compare those vectors using cosine similarity:
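feature = data["Title"].tolist()
# The documents are passed to fit_transform; the `input` parameter only
# accepts "content", "filename" or "file", so it is left at its default
tfidf = text.TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(feature)
similarity = cosine_similarity(tfidf_matrix)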
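To see what these two steps produce, here is a small self-contained sketch on three made-up titles (they are not rows from the dataset). It is only meant to show the shapes involved and the kind of scores you get, assuming scikit-learn is installed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up titles for illustration
titles = [
    "Walmart slashes prices on iPads",
    "Walmart Black Friday deals unveiled",
    "50 worst habits for belly fat",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(titles)  # sparse matrix, one row per title
scores = cosine_similarity(matrix)         # dense array, one score per pair of titles

print(matrix.shape)
print(scores.round(2))
```

Each title has a score of 1 with itself. The two Walmart titles get a score above 0 because they share a word, and the health title scores 0 against both because it shares no non-stop-words with them.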
Now I will build a Series that maps each title to its row index, so that we can look up recommendations by giving a title as input:
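indices = pd.Series(data.index, index=data['Title'])
# Keep the first row for each title, so a lookup always returns a single index
indices = indices[~indices.index.duplicated()]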
Below is the function that returns the ten highest-scoring titles for a given title:
def news_recommendation(Title, similarity=similarity):
    index = indices[Title]
    # Pair each article's row position with its similarity to the chosen article
    similarity_scores = list(enumerate(similarity[index]))
    # Sort by the score (the second element of each pair), highest first
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    similarity_scores = similarity_scores[0:10]
    # Keep only the row positions of the top matches
    newsindices = [i[0] for i in similarity_scores]
    return data['Title'].iloc[newsindices]

print(news_recommendation("Walmart Slashes Prices on Last-Generation iPads"))
1         Walmart Slashes Prices on Last-Generation iPads
83827     Walmart's Black Friday 2019 ad: the best deals...
76024     Walmart Black Friday 2019 deals unveiled: Huge...
90316     US consumer prices up 0.4% in October; gasolin...
89588     Consumer prices rise most in 7 months on highe...
32839     Inside the next generation of irons
37970     Walmart and Kroger Undercut Drugstore Chains' ...
100684    Nissan slashes full-year forecast as first-hal...
74916     The Top Deals at Walmart Right Now
39634     Federal Reserve slashes interest rates for thi...
Name: Title, dtype: object
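Two things stand out in this output: the article itself comes back as the first recommendation, and the row labels go above 100,000, so a full similarity matrix with one score for every pair of titles can need a lot of memory. The sketch below is one possible variant, not part of the approach above: it scores a single article against all the others only when asked, and leaves the article itself out of the result. The function name recommend_on_demand, the top_n parameter, and the toy titles are all made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-in for the Title column of News.csv (made-up titles)
data = pd.DataFrame({"Title": [
    "Walmart slashes prices on iPads",
    "Walmart Black Friday deals unveiled",
    "50 worst habits for belly fat",
    "Top deals at Walmart right now",
]})

tfidf_matrix = TfidfVectorizer(stop_words="english").fit_transform(data["Title"])
indices = pd.Series(data.index, index=data["Title"])
indices = indices[~indices.index.duplicated()]


def recommend_on_demand(title, top_n=2):
    """Hypothetical variant: score one article against all others when asked."""
    index = indices[title]
    # One row of scores instead of the full matrix of all pairs
    scores = cosine_similarity(tfidf_matrix[index:index + 1], tfidf_matrix).ravel()
    scores[index] = -1.0  # push the article itself to the bottom of the ranking
    top = np.argsort(scores)[::-1][:top_n]
    return data["Title"].iloc[top]


print(recommend_on_demand("Walmart slashes prices on iPads"))
```

The trade-off is that every call recomputes one row of scores, which is usually cheap with a sparse TF-IDF matrix, in exchange for never holding all pairwise scores in memory at once.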
So this is how you can build a News Recommender System using the Python programming language.
I hope you liked this article on how to build a News Recommender System using Python. Feel free to ask your questions in the comments section below.