News Classification with Machine Learning

You must have seen the news divided into categories when you go to a news website. Some of the popular categories that you’ll see on almost any news website are tech, entertainment, and sports. If you want to know how to classify news categories using machine learning, this article is for you. In this article, I will walk you through the task of news classification with machine learning using Python.

News Classification

Every news website classifies its articles before publishing them so that visitors can easily click on the type of news that interests them. For example, I like to read the latest technology updates, so every time I visit a news website, I click on the technology section. You may not care about technology, though; you may be interested in politics, business, entertainment, or sports.

Currently, news articles are classified by hand by the content managers of news websites. But to save time, they can also implement a machine learning model on their websites that reads the headline or the content of a news article and predicts its category. In the section below, I will take you through how you can train a machine learning model for the task of news classification using the Python programming language.

News Classification using Python

For the task of news classification with machine learning, I have collected a dataset from Kaggle, which contains news articles including their headlines and categories. The categories covered in this dataset are:

  1. Sports
  2. Business
  3. Politics
  4. Tech
  5. Entertainment

So let’s import the necessary Python libraries and the dataset that we need for this task:

   category  ...                                            content
0  business  ...   Quarterly profits at US media giant TimeWarne...
1  business  ...   The dollar has hit its highest level against ...
2  business  ...   The owners of embattled Russian oil giant Yuk...
3  business  ...   British Airways has blamed high fuel prices f...
4  business  ...   Shares in UK drinks and food firm Allied Dome...

[5 rows x 4 columns]
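
The output above is what pandas prints for the first rows of a four-column table. Here is a minimal sketch of that loading step, assuming the Kaggle file is a delimited text file with the columns category, filename, title, and content. The file name is not given here, so a tiny made-up sample held in memory stands in for it, and `load_news` is a hypothetical helper name. The delimiter of the real file may differ.

```python
import io

import pandas as pd

# Made-up stand-in for the Kaggle file: same four columns, three toy rows.
SAMPLE_CSV = """category,filename,title,content
business,001.txt,Profits rise at media group,Quarterly profits rose sharply.
sport,002.txt,Striker signs new deal,The striker has agreed a contract.
tech,003.txt,New phone unveiled,The handset has a faster chip.
"""


def load_news(source):
    """Read a news table from a path or file-like object into a DataFrame."""
    return pd.read_csv(source)


data = load_news(io.StringIO(SAMPLE_CSV))
print(data.head())  # first rows, as in the listing above
```

With the real dataset, you would pass the path of the downloaded file to `load_news` instead of the in-memory sample.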

Now, let’s have a quick look at whether this dataset contains any null values or not:

data.isnull().sum()
category    0
filename    0
title       0
content     0
dtype: int64

The labels that we need to predict are in the category column of this data. Let’s have a look at the distribution of the news categories:

data["category"].value_counts()
sport            511
business         510
politics         417
tech             401
entertainment    386
Name: category, dtype: int64

News Classification Model

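Now let’s prepare the data for the task of training a news classification model. The code in the rest of this section expects a fitted text vectorizer named cv and training data named X_train and y_train.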
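
The preparation code is not shown here, so the following is only a minimal sketch of one common way to do it, assuming a bag-of-words representation built with scikit-learn's CountVectorizer and a train/test split. The toy headlines, the 33% test size, and the random seed are my assumptions, not values from the original.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Toy headlines and labels standing in for the title and category columns.
titles = [
    "apple unveils new iphone",
    "new smartphone chip announced",
    "striker scores late winner",
    "team wins cup final",
    "shares fall as profits drop",
    "bank raises interest rates",
]
labels = ["tech", "tech", "sport", "sport", "business", "business"]

# Split the raw text first so the vocabulary is learned from training data only.
titles_train, titles_test, y_train, y_test = train_test_split(
    titles, labels, test_size=0.33, random_state=42
)

cv = CountVectorizer()
X_train = cv.fit_transform(titles_train)  # learn vocabulary, count words
X_test = cv.transform(titles_test)        # reuse the same vocabulary
```

Fitting the vectorizer on the training split only is a design choice: it keeps words that appear only in the test headlines from leaking into the model's vocabulary.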

Now I will be using the Multinomial Naive Bayes algorithm to train a news classification model:

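from sklearn.naive_bayes import MultinomialNB
# Fit a Multinomial Naive Bayes classifier on the vectorized training data
model = MultinomialNB()
model.fit(X_train, y_train)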
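
Before trying the model on new text, it can help to check it on examples it has not seen. The sketch below is not part of the original walkthrough: it assumes a few made-up headlines in place of the real dataset and uses scikit-learn's `accuracy_score` to compare predictions with known labels.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Made-up training headlines (illustration only).
train_titles = [
    "apple unveils new iphone",
    "new smartphone chip announced",
    "striker scores late winner",
    "team wins cup final",
]
train_labels = ["tech", "tech", "sport", "sport"]

# Made-up held-out headlines with known categories.
test_titles = ["new iphone chip announced", "team scores late winner"]
test_labels = ["tech", "sport"]

cv = CountVectorizer()
X_train = cv.fit_transform(train_titles)
X_test = cv.transform(test_titles)

model = MultinomialNB()
model.fit(X_train, train_labels)

# Fraction of held-out headlines whose predicted category matches the label.
predictions = model.predict(X_test)
accuracy = accuracy_score(test_labels, predictions)
print(f"held-out accuracy: {accuracy:.2f}")
```

On the real dataset, the same check would use the held-out part of the train/test split rather than hand-written headlines.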

Finally, let’s test how this model works on one of the headlines in today’s news:

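user = input("Enter a Text: ")
# Vectorize the text with the same fitted CountVectorizer (cv) used for training;
# the result gets its own name so the dataset in `data` is not overwritten
features = cv.transform([user]).toarray()
output = model.predict(features)
print(output)

Enter a Text: Latest Apple iPhone SE 3 concept renders show a compact smartphone in the style of the iPhone 4
['tech']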
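
If you want to reuse the prediction step without typing text at a prompt, one option is to bundle the vectorizer and the classifier into a single scikit-learn pipeline. This is a sketch of that alternative, not the code used above; the training headlines are made up, and `predict_category` is a hypothetical helper name.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up training data (illustration only, not the Kaggle dataset).
titles = [
    "apple unveils new iphone",
    "new smartphone chip announced",
    "striker scores late winner",
    "team wins cup final",
    "shares fall as profits drop",
    "bank raises interest rates",
]
labels = ["tech", "tech", "sport", "sport", "business", "business"]

# One object that vectorizes raw text and then classifies it, so the
# vectorizer and the model always stay in sync.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(titles, labels)


def predict_category(text):
    """Return the predicted category label for one piece of text."""
    return classifier.predict([text])[0]


headline = (
    "Latest Apple iPhone SE 3 concept renders show a compact "
    "smartphone in the style of the iPhone 4"
)
print(predict_category(headline))
```

Because the pipeline accepts raw strings, the caller never has to remember to call `transform` before `predict`.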

So this is how you can train a news classification model with machine learning using Python.

Summary

So this is how we can use machine learning to classify news into categories. Every news website classifies its articles before publishing them so that visitors can easily click on the type of news that interests them. I hope you liked this article on news classification with machine learning using Python. Feel free to ask your valuable questions in the comments section below.

Aman Kharwal