When you visit a news website, you have probably noticed that the news is divided into categories. Popular categories that appear on almost any news website include tech, entertainment, and sports. If you want to know how to classify news categories using machine learning, this article is for you. In this article, I will walk you through the task of news classification with machine learning using Python.
Every news website classifies each news article before publishing it so that visitors can easily click on the type of news that interests them. For example, I like to read the latest technology updates, so every time I visit a news website, I click on the technology section. You may or may not like to read about technology; you may be more interested in politics, business, entertainment, or sports.
On many news websites, articles are classified by hand by content managers. To save time, they can instead implement a machine learning model that reads the headline or the content of an article and predicts its category. In the section below, I will take you through how you can train a machine learning model for the task of news classification using the Python programming language.
News Classification using Python
For the task of news classification with machine learning, I have collected a dataset from Kaggle, which contains news articles including their headlines and categories. The categories covered in this dataset are sport, business, politics, tech, and entertainment.
So let’s import the necessary Python libraries and the dataset that we need for this task:
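As a minimal sketch of the loading step, the snippet below reads a CSV with the same four columns the dataset has (category, filename, title, content). The two rows in `SAMPLE_CSV` are made-up placeholders, not rows from the Kaggle dataset, and the in-memory file only stands in for the real one so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Kaggle CSV: two made-up rows with the same
# four columns as the article's dataset (category, filename, title, content).
SAMPLE_CSV = io.StringIO(
    "category,filename,title,content\n"
    "business,001.txt,Placeholder business title,Placeholder business text\n"
    "tech,002.txt,Placeholder tech title,Placeholder tech text\n"
)

# In practice, pass the path of the downloaded dataset file here instead.
data = pd.read_csv(SAMPLE_CSV)

# Show the first rows to confirm the columns loaded as expected.
print(data.head())
```

With the real file, `data.head()` should show the first five articles together with their categories.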
   category  ...                                            content
0  business  ...  Quarterly profits at US media giant TimeWarne...
1  business  ...  The dollar has hit its highest level against ...
2  business  ...  The owners of embattled Russian oil giant Yuk...
3  business  ...  British Airways has blamed high fuel prices f...
4  business  ...  Shares in UK drinks and food firm Allied Dome...

[5 rows x 4 columns]
Now, let’s have a quick look at whether this dataset contains any null values or not:
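One common way to run this check is `isnull().sum()`, which counts the missing values in each column. The sketch below uses a small made-up DataFrame named `data` (an assumption for illustration) with one missing title, so that the count is visible; on the real dataset every count is zero.

```python
import pandas as pd

# Toy stand-in for the dataset; the None in "title" is deliberate so the
# null count for that column is non-zero in this illustration.
data = pd.DataFrame(
    {
        "category": ["business", "tech", "sport"],
        "filename": ["001.txt", "002.txt", "003.txt"],
        "title": ["Placeholder title A", None, "Placeholder title C"],
        "content": ["Placeholder text A", "Placeholder text B", "Placeholder text C"],
    }
)

# isnull() marks each missing cell as True; sum() counts them per column.
null_counts = data.isnull().sum()
print(null_counts)
```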
category    0
filename    0
title       0
content     0
dtype: int64
The labels that we need to predict are stored in the category column of this dataset. Let’s have a look at the distribution of the news categories:
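The distribution of a label column can be obtained with `value_counts()`. The sketch below applies it to a made-up `category` column (a toy assumption, not the real data) to show the shape of the result.

```python
import pandas as pd

# Toy label column; the real dataset has five categories.
data = pd.DataFrame({"category": ["sport", "business", "sport", "tech"]})

# value_counts() returns the number of rows per category, most frequent first.
category_counts = data["category"].value_counts()
print(category_counts)
```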
sport            511
business         510
politics         417
tech             401
entertainment    386
Name: category, dtype: int64
News Classification Model
Now let’s prepare the data for the task of training a news classification model:
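A minimal sketch of this preparation step is shown below. It assumes the headlines (the title column) are used as features, converts them to word counts with `CountVectorizer`, and holds out part of the data with `train_test_split`. The names `cv`, `X_train`, and `y_train` match the code later in this article; the toy headlines, `test_size=0.33`, and `random_state=42` are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Made-up headlines and labels standing in for the title and category columns.
titles = [
    "new phone released with faster chip",
    "software update fixes phone bug",
    "chip maker unveils new laptop",
    "team wins the cup final",
    "striker scores twice in match",
    "coach praises team after final",
]
labels = ["tech", "tech", "tech", "sport", "sport", "sport"]

x = np.array(titles)
y = np.array(labels)

# Turn each headline into a vector of word counts (bag of words).
cv = CountVectorizer()
X = cv.fit_transform(x)

# Hold out a third of the rows for testing (split sizes are an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
print(X_train.shape, X_test.shape)
```

Fitting the vectorizer once and reusing the same `cv` object at prediction time matters, because the model expects new text to be encoded with the same vocabulary.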
Now I will be using the Multinomial Naive Bayes algorithm to train a news classification model:
from sklearn.naive_bayes import MultinomialNB

model = MultinomialNB()
model.fit(X_train, y_train)
Finally, let’s test how this model works on one of the headlines in today’s news:
user = input("Enter a Text: ")
data = cv.transform([user]).toarray()
output = model.predict(data)
print(output)
Enter a Text: Latest Apple iPhone SE 3 concept renders show a compact smartphone in the style of the iPhone 4
['tech']
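If you want to call the model from other code instead of typing a headline at a prompt, the same steps can be wrapped in a function. The sketch below is a self-contained illustration trained on four made-up headlines, so its predictions say nothing about the accuracy of the model trained on the real dataset; `predict_category` is a hypothetical helper name, not something from a library.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Made-up training headlines (toy data, not the Kaggle dataset).
titles = [
    "new phone released with faster chip",
    "software update fixes phone bug",
    "team wins the cup final",
    "striker scores twice in match",
]
labels = ["tech", "tech", "sport", "sport"]

# Fit the vectorizer and the classifier on the toy headlines.
cv = CountVectorizer()
X = cv.fit_transform(titles)
model = MultinomialNB()
model.fit(X, labels)


def predict_category(text: str) -> str:
    """Hypothetical helper: encode one headline and return its predicted label."""
    features = cv.transform([text])  # reuse the fitted vocabulary
    return str(model.predict(features)[0])


print(predict_category("phone software update"))
```

Words that the vectorizer never saw during training are ignored at prediction time, so a headline made up entirely of unseen words falls back on the class priors.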
So this is how you can train a news classification model with machine learning using Python. News websites categorize each article before publishing it so that visitors can quickly find the type of news that interests them, and a model like this one can help automate that step. I hope you liked this article on news classification with machine learning using Python. Feel free to ask your questions in the comments section below.