In machine learning, sentiment analysis refers to the application of natural language processing, computational linguistics, and text analysis to identify and classify subjective opinions in source documents. In this article, I will introduce you to a machine learning project on sentiment analysis with the Python programming language.
What is Sentiment Analysis?
Sentiment analysis aims to determine a writer’s attitude towards a topic or the overall contextual polarity of a document. The attitude can be the writer’s judgment or assessment, their emotional state, or the emotion they intend to communicate.
In sentiment analysis, the main task is to identify opinion words. Opinion words, especially adjectives, adverbs, and verbs, are the dominant indicators of feeling, for example: “I love this camera. It’s amazing!”
Opinion words are also known as polarity words or sentiment words, and a collection of them is called an opinion lexicon. They can generally be divided into two types: positive words, e.g. wonderful, elegant, astonishing; and negative words, e.g. horrible, disgusting, poor.
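To make the two types of opinion words concrete, here is a minimal sketch of a lexicon-based polarity check. The word sets reuse the example words from the text; the `polarity` helper and its simple counting rule are assumptions made for illustration, and the project in the next section uses a trained model rather than a fixed word list.

```python
import re

# Tiny opinion lexicon built from the example words in the text (illustrative only).
POSITIVE_WORDS = {"love", "amazing", "wonderful", "elegant", "astonishing"}
NEGATIVE_WORDS = {"horrible", "disgusting", "poor"}


def polarity(text):
    """Label a text by counting positive and negative opinion words (hypothetical helper)."""
    words = re.findall(r"[a-z]+", text.lower())
    score = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("I love this camera. It's amazing!"))
print(polarity("The battery life is poor and the screen is horrible."))
```

A plain word count like this ignores negation ("not amazing") and context, which is one reason to learn the polarity from labelled reviews instead.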
Machine Learning Project on Sentiment Analysis with Python
In this section, I will take you through a machine learning project on sentiment analysis with Python. Let’s start by importing the necessary Python libraries and the dataset:
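The loading step could look roughly like the sketch below. It assumes the reviews are stored as a CSV file with a `text` and a `label` column; the `load_reviews` helper and the in-memory sample rows are hypothetical stand-ins, since the actual file name is not given here.

```python
import io

import pandas as pd


def load_reviews(source):
    """Read a reviews CSV and check that the two expected columns are present."""
    data = pd.read_csv(source)
    missing = {"text", "label"} - set(data.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return data


# In-memory stand-in for the real file (made-up rows, not taken from the dataset).
sample_csv = io.StringIO(
    "text,label\n"
    '"A dull, lifeless film.",0\n'
    '"I loved every minute of it.",1\n'
)
data = load_reviews(sample_csv)
print(data.shape)
print(data.head())
```

With the real dataset, the same function would be called with the path to the CSV file instead of the in-memory sample. The first rows of the actual dataset look like this: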
|   | text | label |
|---|------|-------|
| 0 | I grew up (b. 1965) watching and loving the Th… | 0 |
| 1 | When I put this movie in my DVD player, and sa… | 0 |
| 2 | Why do people who do not know what a particula… | 0 |
| 3 | Even though I have great interest in Biblical … | 0 |
| 4 | Im a die hard Dads Army fan and nothing will e… | 1 |
After reading the dataset, which contains 40k movie reviews from IMDB, we see that there are two columns: text, which contains the review, and label, which contains 0s and 1s, where 0 means negative and 1 means positive.
Now let’s visualize the distribution of the data:
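A minimal sketch of such a plot, assuming matplotlib is available. The `labels` list is a made-up stand-in for the dataset's label column, so the bar heights shown here say nothing about the real class balance.

```python
from collections import Counter

import matplotlib.pyplot as plt

# Hypothetical labels standing in for the dataset's label column.
labels = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]

# Count how many reviews fall into each class.
counts = Counter(labels)
names = ["negative (0)", "positive (1)"]
values = [counts[0], counts[1]]

# One bar per class.
fig, ax = plt.subplots()
ax.bar(names, values)
ax.set_ylabel("number of reviews")
ax.set_title("Label distribution")
```

Calling `plt.show()` in an interactive session, or `fig.savefig(...)` in a script, renders the chart. Roughly equal bars would indicate a balanced dataset.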
Next we import re, Python’s regular expression module, and use it to remove HTML tags such as ‘<a>’: whenever we come across a tag, we replace it with an empty string. We also handle emoticons, which can be a smiley :), a sad face :( or even an upset face :/. We move the emoticons to the end of the text to get a clean set of text:
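The cleaning step described above can be sketched as follows. The regular expressions and the `preprocessor` name are assumptions chosen for illustration, not necessarily the exact patterns used for this dataset.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")              # anything that looks like an HTML tag
EMOTICON_RE = re.compile(r"[:;=]-?[()DP/]")  # e.g. :) :( :/ ;-) =D
NON_WORD_RE = re.compile(r"\W+")             # runs of punctuation and whitespace


def preprocessor(text):
    """Strip HTML tags, lowercase the words, and append any emoticons at the end."""
    text = TAG_RE.sub("", text)
    emoticons = EMOTICON_RE.findall(text)  # collect before punctuation is removed
    words = NON_WORD_RE.sub(" ", text.lower()).strip()
    kept = " ".join(e.replace("-", "") for e in emoticons)  # drop the "nose": :-) becomes :)
    return f"{words} {kept}".strip()


print(preprocessor("<a href='x'>Great</a> movie :-)"))
print(preprocessor("It was bad :("))
```

One known limitation of this sketch: because :/ counts as an emoticon, the ":/" inside a plain-text URL such as "http://" would also be picked up.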
Now I’ll use NLTK’s PorterStemmer to reduce each word to its stem, so that inflected forms of the same word are treated as a single token:
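A minimal sketch of a stemming tokenizer, assuming NLTK is installed. The name `tokenizer_porter` matches the tokenizer passed to the vectorizer later in the article; splitting on whitespace is an assumption about how it tokenizes.

```python
from nltk.stem.porter import PorterStemmer

porter = PorterStemmer()


def tokenizer_porter(text):
    """Split the text on whitespace and reduce every token to its Porter stem."""
    return [porter.stem(word) for word in text.split()]


print(tokenizer_porter("running runs"))
print(tokenizer_porter("i loved these movies"))
```

Stems are not always dictionary words; the point is only that related forms collapse to the same token.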
Visualizing Negative and Positive Words
To visualize the negative and positive words using a word cloud, I will first remove the stopwords:
The positive words that are highlighted are: love, excellent, perfect, good, beautiful, kind. The negative words that are highlighted are: horrible, wasteful, problem, stupid, bad, poor.
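A word cloud is essentially a picture of word frequencies, so the idea can be sketched with a plain counter. The stopword set below is a small illustrative assumption standing in for a full stopword list, and the example reviews are made up.

```python
from collections import Counter

# Small illustrative stopword set; a real list would be much longer.
STOPWORDS = {"the", "a", "an", "and", "is", "was", "it", "this", "of", "to", "i", "in"}


def top_words(texts, n=5):
    """Return the n most frequent words across texts, ignoring stopwords."""
    counts = Counter(
        word
        for text in texts
        for word in text.lower().split()
        if word not in STOPWORDS
    )
    return counts.most_common(n)


# Hypothetical cleaned reviews, grouped by label.
positive = ["i love this movie", "a beautiful and excellent film", "love the cast"]
negative = ["this was a horrible movie", "stupid plot and bad acting", "horrible and bad"]

print(top_words(positive, 3))
print(top_words(negative, 3))
```

Frequencies like these can then be handed to a plotting library; the wordcloud package, for example, can draw a cloud from a dictionary of word counts via `WordCloud.generate_from_frequencies`.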
Now I will use the TF-IDF vectorizer to convert the raw documents into a feature matrix, which the machine learning model needs as its input:

    from sklearn.feature_extraction.text import TfidfVectorizer

    # tokenizer_porter is the PorterStemmer-based tokenizer from the stemming step
    tfidf = TfidfVectorizer(
        strip_accents=None,
        lowercase=False,
        preprocessor=None,
        tokenizer=tokenizer_porter,
        use_idf=True,
        norm='l2',
        smooth_idf=True,
    )
    y = data.label.values               # target: 0 = negative, 1 = positive
    x = tfidf.fit_transform(data.text)  # sparse TF-IDF feature matrix
Training Machine Learning Model for Sentiment Analysis
Now to train a machine learning model I will split the data into 50 percent training and 50 percent test sets:
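
    from sklearn.model_selection import train_test_split

    # shuffle=False keeps the row order: the first half is used for training, the second half for testing
    X_train, X_test, y_train, y_test = train_test_split(
        x, y, random_state=1, test_size=0.5, shuffle=False
    )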
Now let’s train a machine learning model for the task of sentiment analysis by using the Logistic Regression model:
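The training step could look roughly like the sketch below, assuming scikit-learn's `LogisticRegression` with mostly default settings. The toy reviews, the stratified split, and `max_iter=1000` are assumptions chosen so the example runs on its own; they are not the configuration or data used for the 40k-review dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Toy corpus standing in for the movie reviews (made-up examples).
texts = [
    "I love this movie, it is wonderful",
    "An excellent and beautiful film",
    "Perfect cast and a good story",
    "Amazing, I loved every minute",
    "A horrible and stupid movie",
    "Bad acting and a poor script",
    "What a waste of time, awful",
    "Disgusting, boring and bad",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Vectorize, then split 50/50; stratify keeps both classes in the tiny training set.
X = TfidfVectorizer().fit_transform(texts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.5, random_state=1, stratify=labels
)

# Fit the classifier on the training half and score it on the held-out half.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy: {accuracy:.2f}")
```

On eight sentences the accuracy figure means very little; the same calls applied to the X_train and y_train from the split above give a meaningful estimate.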
I hope you liked this article on Sentiment Analysis with Python programming language. Feel free to ask your valuable questions in the comments section below.