Predicting the stock market is one of the most important applications of Machine Learning in finance. In this article, I will take you through a simple Data Science project on Stock Price Prediction using Machine Learning with Python.
By the end of this article, you will know how to predict stock prices with a Linear Regression model implemented in Python.
Stock Price Prediction
Predicting the stock market has been the bane and goal of investors since its inception. Every day billions of dollars are traded on the stock exchange, and behind every dollar is an investor hoping to make a profit in one way or another.
Entire companies rise and fall daily depending on market behaviour. For an investor, the ability to accurately predict market movements offers a tantalizing promise of wealth and influence.
Today, many people make money trading in the stock market from home. If you have experience with the stock market, combining it with your machine learning skills is an advantage for the task of stock price prediction.
Let’s see how to predict stock prices using Machine Learning and the Python programming language. I will start by importing the Python libraries that we need for this task:
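The import listing itself is not reproduced here. As a minimal sketch, these are the imports the rest of the article appears to rely on: `pd` and `LinearRegression` are used explicitly later on, while `numpy` and `train_test_split` are assumptions about what the data preparation step needs.

```python
import numpy as np                                     # assumption: array handling during preparation
import pandas as pd                                    # used later as pd.read_csv(...)
from sklearn.model_selection import train_test_split   # assumption: used to hold out the test set
from sklearn.linear_model import LinearRegression      # used later as LinearRegression()
```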
Data Preparation
In the section above, I started the task of stock price prediction by importing the Python libraries. Now I will write a function that prepares the dataset so that we can easily fit the Linear Regression model to it:
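The function listing is not reproduced here, so the following is only a sketch that matches how `prepare_data` is called later in the article (four arguments in, five arrays out). The body is an assumption: it uses the chosen column as the single feature, uses the same column shifted `forecast_out` rows into the future as the label, and holds out a test set with `train_test_split`. The `demo` frame contains made-up values.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import scale


def prepare_data(df, forecast_col, forecast_out, test_size):
    # Label: the value of forecast_col `forecast_out` rows in the future.
    # The last `forecast_out` labels are NaN because that future is unknown.
    label = df[forecast_col].shift(-forecast_out)

    # Feature: the standardized current value of forecast_col.
    X = scale(df[[forecast_col]].to_numpy(dtype=float))

    # Rows without a label are kept aside for the final forecast.
    X_lately = X[-forecast_out:]
    X = X[:-forecast_out]

    # Drop the NaN labels so that features and labels line up.
    y = label.dropna().to_numpy()

    X_train, X_test, Y_train, Y_test = train_test_split(
        X, y, test_size=test_size, random_state=0
    )
    return X_train, X_test, Y_train, Y_test, X_lately


# Made-up price series, only to show the shapes that come back.
demo = pd.DataFrame({"close": np.linspace(100.0, 149.0, 50)})
X_train, X_test, Y_train, Y_test, X_lately = prepare_data(demo, "close", 5, 0.2)
print(X_train.shape, X_test.shape, X_lately.shape)
```

Note that this sketch assumes the price column has no missing values. If it does, `dropna` removes more labels than the positional slice removes features, and the two arrays end up with different lengths.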
You can easily understand the above function, as I have explained what every line does step by step. The next step is to read the data:
df = pd.read_csv("prices.csv")
df = df[df.symbol == "GOOG"]
Next we need to declare the three input variables that the function from the previous section expects. The first names the column we want to predict, the second says how many rows ahead we want to predict, and the third sets the size of the test set as a fraction of the data. Now let’s declare all the variables:
forecast_col = 'close'
forecast_out = 5
test_size = 0.2
Applying Machine Learning for Stock Price Prediction
Now I will split the data and fit the linear regression model to it:
# call the function that handles data preparation and the train/test split
X_train, X_test, Y_train, Y_test, X_lately = prepare_data(df, forecast_col, forecast_out, test_size)
learner = LinearRegression()  # initialize the linear regression model
learner.fit(X_train, Y_train)  # train the linear regression model
Now let’s predict the output and have a look at the predicted stock prices:
{'test_score': 0.9481024935723803, 'forecast_set': array([786.54352516, 788.13020371, 781.84159626, 779.65508615, 769.04187979])}
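The code that produced the output above is not shown. As a hedged sketch, a dictionary with the same two keys could be built as follows: `test_score` from `LinearRegression.score` (the R² on the held-out test set) and `forecast_set` from `predict` on the rows kept aside for forecasting. The arrays here are synthetic stand-ins for what `prepare_data` would return, so the numbers will not match the article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for the arrays returned by prepare_data (assumption).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(80, 1))
Y_train = 3.0 * X_train[:, 0] + 100.0
X_test = rng.normal(size=(20, 1))
Y_test = 3.0 * X_test[:, 0] + 100.0
X_lately = rng.normal(size=(5, 1))

learner = LinearRegression()
learner.fit(X_train, Y_train)

score = learner.score(X_test, Y_test)   # R^2 on the held-out test set
forecast = learner.predict(X_lately)    # one predicted price per row of X_lately

response = {"test_score": score, "forecast_set": forecast}
print(response)
```

With `forecast_out = 5`, `forecast_set` holds five values: one predicted price for each of the last five rows of the dataset, each looking five rows ahead.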
So this is how we can predict the stock prices with Machine Learning. I hope you liked this article on Stock Price prediction using Python with machine learning by implementing the Linear Regression Model. Feel free to ask your valuable questions in the comments section below.
Can you tell me where the prices.csv file is?
You can download the latest data from yahoo finance
Could you please share the sample Prices.CSV file or please share the navigation steps to download from yahoo finance.
https://thecleverprogrammer.com/2021/01/05/bitcoin-price-prediction-with-python/
here you will find all the steps to download data.
what should we search on yahoo finance to get prices.csv dataset?
Go to Yahoo finance and search for the company, then click on the historical data and then click on download
I am unable to find prices.csv file at company search. Could you please share a dataset link. I would be grateful. Thanks in advance
prices.csv is just the name of the file: https://query1.finance.yahoo.com/v7/finance/download/INR=X?period1=1580035828&period2=1611658228&interval=1d&events=history&includeAdjustedClose=true
Hi Aman, I have got the output but with a very different test score.
{'test_score': 0.639145178346672, 'forecast_set': array([73.37040254, 73.12634778, 73.16456803, 73.20017668, 73.1776807 ])}
Also, can you tell me why we need the line below? It is giving me an error:
df = df[df.symbol == "GOOG"]
maybe you are using a new dataset
can you please tell me what is your input and output data column
Close column is the input variable, which indicates close prices
# calling the method where the cross validation and data preparation is
X_train, X_test, Y_train, Y_test , X_lately =prepare_data(df,forecast_col,forecast_out,test_size)
learner = LinearRegression() #initializing linear regression model
learner.fit(X_train,Y_train) #training the linear regression model
ValueError: Found input variables with inconsistent numbers of samples: [246, 244] (I am getting this error when I run the above code... could you please solve it for me?)
Check the dataset you are working with
hey thanks, it worked there were some null values worked after deleting it
@vbasheer how did you resolve the error? Could you please let me know?
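One plausible explanation for the inconsistent sample counts, assuming the preparation function drops missing labels with `dropna` while slicing the features by position: every extra null in the price column shortens the labels but not the features. Removing the null rows before preparing the data, as described above, avoids that. A small sketch with made-up values and a hypothetical `Close` column:

```python
import numpy as np
import pandas as pd

# Made-up prices with two missing entries, as can appear in downloaded CSVs.
df = pd.DataFrame({"Close": [73.1, np.nan, 73.4, 73.2, np.nan, 73.3]})

# Count the missing values in the column we want to predict.
missing = int(df["Close"].isna().sum())
print("missing values:", missing)

# Drop the rows where the price is missing and renumber the index.
df = df.dropna(subset=["Close"]).reset_index(drop=True)
print("rows left:", len(df))
```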
these are my results
{'test_score': 0.9132328868016113, 'forecast_set': array([14733.28834587, 14678.01387179, 14455.85132032, 14320.59050617, 14044.5766888 ])}
i think test score is okay but i dont understand forecast set
Great
Hi am a beginner, want to know which tool to use? Spyder or Jupyter or Pycharm?
For any task where most of your work is analysis and visualization, you can use Jupyter Notebook or Google Colab. For other tasks, like GUI development and logical problem solving, you can use VS Code or any other IDE.
Hey, please tell me why we need the line below, as it is giving me an error:
df = df[df.symbol == "GOOG"]
I think you have not downloaded the csv file
What is the meaning of this line-
df = df[df.symbol == "GOOG"]
GOOG is the financial symbol of stock prices of Google
THE LINK THAT YOU SHARED https://query1.finance.yahoo.com/v7/finance/download/INR=X?period1=1580035828&period2=1611658228&interval=1d&events=history&includeAdjustedClose=true DOES NOT CONTAIN ATTRIBUTE ‘SYMBOL’
AttributeError: 'DataFrame' object has no attribute 'symbol'
and without df = df[df.symbol == "GOOG"] it gives the following result:
{'test_score': 0.6391451783466715, 'forecast_set': array([73.37040254, 73.12634778, 73.16456803, 73.20017668, 73.1776807 ])}
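The `AttributeError` reported above appears whenever the loaded CSV has no `symbol` column, which is the case for a file that holds a single ticker. As a hedged sketch, the filter can be applied only when the column exists. `select_symbol` is a hypothetical helper, not part of the article, and both frames contain made-up values.

```python
import pandas as pd


def select_symbol(df, symbol, column="symbol"):
    """Return only the rows for `symbol` when a ticker column exists.

    Hypothetical helper: if `column` is absent, the frame is assumed to
    hold a single ticker already and is returned unchanged.
    """
    if column in df.columns:
        return df[df[column] == symbol]
    return df


# Multi-ticker layout (made-up values) with a `symbol` column.
multi = pd.DataFrame(
    {"symbol": ["GOOG", "AAPL", "GOOG"], "close": [1.0, 2.0, 3.0]}
)
# Single-ticker layout (made-up values) without a `symbol` column.
single = pd.DataFrame({"Close": [73.1, 73.4, 73.2]})

goog = select_symbol(multi, "GOOG")
same = select_symbol(single, "GOOG")
print(len(goog), len(same))
```

Historical data downloaded from Yahoo Finance names its price column `Close` with a capital C, so with such a file `forecast_col` would need to be 'Close' rather than 'close'.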
please follow this new article on Stock price prediction: https://thecleverprogrammer.com/2022/01/03/stock-price-prediction-with-lstm/