Stock Price Prediction with Machine Learning

In this data science project, we will create a Linear Regression model and a Decision Tree Regression model to predict Apple's stock price using machine learning and Python.

Import pandas and read in the CSV file:

import pandas as pd
apple = pd.read_csv("AAPL.csv")
print(apple.head())
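
The AAPL.csv file used here contains about one year of Apple's daily stock prices with a "Close Price" column. If you don't have the file, one way to build a similar one is sketched below; the yfinance package is an assumption on my part, not part of the original post, and column names can differ between versions:

# Optional sketch: build a comparable AAPL.csv with yfinance (assumed installed; not part of the original post)
import yfinance as yf
data = yf.Ticker("AAPL").history(period="1y")          # ~1 year of daily prices
data = data.rename(columns={"Close": "Close Price"})   # match the column name used below
data.to_csv("AAPL.csv")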

To get the shape of the data (the number of rows is the number of trading days):

print("training days =", apple.shape)

#Output- training days = (251, 7)

To visualize the close price data:

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.figure(figsize=(10, 4))
plt.title("Apple's Stock Price")
plt.xlabel("Days")
plt.ylabel("Close Price USD ($)")
plt.plot(apple["Close Price"])
plt.show()

To keep only the close price column:

apple = apple[["Close Price"]]
print(apple.head())

Create a variable for how many days ('x') to predict into the future:

futureDays = 25

Create a new target column shifted ‘X’ units/days up:

apple["Prediction"] = apple[["Close Price"]].shift(-futureDays)
print(apple.head())
print(apple.tail())
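
To make the shift clearer, here is a minimal sketch on a made-up five-row series (illustrative numbers only, not AAPL data): shifting by -2 moves every close price two rows up, so each row's "Prediction" is the close price two days later, and the last two rows are left as NaN, which is why the last 'futureDays' rows are dropped in the next step.

# Tiny illustration of shift(-n) with made-up numbers
import pandas as pd
demo = pd.DataFrame({"Close Price": [10.0, 11.0, 12.0, 13.0, 14.0]})
demo["Prediction"] = demo["Close Price"].shift(-2)
print(demo)
# Row 0's Prediction is 12.0 (the close two rows later);
# the last two rows get NaN because no future value exists for them.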

To create the feature dataset (x), convert it into a NumPy array, and remove the last 'x' rows/days (their Prediction values are NaN after the shift):

import numpy as np
x = np.array(apple.drop(["Prediction"], axis=1))[:-futureDays]
print(x)

#Output-

[[185.720001]
[188.660004]
[190.919998]
[190.080002]
[189. ]
[183.089996]
[186.600006]
[182.779999]
[179.660004]
[178.970001]
[178.229996]
[177.380005]
[178.300003]
[175.070007]
[173.300003]
[179.639999]
[182.539993]
[185.220001]
[190.149994]
[192.580002]
[194.809998]
[194.190002]
[194.149994]
[192.740005]
[193.889999]
[198.449997]
[197.869995]
[199.460007]
[198.779999]
[198.580002]
[195.570007]
[199.800003]
[199.740005]
[197.919998]
[201.550003]
[202.729996]
[204.410004]
[204.229996]
[200.020004]
[201.240005]
[203.229996]
[201.75 ]
[203.300003]
[205.210007]
[204.5 ]
[203.350006]
[205.660004]
[202.589996]
[207.220001]
[208.839996]
[208.669998]
[207.020004]
[207.740005]
[209.679993]
[208.779999]
[213.039993]
[208.429993]
[204.020004]
[193.339996]
[197. ]
[199.039993]
[203.429993]
[200.990005]
[200.479996]
[208.970001]
[202.75 ]
[201.740005]
[206.5 ]
[210.350006]
[210.360001]
[212.639999]
[212.460007]
[202.639999]
[206.490005]
[204.160004]
[205.529999]
[209.009995]
[208.740005]
[205.699997]
[209.190002]
[213.279999]
[213.259995]
[214.169998]
[216.699997]
[223.589996]
[223.089996]
[218.75 ]
[219.899994]
[220.699997]
[222.770004]
[220.960007]
[217.729996]
[218.720001]
[217.679993]
[221.029999]
[219.889999]
[218.820007]
[223.970001]
[224.589996]
[218.960007]
[220.820007]
[227.009995]
[227.059998]
[224.399994]
[227.029999]
[230.089996]
[236.210007]
[235.869995]
[235.320007]
[234.369995]
[235.279999]
[236.410004]
[240.509995]
[239.960007]
[243.179993]
[243.580002]
[246.580002]
[249.050003]
[243.289993]
[243.259995]
[248.759995]
[255.820007]
[257.5 ]
[257.130005]
[257.23999 ]
[259.429993]
[260.140015]
[262.200012]
[261.959991]
[264.470001]
[262.640015]
[265.76001 ]
[267.100006]
[266.290009]
[263.190002]
[262.01001 ]
[261.779999]
[266.369995]
[264.290009]
[267.839996]
[267.25 ]
[264.160004]
[259.450012]
[261.73999 ]
[265.579987]
[270.709991]
[266.920013]
[268.480011]
[270.769989]
[271.459991]
[275.149994]
[279.859985]
[280.410004]
[279.73999 ]
[280.019989]
[279.440002]
[284. ]
[284.269989]
[289.910004]
[289.799988]
[291.519989]
[293.649994]
[300.350006]
[297.429993]
[299.799988]
[298.390015]
[303.190002]
[309.630005]
[310.329987]
[316.959991]
[312.679993]
[311.339996]
[315.23999 ]
[318.730011]
[316.570007]
[317.700012]
[319.230011]
[318.309998]
[308.950012]
[317.690002]
[324.339996]
[323.869995]
[309.51001 ]
[308.660004]
[318.850006]
[321.450012]
[325.209991]
[320.029999]
[321.549988]
[319.609985]
[327.200012]
[324.869995]
[324.950012]
[319. ]
[323.619995]
[320.299988]
[313.049988]
[298.179993]
[288.079987]
[292.649994]
[273.519989]
[273.359985]
[298.809998]
[289.320007]
[302.73999 ]
[292.920013]
[289.029999]
[266.170013]
[285.339996]
[275.429993]
[248.229996]
[277.970001]
[242.210007]
[252.860001]
[246.669998]
[244.779999]
[229.240005]
[224.369995]
[246.880005]
[245.520004]
[258.440002]
[247.740005]
[254.809998]
[254.289993]
[240.910004]
[244.929993]]

To create the target dataset (y), convert it to a NumPy array, and get all of the target values except the last 'x' rows/days:

y = np.array(apple["Prediction"])[:-futureDays]
print(y)

#Output-

[198.449997 197.869995 199.460007 198.779999 198.580002 195.570007
199.800003 199.740005 197.919998 201.550003 202.729996 204.410004
204.229996 200.020004 201.240005 203.229996 201.75 203.300003
205.210007 204.5 203.350006 205.660004 202.589996 207.220001
208.839996 208.669998 207.020004 207.740005 209.679993 208.779999
213.039993 208.429993 204.020004 193.339996 197. 199.039993
203.429993 200.990005 200.479996 208.970001 202.75 201.740005
206.5 210.350006 210.360001 212.639999 212.460007 202.639999
206.490005 204.160004 205.529999 209.009995 208.740005 205.699997
209.190002 213.279999 213.259995 214.169998 216.699997 223.589996
223.089996 218.75 219.899994 220.699997 222.770004 220.960007
217.729996 218.720001 217.679993 221.029999 219.889999 218.820007
223.970001 224.589996 218.960007 220.820007 227.009995 227.059998
224.399994 227.029999 230.089996 236.210007 235.869995 235.320007
234.369995 235.279999 236.410004 240.509995 239.960007 243.179993
243.580002 246.580002 249.050003 243.289993 243.259995 248.759995
255.820007 257.5 257.130005 257.23999 259.429993 260.140015
262.200012 261.959991 264.470001 262.640015 265.76001 267.100006
266.290009 263.190002 262.01001 261.779999 266.369995 264.290009
267.839996 267.25 264.160004 259.450012 261.73999 265.579987
270.709991 266.920013 268.480011 270.769989 271.459991 275.149994
279.859985 280.410004 279.73999 280.019989 279.440002 284.
284.269989 289.910004 289.799988 291.519989 293.649994 300.350006
297.429993 299.799988 298.390015 303.190002 309.630005 310.329987
316.959991 312.679993 311.339996 315.23999 318.730011 316.570007
317.700012 319.230011 318.309998 308.950012 317.690002 324.339996
323.869995 309.51001 308.660004 318.850006 321.450012 325.209991
320.029999 321.549988 319.609985 327.200012 324.869995 324.950012
319.       323.619995 320.299988 313.049988 298.179993 288.079987
292.649994 273.519989 273.359985 298.809998 289.320007 302.73999
292.920013 289.029999 266.170013 285.339996 275.429993 248.229996
277.970001 242.210007 252.860001 246.669998 244.779999 229.240005
224.369995 246.880005 245.520004 258.440002 247.740005 254.809998
254.289993 240.910004 244.929993 241.410004 262.470001 259.429993
266.070007 267.98999 273.25 287.049988 284.429993 286.690002
282.799988 276.929993 268.369995 276.100006 275.029999 282.970001
283.170013 278.579987 287.730011 293.799988 289.070007 293.160004
297.559998 300.630005 303.73999 310.130005]

Split the data into 75% training and 25% testing:

from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25)
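
Note that train_test_split shuffles the rows by default, so the split is random (and changes on every run) rather than chronological. If you want reproducible results, or a split that keeps the time order of the stock data intact, you can pass random_state or shuffle=False; these are optional variations, not what this tutorial uses:

# Optional variations (not used in this tutorial):
# a reproducible random split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25, random_state=42)
# or a chronological split that preserves the time order
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25, shuffle=False)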

Creating Models

# Creating the decision tree regressor model
from sklearn.tree import DecisionTreeRegressor
tree = DecisionTreeRegressor().fit(xtrain, ytrain)

# creating the Linear Regression model
from sklearn.linear_model import LinearRegression
linear = LinearRegression().fit(xtrain, ytrain)
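
Before predicting, it is worth a quick check of how each model performs on the held-out test set. This step isn't in the original tutorial; the sketch below uses scikit-learn's built-in R² score plus a root-mean-squared error in dollars:

# Quick evaluation on the 25% test split (an added check, not part of the original tutorial)
import numpy as np
from sklearn.metrics import mean_squared_error
for name, model in [("Decision Tree", tree), ("Linear Regression", linear)]:
    r2 = model.score(xtest, ytest)                                   # R^2 on unseen data
    rmse = np.sqrt(mean_squared_error(ytest, model.predict(xtest)))  # typical error in USD
    print(name, ": R^2 =", round(r2, 3), ", RMSE =", round(rmse, 2))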

To get the last ‘x’ rows/days of the feature dataset:

xfuture = apple.drop(["Prediction"], axis=1)[:-futureDays]
xfuture = xfuture.tail(futureDays)
xfuture = np.array(xfuture)
print(xfuture)

#Output-

[[273.359985]
[298.809998]
[289.320007]
[302.73999 ]
[292.920013]
[289.029999]
[266.170013]
[285.339996]
[275.429993]
[248.229996]
[277.970001]
[242.210007]
[252.860001]
[246.669998]
[244.779999]
[229.240005]
[224.369995]
[246.880005]
[245.520004]
[258.440002]
[247.740005]
[254.809998]
[254.289993]
[240.910004]
[244.929993]]

To see the decision tree model's predictions:

treePrediction = tree.predict(xfuture)
print("Decision Tree prediction =",treePrediction)

To see the linear regression model's predictions:

linearPrediction = linear.predict(xfuture)
print("Linear regression Prediction =",linearPrediction)

Visualize the decision tree predictions:

predictions = treePrediction
valid = apple[x.shape[0]:].copy()  # the last 'futureDays' rows of the original data
valid["Predictions"] = predictions
plt.figure(figsize=(10, 6))
plt.title("Apple's Stock Price Prediction (Decision Tree Regressor)")
plt.xlabel("Days")
plt.ylabel("Close Price USD ($)")
plt.plot(apple["Close Price"])
plt.plot(valid[["Close Price", "Predictions"]])
plt.legend(["Original", "Valid", "Predictions"])
plt.show()

Visualize the linear regression predictions:

predictions = linearPrediction
valid = apple[x.shape[0]:].copy()  # the last 'futureDays' rows of the original data
valid["Predictions"] = predictions
plt.figure(figsize=(10, 6))
plt.title("Apple's Stock Price Prediction (Linear Regression)")
plt.xlabel("Days")
plt.ylabel("Close Price USD ($)")
plt.plot(apple["Close Price"])
plt.plot(valid[["Close Price", "Predictions"]])
plt.legend(["Original", "Valid", "Predictions"])
plt.show()
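
As a rough numeric follow-up (an addition, not part of the original tutorial), the 25 predicted values from each model can also be compared against the actual close prices stored in valid, which is what the two plots above show visually:

# Compare the 25 future-window predictions against the actual closes in `valid`
import numpy as np
from sklearn.metrics import mean_squared_error
print("Decision Tree RMSE =", np.sqrt(mean_squared_error(valid["Close Price"], treePrediction)))
print("Linear Regression RMSE =", np.sqrt(mean_squared_error(valid["Close Price"], linearPrediction)))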