Google Stock Price Prediction with Machine Learning

In this article, I will create a Linear Regression model and a Decision Tree Regression model to predict Google's stock price using machine learning and Python.

Download the data set

Import pandas and load the CSV file:

import pandas as pd
google = pd.read_csv("GOOG.csv")
google.head()

To get the number of training days (the first value of the shape, one row per trading day):

print("training days =", google.shape)
training days = (252, 5)

To visualize the close price data:

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.figure(figsize=(10, 4))
plt.title("Google's Stock Price")
plt.xlabel("Days")
plt.ylabel("Close Price USD ($)")
plt.plot(google["Close Price"])
plt.show()

To keep only the close price column:

# Copy the selection so later column assignments act on an independent frame
google = google[["Close Price"]].copy()
print(google.head())
   Close Price
0  1085.349976
1  1092.500000
2  1103.599976
3  1102.329956
4  1111.420044

Create a variable for the number of days (‘X’) to predict into the future:

futureDays = 25

Create a new target column shifted ‘X’ rows/days up:

# Each row's target is the close price futureDays rows later
google["Prediction"] = google["Close Price"].shift(-futureDays)
print(google.head())
print(google.tail())
   Close Price   Prediction
0  1085.349976  1138.069946
1  1092.500000  1146.209961
2  1103.599976  1137.810059
3  1102.329956  1132.119995
4  1111.420044  1250.410034
     Close Price  Prediction
247  1446.609985         NaN
248  1456.160034         NaN
249  1465.849976         NaN
250  1403.839966         NaN
251  1413.180054         NaN
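
To see what the negative shift does, here is a minimal sketch on a made-up six-row frame. The toy prices and the names `toy` and `future_days` are illustrative assumptions, not values from GOOG.csv. Each row's Prediction is the close price `future_days` rows later, and the last `future_days` rows have no future value, so they become NaN.

```python
import pandas as pd

# Hypothetical toy data, not taken from GOOG.csv
toy = pd.DataFrame({"Close Price": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]})
future_days = 2  # illustrative horizon

# shift(-n) moves values n rows up, so row i holds the price from row i + n
toy["Prediction"] = toy["Close Price"].shift(-future_days)
print(toy)

# Number of rows without a known future price
missing = int(toy["Prediction"].isna().sum())
print("rows without a target =", missing)
```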

To create the feature data set (x), convert it to a NumPy array, and remove the last ‘X’ rows/days:

import numpy as np
# Use the columns keyword; the positional axis argument was removed in pandas 2.0
x = np.array(google.drop(columns=["Prediction"]))[:-futureDays]
print(x)
[[1085.349976]
 [1092.5     ]
 [1103.599976]
 [1102.329956]
 [1111.420044]
 [1121.880005]
 [1115.52002 ]
 [1086.349976]
 [1079.800049]
 [1076.01001 ]
 [1080.910034]
 [1097.949951]
 [1111.25    ]
 [1121.579956]
 [1131.589966]
 [1116.349976]
 [1124.829956]
 [1140.47998 ]
 [1144.209961]
 [1144.900024]
 [1150.339966]
 [1153.579956]
 [1146.349976]
 [1146.329956]
 [1130.099976]
 [1138.069946]
 [1146.209961]
 [1137.810059]
 [1132.119995]
 [1250.410034]
 [1239.410034]
 [1225.140015]
 [1216.680054]
 [1209.01001 ]
 [1193.98999 ]
 [1152.319946]
 [1169.949951]
 [1173.98999 ]
 [1204.800049]
 [1188.01001 ]
 [1174.709961]
 [1197.27002 ]
 [1164.290039]
 [1167.26001 ]
 [1177.599976]
 [1198.449951]
 [1182.689941]
 [1191.25    ]
 [1189.530029]
 [1151.290039]
 [1168.890015]
 [1167.839966]
 [1171.02002 ]
 [1192.849976]
 [1188.099976]
 [1168.390015]
 [1181.410034]
 [1211.380005]
 [1204.930054]
 [1204.410034]
 [1206.      ]
 [1220.170044]
 [1234.25    ]
 [1239.560059]
 [1231.300049]
 [1229.150024]
 [1232.410034]
 [1238.709961]
 [1229.930054]
 [1234.030029]
 [1218.76001 ]
 [1246.52002 ]
 [1241.390015]
 [1225.089966]
 [1219.      ]
 [1205.099976]
 [1176.630005]
 [1187.829956]
 [1209.      ]
 [1207.680054]
 [1189.130005]
 [1202.310059]
 [1208.670044]
 [1215.449951]
 [1217.140015]
 [1243.01001 ]
 [1243.640015]
 [1253.069946]
 [1245.48999 ]
 [1246.150024]
 [1242.800049]
 [1259.130005]
 [1260.98999 ]
 [1265.130005]
 [1290.      ]
 [1262.619995]
 [1261.290039]
 [1260.109985]
 [1273.73999 ]
 [1291.369995]
 [1292.030029]
 [1291.800049]
 [1308.859985]
 [1311.369995]
 [1299.189941]
 [1298.800049]
 [1298.      ]
 [1311.459961]
 [1334.869995]
 [1320.699951]
 [1315.459961]
 [1303.050049]
 [1301.349976]
 [1295.339966]
 [1306.689941]
 [1313.550049]
 [1312.98999 ]
 [1304.959961]
 [1289.920044]
 [1295.280029]
 [1320.540039]
 [1328.130005]
 [1340.619995]
 [1343.560059]
 [1344.660034]
 [1345.02002 ]
 [1350.27002 ]
 [1347.829956]
 [1361.170044]
 [1355.119995]
 [1352.619995]
 [1356.040039]
 [1349.589966]
 [1348.839966]
 [1343.560059]
 [1360.400024]
 [1351.890015]
 [1336.140015]
 [1337.02002 ]
 [1367.369995]
 [1360.660034]
 [1394.209961]
 [1393.339966]
 [1404.319946]
 [1419.829956]
 [1429.72998 ]
 [1439.22998 ]
 [1430.880005]
 [1439.199951]
 [1451.699951]
 [1480.390015]
 [1484.400024]
 [1485.949951]
 [1486.650024]
 [1466.709961]
 [1433.900024]
 [1452.560059]
 [1458.630005]
 [1455.839966]
 [1434.22998 ]
 [1485.939941]
 [1447.069946]
 [1448.22998 ]
 [1476.22998 ]
 [1479.22998 ]
 [1508.680054]
 [1508.790039]
 [1518.27002 ]
 [1514.660034]
 [1520.73999 ]
 [1519.670044]
 [1526.689941]
 [1518.150024]
 [1485.109985]
 [1421.589966]
 [1388.449951]
 [1393.180054]
 [1318.089966]
 [1339.329956]
 [1389.109985]
 [1341.390015]
 [1386.52002 ]
 [1319.040039]
 [1298.410034]
 [1215.560059]
 [1280.390015]
 [1215.410034]
 [1114.910034]
 [1219.72998 ]
 [1084.329956]
 [1119.800049]
 [1096.800049]
 [1115.290039]
 [1072.319946]
 [1056.619995]
 [1134.459961]
 [1102.48999 ]
 [1161.75    ]
 [1110.709961]
 [1146.819946]
 [1162.810059]
 [1105.619995]
 [1120.839966]
 [1097.880005]
 [1186.920044]
 [1186.51001 ]
 [1210.280029]
 [1211.449951]
 [1217.560059]
 [1269.22998 ]
 [1262.469971]
 [1263.469971]
 [1283.25    ]
 [1266.609985]
 [1216.339966]
 [1263.209961]
 [1276.310059]
 [1279.310059]
 [1275.880005]
 [1233.670044]
 [1341.47998 ]
 [1348.660034]
 [1320.609985]
 [1326.800049]
 [1351.109985]
 [1347.300049]
 [1372.560059]]

To create the target data set (y), convert it to a NumPy array, and keep all of the target values except the last ‘X’ rows/days:

y = np.array(google["Prediction"])[:-futureDays]
print(y)
[1138.069946 1146.209961 1137.810059 1132.119995 1250.410034 1239.410034
 1225.140015 1216.680054 1209.01001  1193.98999  1152.319946 1169.949951
 1173.98999  1204.800049 1188.01001  1174.709961 1197.27002  1164.290039
 1167.26001  1177.599976 1198.449951 1182.689941 1191.25     1189.530029
 1151.290039 1168.890015 1167.839966 1171.02002  1192.849976 1188.099976
 1168.390015 1181.410034 1211.380005 1204.930054 1204.410034 1206.
 1220.170044 1234.25     1239.560059 1231.300049 1229.150024 1232.410034
 1238.709961 1229.930054 1234.030029 1218.76001  1246.52002  1241.390015
 1225.089966 1219.       1205.099976 1176.630005 1187.829956 1209.
 1207.680054 1189.130005 1202.310059 1208.670044 1215.449951 1217.140015
 1243.01001  1243.640015 1253.069946 1245.48999  1246.150024 1242.800049
 1259.130005 1260.98999  1265.130005 1290.       1262.619995 1261.290039
 1260.109985 1273.73999  1291.369995 1292.030029 1291.800049 1308.859985
 1311.369995 1299.189941 1298.800049 1298.       1311.459961 1334.869995
 1320.699951 1315.459961 1303.050049 1301.349976 1295.339966 1306.689941
 1313.550049 1312.98999  1304.959961 1289.920044 1295.280029 1320.540039
 1328.130005 1340.619995 1343.560059 1344.660034 1345.02002  1350.27002
 1347.829956 1361.170044 1355.119995 1352.619995 1356.040039 1349.589966
 1348.839966 1343.560059 1360.400024 1351.890015 1336.140015 1337.02002
 1367.369995 1360.660034 1394.209961 1393.339966 1404.319946 1419.829956
 1429.72998  1439.22998  1430.880005 1439.199951 1451.699951 1480.390015
 1484.400024 1485.949951 1486.650024 1466.709961 1433.900024 1452.560059
 1458.630005 1455.839966 1434.22998  1485.939941 1447.069946 1448.22998
 1476.22998  1479.22998  1508.680054 1508.790039 1518.27002  1514.660034
 1520.73999  1519.670044 1526.689941 1518.150024 1485.109985 1421.589966
 1388.449951 1393.180054 1318.089966 1339.329956 1389.109985 1341.390015
 1386.52002  1319.040039 1298.410034 1215.560059 1280.390015 1215.410034
 1114.910034 1219.72998  1084.329956 1119.800049 1096.800049 1115.290039
 1072.319946 1056.619995 1134.459961 1102.48999  1161.75     1110.709961
 1146.819946 1162.810059 1105.619995 1120.839966 1097.880005 1186.920044
 1186.51001  1210.280029 1211.449951 1217.560059 1269.22998  1262.469971
 1263.469971 1283.25     1266.609985 1216.339966 1263.209961 1276.310059
 1279.310059 1275.880005 1233.670044 1341.47998  1348.660034 1320.609985
 1326.800049 1351.109985 1347.300049 1372.560059 1388.369995 1403.26001
 1375.73999  1349.329956 1356.130005 1373.189941 1383.939941 1373.484985
 1406.719971 1402.800049 1410.420044 1417.02002  1417.839966 1416.72998
 1428.920044 1431.819946 1439.219971 1436.380005 1412.180054 1438.390015
 1446.609985 1456.160034 1465.849976 1403.839966 1413.180054]
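
As a quick sanity check on the alignment, the sketch below rebuilds the feature and target arrays from a made-up price series (an assumption for illustration) and confirms that they have the same length and that each target is the close price `future_days` rows after its feature.

```python
import numpy as np
import pandas as pd

# Hypothetical prices standing in for the real close prices
close = np.arange(100.0, 110.0)  # 10 rows
frame = pd.DataFrame({"Close Price": close})
future_days = 3  # illustrative horizon

frame["Prediction"] = frame["Close Price"].shift(-future_days)

# Same slicing as in the article: drop the last future_days rows
features = np.array(frame.drop(columns=["Prediction"]))[:-future_days]
targets = np.array(frame["Prediction"])[:-future_days]

print(features.shape, targets.shape)
# Every target is the price future_days rows after its feature
aligned = bool(np.all(targets == close[future_days:]))
print("aligned =", aligned)
```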

To split the data into 75% training and 25% testing:

from sklearn.model_selection import train_test_split
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25)
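
By default `train_test_split` shuffles the rows before splitting, so each run gives a different split. The sketch below, on placeholder arrays, shows the resulting sizes and two standard options: `random_state` for a reproducible shuffle and `shuffle=False` for a chronological split, which may be preferable for time-ordered prices. Which option suits this data is a judgment call, not something the code above specifies.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as x and y above (227)
x_demo = np.arange(227, dtype=float).reshape(-1, 1)
y_demo = np.arange(227, dtype=float)

# Reproducible random split: the same rows are chosen on every run
xtr, xte, ytr, yte = train_test_split(
    x_demo, y_demo, test_size=0.25, random_state=0
)
print(len(xtr), len(xte))

# Chronological split: the last 25% of rows become the test set
xtr_c, xte_c, ytr_c, yte_c = train_test_split(
    x_demo, y_demo, test_size=0.25, shuffle=False
)
print(xte_c[0], xte_c[-1])
```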

Creating a decision tree regressor model

from sklearn.tree import DecisionTreeRegressor
tree = DecisionTreeRegressor().fit(xtrain, ytrain)

Creating the Linear Regression model

from sklearn.linear_model import LinearRegression
linear = LinearRegression().fit(xtrain, ytrain)
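
Neither model is scored above. One way to compare them on held-out data is each estimator's `score` method, which returns the R² coefficient of determination. This sketch uses synthetic data (an assumption, since GOOG.csv is not reproduced here); with the real data you would pass xtest and ytest instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic, roughly linear data standing in for the price features and targets
rng = np.random.default_rng(0)
x_syn = rng.uniform(1000, 1500, size=(200, 1))
y_syn = 0.8 * x_syn[:, 0] + 300 + rng.normal(0, 20, size=200)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_syn, y_syn, test_size=0.25, random_state=0
)

tree_demo = DecisionTreeRegressor(random_state=0).fit(x_tr, y_tr)
linear_demo = LinearRegression().fit(x_tr, y_tr)

# score() returns R^2 on the given data; 1.0 is a perfect fit
tree_r2 = tree_demo.score(x_te, y_te)
linear_r2 = linear_demo.score(x_te, y_te)
print("tree R^2 =", tree_r2)
print("linear R^2 =", linear_r2)
```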

Getting the last ‘X’ rows/days of the feature data set:

# Use the columns keyword; the positional axis argument was removed in pandas 2.0
xfuture = google.drop(columns=["Prediction"])[:-futureDays]
xfuture = xfuture.tail(futureDays)
xfuture = np.array(xfuture)
print(xfuture)
[[1120.839966]
 [1097.880005]
 [1186.920044]
 [1186.51001 ]
 [1210.280029]
 [1211.449951]
 [1217.560059]
 [1269.22998 ]
 [1262.469971]
 [1263.469971]
 [1283.25    ]
 [1266.609985]
 [1216.339966]
 [1263.209961]
 [1276.310059]
 [1279.310059]
 [1275.880005]
 [1233.670044]
 [1341.47998 ]
 [1348.660034]
 [1320.609985]
 [1326.800049]
 [1351.109985]
 [1347.300049]
 [1372.560059]]

To show the Decision Tree model prediction:

treePrediction = tree.predict(xfuture)
print("Decision Tree prediction =",treePrediction)
Decision Tree prediction = [1239.410034 1403.26001  1375.73999  1349.329956 1356.130005 1373.189941
 1383.939941 1373.484985 1320.540039 1402.800049 1410.420044 1417.02002
 1417.839966 1416.72998  1439.219971 1431.819946 1439.219971 1436.380005
 1412.180054 1438.390015 1446.609985 1456.160034 1465.849976 1403.839966
 1413.180054]
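
With default settings, a DecisionTreeRegressor keeps splitting until every leaf is pure, so it returns the stored target for any training row whose feature value is unique. Because xfuture is taken from the same rows as x, this may explain why several of the values above coincide with actual later close prices. The sketch below illustrates the behavior on made-up numbers (assumptions for illustration only).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Made-up features and targets; every feature value is unique
x_toy = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_toy = np.array([10.0, 7.0, 12.0, 3.0, 8.0])

# With default settings the tree keeps splitting until each leaf holds one sample
tree_toy = DecisionTreeRegressor(random_state=0).fit(x_toy, y_toy)

# Predicting on rows seen during training returns their stored targets
seen = tree_toy.predict(x_toy)
print(seen)

# An unseen value falls into an existing leaf and gets that leaf's target
unseen = tree_toy.predict(np.array([[2.4]]))
print(unseen)
```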

Visualize the Decision Tree prediction

predictions = treePrediction
# Copy the slice so adding a column does not trigger SettingWithCopyWarning
valid = google[x.shape[0]:].copy()
valid["Predictions"] = predictions
plt.figure(figsize=(10, 6))
plt.title("Google's Stock Price Prediction Model(Decision Tree Regressor Model)")
plt.xlabel("Days")
plt.ylabel("Close Price USD ($)")
plt.plot(google["Close Price"])
plt.plot(valid[["Close Price", "Predictions"]])
plt.legend(["Original", "Valid", "Predictions"])
plt.show()

To show the linear regression model prediction

linearPrediction = linear.predict(xfuture)
print("Linear regression Prediction =",linearPrediction)
Linear regression Prediction = [1255.03253102 1248.43495371 1274.020735   1273.90291115 1280.73325977
 1281.06943852 1282.8251867  1297.67261167 1295.73011351 1296.01746493
 1301.70128427 1296.9197524  1282.47459125 1295.94275068 1299.7070824
 1300.56913664 1299.58350577 1287.4544137  1318.43375148 1320.49695016
 1312.43672886 1314.21545252 1321.20094705 1320.10615655 1327.36465619]
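
With a single feature, the fitted linear model is a straight line, so each prediction equals `coef_` times the close price plus `intercept_`. The sketch below checks this on made-up numbers (assumptions for illustration, not values from GOOG.csv).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up prices and targets lying exactly on y = 2x + 1
x_line = np.array([[1.0], [2.0], [3.0], [4.0]])
y_line = np.array([3.0, 5.0, 7.0, 9.0])

line = LinearRegression().fit(x_line, y_line)
slope = line.coef_[0]
intercept = line.intercept_
print("slope =", slope, "intercept =", intercept)

# predict() is the same as applying the fitted line by hand
x_new = np.array([[10.0], [20.0]])
by_hand = slope * x_new[:, 0] + intercept
matches = bool(np.allclose(line.predict(x_new), by_hand))
print("matches =", matches)
```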

Visualize the Linear Model Prediction

predictions = linearPrediction
# Copy the slice so adding a column does not trigger SettingWithCopyWarning
valid = google[x.shape[0]:].copy()
valid["Predictions"] = predictions
plt.figure(figsize=(10, 6))
plt.title("Google's Stock Price Prediction Model(Linear Regression Model)")
plt.xlabel("Days")
plt.ylabel("Close Price USD ($)")
plt.plot(google["Close Price"])
plt.plot(valid[["Close Price", "Predictions"]])
plt.legend(["Original", "Valid", "Predictions"])
plt.show()
Aman Kharwal