
In this article, I will build a Linear Regression model and a Decision Tree Regression model to predict Google's stock price using Machine Learning and Python.
Download the data set
Import pandas and load the CSV file:
import pandas as pd

google = pd.read_csv("GOOG.csv")
google.head()

To get the number of training days:
print("training days =", google.shape)
training days = (252, 5)
To visualize the close price data:
import matplotlib.pyplot as plt
import seaborn as sns

sns.set()
plt.figure(figsize=(10, 4))
plt.title("Google's Stock Price")
plt.xlabel("Days")
plt.ylabel("Close Price USD ($)")
plt.plot(google["Close Price"])
plt.show()

To keep only the close price column:
google = google[["Close Price"]]
print(google.head())
   Close Price
0  1085.349976
1  1092.500000
2  1103.599976
3  1102.329956
4  1111.420044
Creating a variable to predict ‘X’ days in the future:
futureDays = 25
Create a new target column shifted ‘X’ units/days up:
# create a new target column shifted 'X' units/days up
google["Prediction"] = google[["Close Price"]].shift(-futureDays)
print(google.head())
print(google.tail())
   Close Price   Prediction
0  1085.349976  1138.069946
1  1092.500000  1146.209961
2  1103.599976  1137.810059
3  1102.329956  1132.119995
4  1111.420044  1250.410034

     Close Price  Prediction
247  1446.609985         NaN
248  1456.160034         NaN
249  1465.849976         NaN
250  1403.839966         NaN
251  1413.180054         NaN
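As a smaller illustration of what `shift(-futureDays)` does, here is a toy frame (made-up numbers, not the GOOG data): shifting by -2 pairs each day's close with the close two days later, and the final rows get NaN targets because no future price exists for them.

```python
import pandas as pd

# Toy close prices; shift(-2) pairs each row with the price 2 rows later.
df = pd.DataFrame({"Close Price": [10.0, 11.0, 12.0, 13.0, 14.0]})
df["Prediction"] = df["Close Price"].shift(-2)
print(df)
# The last 2 rows have NaN in "Prediction" - there is no future price for them.
```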
To create the feature data set (x), convert it to a numpy array, and remove the last ‘x’ rows/days:
import numpy as np

x = np.array(google.drop(["Prediction"], axis=1))[:-futureDays]
print(x)
[[1085.349976] [1092.5 ] [1103.599976] [1102.329956] [1111.420044] [1121.880005] [1115.52002 ] [1086.349976] [1079.800049] [1076.01001 ] [1080.910034] [1097.949951] [1111.25 ] [1121.579956] [1131.589966] [1116.349976] [1124.829956] [1140.47998 ] [1144.209961] [1144.900024] [1150.339966] [1153.579956] [1146.349976] [1146.329956] [1130.099976] [1138.069946] [1146.209961] [1137.810059] [1132.119995] [1250.410034] [1239.410034] [1225.140015] [1216.680054] [1209.01001 ] [1193.98999 ] [1152.319946] [1169.949951] [1173.98999 ] [1204.800049] [1188.01001 ] [1174.709961] [1197.27002 ] [1164.290039] [1167.26001 ] [1177.599976] [1198.449951] [1182.689941] [1191.25 ] [1189.530029] [1151.290039] [1168.890015] [1167.839966] [1171.02002 ] [1192.849976] [1188.099976] [1168.390015] [1181.410034] [1211.380005] [1204.930054] [1204.410034] [1206. ] [1220.170044] [1234.25 ] [1239.560059] [1231.300049] [1229.150024] [1232.410034] [1238.709961] [1229.930054] [1234.030029] [1218.76001 ] [1246.52002 ] [1241.390015] [1225.089966] [1219. ] [1205.099976] [1176.630005] [1187.829956] [1209. ] [1207.680054] [1189.130005] [1202.310059] [1208.670044] [1215.449951] [1217.140015] [1243.01001 ] [1243.640015] [1253.069946] [1245.48999 ] [1246.150024] [1242.800049] [1259.130005] [1260.98999 ] [1265.130005] [1290. ] [1262.619995] [1261.290039] [1260.109985] [1273.73999 ] [1291.369995] [1292.030029] [1291.800049] [1308.859985] [1311.369995] [1299.189941] [1298.800049] [1298. 
] [1311.459961] [1334.869995] [1320.699951] [1315.459961] [1303.050049] [1301.349976] [1295.339966] [1306.689941] [1313.550049] [1312.98999 ] [1304.959961] [1289.920044] [1295.280029] [1320.540039] [1328.130005] [1340.619995] [1343.560059] [1344.660034] [1345.02002 ] [1350.27002 ] [1347.829956] [1361.170044] [1355.119995] [1352.619995] [1356.040039] [1349.589966] [1348.839966] [1343.560059] [1360.400024] [1351.890015] [1336.140015] [1337.02002 ] [1367.369995] [1360.660034] [1394.209961] [1393.339966] [1404.319946] [1419.829956] [1429.72998 ] [1439.22998 ] [1430.880005] [1439.199951] [1451.699951] [1480.390015] [1484.400024] [1485.949951] [1486.650024] [1466.709961] [1433.900024] [1452.560059] [1458.630005] [1455.839966] [1434.22998 ] [1485.939941] [1447.069946] [1448.22998 ] [1476.22998 ] [1479.22998 ] [1508.680054] [1508.790039] [1518.27002 ] [1514.660034] [1520.73999 ] [1519.670044] [1526.689941] [1518.150024] [1485.109985] [1421.589966] [1388.449951] [1393.180054] [1318.089966] [1339.329956] [1389.109985] [1341.390015] [1386.52002 ] [1319.040039] [1298.410034] [1215.560059] [1280.390015] [1215.410034] [1114.910034] [1219.72998 ] [1084.329956] [1119.800049] [1096.800049] [1115.290039] [1072.319946] [1056.619995] [1134.459961] [1102.48999 ] [1161.75 ] [1110.709961] [1146.819946] [1162.810059] [1105.619995] [1120.839966] [1097.880005] [1186.920044] [1186.51001 ] [1210.280029] [1211.449951] [1217.560059] [1269.22998 ] [1262.469971] [1263.469971] [1283.25 ] [1266.609985] [1216.339966] [1263.209961] [1276.310059] [1279.310059] [1275.880005] [1233.670044] [1341.47998 ] [1348.660034] [1320.609985] [1326.800049] [1351.109985] [1347.300049] [1372.560059]]
To create the target data set (y), convert it to a numpy array, and get all of the target values except the last ‘x’ rows/days:
y = np.array(google["Prediction"])[:-futureDays]
print(y)
[1138.069946 1146.209961 1137.810059 1132.119995 1250.410034 1239.410034 1225.140015 1216.680054 1209.01001 1193.98999 1152.319946 1169.949951 1173.98999 1204.800049 1188.01001 1174.709961 1197.27002 1164.290039 1167.26001 1177.599976 1198.449951 1182.689941 1191.25 1189.530029 1151.290039 1168.890015 1167.839966 1171.02002 1192.849976 1188.099976 1168.390015 1181.410034 1211.380005 1204.930054 1204.410034 1206. 1220.170044 1234.25 1239.560059 1231.300049 1229.150024 1232.410034 1238.709961 1229.930054 1234.030029 1218.76001 1246.52002 1241.390015 1225.089966 1219. 1205.099976 1176.630005 1187.829956 1209. 1207.680054 1189.130005 1202.310059 1208.670044 1215.449951 1217.140015 1243.01001 1243.640015 1253.069946 1245.48999 1246.150024 1242.800049 1259.130005 1260.98999 1265.130005 1290. 1262.619995 1261.290039 1260.109985 1273.73999 1291.369995 1292.030029 1291.800049 1308.859985 1311.369995 1299.189941 1298.800049 1298. 1311.459961 1334.869995 1320.699951 1315.459961 1303.050049 1301.349976 1295.339966 1306.689941 1313.550049 1312.98999 1304.959961 1289.920044 1295.280029 1320.540039 1328.130005 1340.619995 1343.560059 1344.660034 1345.02002 1350.27002 1347.829956 1361.170044 1355.119995 1352.619995 1356.040039 1349.589966 1348.839966 1343.560059 1360.400024 1351.890015 1336.140015 1337.02002 1367.369995 1360.660034 1394.209961 1393.339966 1404.319946 1419.829956 1429.72998 1439.22998 1430.880005 1439.199951 1451.699951 1480.390015 1484.400024 1485.949951 1486.650024 1466.709961 1433.900024 1452.560059 1458.630005 1455.839966 1434.22998 1485.939941 1447.069946 1448.22998 1476.22998 1479.22998 1508.680054 1508.790039 1518.27002 1514.660034 1520.73999 1519.670044 1526.689941 1518.150024 1485.109985 1421.589966 1388.449951 1393.180054 1318.089966 1339.329956 1389.109985 1341.390015 1386.52002 1319.040039 1298.410034 1215.560059 1280.390015 1215.410034 1114.910034 1219.72998 1084.329956 1119.800049 1096.800049 1115.290039 1072.319946 1056.619995 1134.459961 1102.48999 
1161.75 1110.709961 1146.819946 1162.810059 1105.619995 1120.839966 1097.880005 1186.920044 1186.51001 1210.280029 1211.449951 1217.560059 1269.22998 1262.469971 1263.469971 1283.25 1266.609985 1216.339966 1263.209961 1276.310059 1279.310059 1275.880005 1233.670044 1341.47998 1348.660034 1320.609985 1326.800049 1351.109985 1347.300049 1372.560059 1388.369995 1403.26001 1375.73999 1349.329956 1356.130005 1373.189941 1383.939941 1373.484985 1406.719971 1402.800049 1410.420044 1417.02002 1417.839966 1416.72998 1428.920044 1431.819946 1439.219971 1436.380005 1412.180054 1438.390015 1446.609985 1456.160034 1465.849976 1403.839966 1413.180054]
To split the data into 75% training and 25% testing:
from sklearn.model_selection import train_test_split

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25)
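One thing worth knowing: `train_test_split` shuffles rows by default, which mixes up the time order of a stock series. If you want a chronological split instead (or simply a reproducible one via `random_state`), you can pass `shuffle=False`. A small sketch on toy data, not the article's exact split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

x = np.arange(20).reshape(-1, 1)
y = np.arange(20)

# shuffle=False keeps chronological order: the last 25% becomes the test set.
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25, shuffle=False)
print(xtrain.shape, xtest.shape)  # (15, 1) (5, 1)
print(xtest.ravel())              # [15 16 17 18 19]
```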
Creating the Decision Tree Regression model:
from sklearn.tree import DecisionTreeRegressor

tree = DecisionTreeRegressor().fit(xtrain, ytrain)
Creating the Linear Regression model:
from sklearn.linear_model import LinearRegression

linear = LinearRegression().fit(xtrain, ytrain)
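Before predicting, it can help to sanity-check both fitted models on the held-out test split with `.score()`, which returns R². The sketch below uses synthetic prices (made-up numbers, since the article's split depends on a random shuffle), but the same two calls work on the `xtest`/`ytest` created above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(1000, 1500, size=(200, 1))           # synthetic "close prices"
y = x.ravel() * 1.02 + rng.normal(0, 5, size=200)    # noisy "future prices"

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.25, random_state=0)

tree = DecisionTreeRegressor().fit(xtrain, ytrain)
linear = LinearRegression().fit(xtrain, ytrain)

# R^2 on the held-out split; closer to 1.0 is better.
print("tree R^2 =", tree.score(xtest, ytest))
print("linear R^2 =", linear.score(xtest, ytest))
```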
Getting the last ‘x’ rows/days of the feature data set:
xfuture = google.drop(["Prediction"], axis=1)[:-futureDays]
xfuture = xfuture.tail(futureDays)
xfuture = np.array(xfuture)
print(xfuture)
[[1120.839966] [1097.880005] [1186.920044] [1186.51001 ] [1210.280029] [1211.449951] [1217.560059] [1269.22998 ] [1262.469971] [1263.469971] [1283.25 ] [1266.609985] [1216.339966] [1263.209961] [1276.310059] [1279.310059] [1275.880005] [1233.670044] [1341.47998 ] [1348.660034] [1320.609985] [1326.800049] [1351.109985] [1347.300049] [1372.560059]]
To show the Decision Tree model prediction:
treePrediction = tree.predict(xfuture)
print("Decision Tree prediction =", treePrediction)
Decision Tree prediction = [1239.410034 1403.26001 1375.73999 1349.329956 1356.130005 1373.189941 1383.939941 1373.484985 1320.540039 1402.800049 1410.420044 1417.02002 1417.839966 1416.72998 1439.219971 1431.819946 1439.219971 1436.380005 1412.180054 1438.390015 1446.609985 1456.160034 1465.849976 1403.839966 1413.180054]
Visualize the Decision Tree prediction
predictions = treePrediction
valid = google[x.shape[0]:].copy()  # .copy() avoids pandas' SettingWithCopyWarning
valid["Predictions"] = predictions
plt.figure(figsize=(10, 6))
plt.title("Google's Stock Price Prediction Model (Decision Tree Regressor)")
plt.xlabel("Days")
plt.ylabel("Close Price USD ($)")
plt.plot(google["Close Price"])
plt.plot(valid[["Close Price", "Predictions"]])
plt.legend(["Original", "Valid", "Predictions"])
plt.show()

To show the Linear Regression model prediction:
linearPrediction = linear.predict(xfuture)
print("Linear regression prediction =", linearPrediction)
Linear regression Prediction = [1255.03253102 1248.43495371 1274.020735 1273.90291115 1280.73325977 1281.06943852 1282.8251867 1297.67261167 1295.73011351 1296.01746493 1301.70128427 1296.9197524 1282.47459125 1295.94275068 1299.7070824 1300.56913664 1299.58350577 1287.4544137 1318.43375148 1320.49695016 1312.43672886 1314.21545252 1321.20094705 1320.10615655 1327.36465619]
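To compare the two forecasts numerically rather than just visually, you could compute the root mean squared error of each against the actual closing prices for those days. The arrays below are short made-up values just to show the calls; with the real data you would use `google["Close Price"].tail(futureDays)` as the actuals:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual prices and the two models' forecasts for the same days.
actual = np.array([1400.0, 1410.0, 1420.0])
treePrediction = np.array([1395.0, 1415.0, 1418.0])
linearPrediction = np.array([1380.0, 1400.0, 1405.0])

tree_rmse = np.sqrt(mean_squared_error(actual, treePrediction))
linear_rmse = np.sqrt(mean_squared_error(actual, linearPrediction))
print("tree RMSE =", tree_rmse)
print("linear RMSE =", linear_rmse)
```

A lower RMSE means the forecast tracked the actual prices more closely.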
Visualize the Linear Model Prediction
predictions = linearPrediction
valid = google[x.shape[0]:].copy()  # .copy() avoids pandas' SettingWithCopyWarning
valid["Predictions"] = predictions
plt.figure(figsize=(10, 6))
plt.title("Google's Stock Price Prediction Model (Linear Regression)")
plt.xlabel("Days")
plt.ylabel("Close Price USD ($)")
plt.plot(google["Close Price"])
plt.plot(valid[["Close Price", "Predictions"]])
plt.legend(["Original", "Valid", "Predictions"])
plt.show()

Hey! Thanks, how can I improve this model now to make it more accurate?
Hi Santosh, you can go through this article for further exploration – https://thecleverprogrammer.com/2020/05/25/algorithmic-trading-strategy-with-machine-learning-and-python/
Thanks! By the way, what is the difference between valid and prediction?
“Valid” is the portion of actual closing prices held back for the forecast window, plotted so you can compare it against the model's output. “Predictions” is the model's forecast for those same days.