Predict Fuel Efficiency with Machine Learning

Predicting fuel efficiency is a regression problem: rather than assigning a class, we aim to predict a continuous value, such as a price or a probability. In this article, I will take you through how we can predict fuel efficiency with Machine Learning.

Predict Fuel Efficiency

Here I will use one of the most famous datasets among machine learning practitioners, the Auto MPG dataset, to create a model that predicts the fuel efficiency of vehicles from the late 1970s and early 1980s. To do this, we will provide the model with descriptions of many automobiles from that period, including attributes such as cylinders, displacement, horsepower and weight.

Let’s import the necessary libraries to get started with this task:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

The next thing to do is to download the dataset, which is available from the UCI Machine Learning Repository. With a local copy saved as auto.csv, let’s import the data using the pandas package:

column_names = ['MPG', 'Cylinders', 'Displacement', 'Horsepower', 'Weight',
                'Acceleration', 'Model Year', 'Origin']
dataset = pd.read_csv("auto.csv", names=column_names,
                      na_values="?", comment='\t',
                      sep=" ", skipinitialspace=True)
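
If you don’t have a local copy of the data, one way to fetch it is with keras.utils.get_file. This is a sketch, assuming the UCI mirror path used by the TensorFlow documentation is still live:

url = "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data"
dataset_path = keras.utils.get_file("auto.csv", url)  # saved under ~/.keras/datasets
# Then pass dataset_path instead of "auto.csv" to pd.read_csv above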

Because we read the file with na_values="?", a handful of rows have missing horsepower values, so we drop those first. The “Origin” column is categorical rather than numeric, so to move forward we need to apply one-hot encoding to it:

dataset = dataset.dropna()  # remove the few rows with missing values

origin = dataset.pop('Origin')
dataset['USA'] = (origin == 1) * 1.0
dataset['Europe'] = (origin == 2) * 1.0
dataset['Japan'] = (origin == 3) * 1.0
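
As an aside, pandas can do the same expansion in a single call. A minimal equivalent (my addition; use it instead of the manual columns above, not in addition to them) would be:

# Map the numeric origin codes to names, then one-hot encode in one call
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
dataset = pd.get_dummies(dataset, columns=['Origin'], prefix='', prefix_sep='')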

Now, let’s split the data into training and test sets:

train_dataset = dataset.sample(frac=0.8, random_state=0)
test_dataset = dataset.drop(train_dataset.index)

Before training a model to predict fuel efficiency with machine learning, let’s visualize the data using seaborn’s pairplot method:

sns.pairplot(train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")
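
If you prefer a single summary view, a correlation matrix tells a similar story. Here is a quick sketch (my addition, using seaborn’s heatmap):

# Pairwise correlations between MPG and a few numeric features
corr = train_dataset[["MPG", "Cylinders", "Displacement", "Weight"]].corr()
sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.show()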

Now, I will separate the target values from the features in the dataset. This label, MPG, is the value that we will train the model to predict:

train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')

Normalize The Data

It is recommended that you standardize features that use different scales and ranges. Although the model can converge without standardization, it makes training more difficult and makes the resulting model dependent on the units chosen for the inputs. Keeping the training statistics also lets us project the test dataset into the same distribution the model was trained on:

train_stats = train_dataset.describe().transpose()  # per-feature mean and std

def norm(x):
    return (x - train_stats['mean']) / train_stats['std']

normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)

Note: the statistics used to normalize the inputs here (the mean and standard deviation) must be applied to any other data fed to the model, along with the one-hot encoding we did earlier. That includes the test set as well as live data when the model is used in production.
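
To make that concrete, here is a minimal sketch of preparing a single new record at prediction time; the car and its values are made up for illustration:

# A hypothetical new car, encoded exactly like the training data
new_car = pd.DataFrame([{
    'Cylinders': 4, 'Displacement': 140.0, 'Horsepower': 90.0,
    'Weight': 2264.0, 'Acceleration': 15.5, 'Model Year': 80,
    'USA': 1.0, 'Europe': 0.0, 'Japan': 0.0,
}])
# Reuse the training statistics and keep the training column order
normed_new_car = norm(new_car)[train_dataset.columns]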

Build The Model

Let’s build our model. Here, I will use the Keras Sequential API with two densely connected hidden layers and an output layer that returns a single, continuous value. The model-building steps are wrapped in a function, build_model, since we will create a second model later:

def build_model():
    model = keras.Sequential([
        layers.Dense(64, activation=tf.nn.relu,
                     input_shape=[len(train_dataset.keys())]),
        layers.Dense(64, activation=tf.nn.relu),
        layers.Dense(1)
    ])

    optimizer = tf.keras.optimizers.RMSprop(0.001)

    model.compile(loss='mean_squared_error',
                  optimizer=optimizer,
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model

model = build_model()
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 64)                640       
_________________________________________________________________
dense_1 (Dense)              (None, 64)                4160      
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 65        
=================================================================
Total params: 4,865
Trainable params: 4,865
Non-trainable params: 0
_________________________________________________________________

Now, before training the model to predict fuel efficiency, let’s try the untrained model on the first 10 samples to check that it produces output of the expected shape:

example_batch = normed_train_data[:10]
example_result = model.predict(example_batch)
example_result
array([[-0.12670723],
       [-0.03443428],
       [ 0.3062502 ],
       [ 0.3065169 ],
       [ 0.36841604],
       [ 0.02191051],
       [ 0.3674207 ],
       [ 0.10561748],
       [ 0.00638346],
       [-0.00226454]], dtype=float32)

Training The Model To Predict Fuel Efficiency

Now, let’s train the model to predict fuel efficiency:

class PrintDot(keras.callbacks.Callback):
    # Print a dot per completed epoch, and start a new line every 100 epochs
    def on_epoch_end(self, epoch, logs):
        if epoch % 100 == 0:
            print('')
        print('.', end='')

EPOCHS = 1000

history = model.fit(
    normed_train_data, train_labels,
    epochs=EPOCHS, validation_split=0.2, verbose=0,
    callbacks=[PrintDot()])

Now, let’s visualize the model training:

def plot_history(history):
    hist = pd.DataFrame(history.history)
    hist['epoch'] = history.epoch

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Abs Error [MPG]')
    plt.plot(hist['epoch'], hist['mean_absolute_error'], label='Train Error')
    plt.plot(hist['epoch'], hist['val_mean_absolute_error'], label='Val Error')
    plt.ylim([0, 5])
    plt.legend()

    plt.figure()
    plt.xlabel('Epoch')
    plt.ylabel('Mean Square Error [$MPG^2$]')
    plt.plot(hist['epoch'], hist['mean_squared_error'], label='Train Error')
    plt.plot(hist['epoch'], hist['val_mean_squared_error'], label='Val Error')
    plt.ylim([0, 20])
    plt.legend()
    plt.show()

plot_history(history)

The graph shows little improvement, or even degradation, in the validation error after about 100 epochs. So let’s update the model.fit call to stop training automatically when the validation score stops improving. We’ll use an EarlyStopping callback that tests a training condition after every epoch: if a set number of epochs pass without improvement, it stops the training:

model = build_model()

# The patience parameter is the number of epochs to wait for improvement
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

history = model.fit(normed_train_data, train_labels, epochs=EPOCHS,
                    validation_split=0.2, verbose=0,
                    callbacks=[early_stop, PrintDot()])

plot_history(history)

The graph shows that on the validation set, the average error is typically around +/- 2 MPG. Is that good? We will leave that decision to you.

Let’s see how well the model generalizes on the test set, which we did not use when training the model. This tells us how well we can expect the model to predict when we use it in the real world:

loss, mae, mse = model.evaluate(normed_test_data, test_labels, verbose=0)

print("Testing set Mean Abs Error: {:5.2f} MPG".format(mae))
Testing set Mean Abs Error:  1.97 MPG
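
Since the mean squared error is in squared units, it can also help to report its square root, which is back in MPG (a small addition of mine):

rmse = mse ** 0.5  # root mean squared error, in the same units as the target
print("Testing set RMSE: {:5.2f} MPG".format(rmse))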

Finally, let’s use the model to predict fuel efficiency on the test data and plot the predictions against the true values:

test_predictions = model.predict(normed_test_data).flatten()

plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
plt.axis('equal')
plt.axis('square')
plt.xlim([0, plt.xlim()[1]])
plt.ylim([0, plt.ylim()[1]])
_ = plt.plot([-100, 100], [-100, 100])  # reference line for perfect predictions
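
To see how the prediction errors are distributed, a quick histogram helps (another small addition, not part of the plots above):

# Distribution of prediction errors on the test set
error = test_predictions - test_labels
plt.hist(error, bins=25)
plt.xlabel("Prediction Error [MPG]")
plt.ylabel("Count")
plt.show()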

It looks like our model predicts quite well. I hope you liked this article on how to predict fuel efficiency with Machine Learning. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to learn every topic of Machine Learning.
