
# Voting Classifier in Machine Learning

Suppose you have trained several classification models, and each one achieves an accuracy of about 85 per cent. A very simple way to create an even better classifier is to aggregate the predictions of all the classifiers and predict the class that gets the most votes. This majority-vote classifier is known as a voting classifier.
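The idea can be sketched in a few lines of NumPy. The predictions below are hypothetical, just to illustrate the mechanics: each row holds one classifier's binary predictions for five samples, and the majority vote picks whichever class more than half the classifiers agree on.

```python
import numpy as np

# Hypothetical binary predictions from three classifiers for five samples
preds = np.array([[1, 0, 1, 1, 0],
                  [1, 1, 1, 0, 0],
                  [0, 0, 1, 1, 1]])

# A sample is classified as 1 if more than half of the classifiers say 1
majority = (preds.sum(axis=0) > preds.shape[0] // 2).astype(int)
print(majority)  # [1 0 1 1 0]
```

Scikit-Learn's `VotingClassifier` (used later in this article) does exactly this under the hood when `voting='hard'`.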

In this article, I will take you through the voting classifier in Machine Learning. I will first start with importing the necessary libraries:

```python
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)
```

## Example of a Voting Classifier

Before building the voting classifier itself, let’s see why majority voting works at all, by simulating a slightly biased coin (this is the law of large numbers in action):

```python
heads_proba = 0.51
coin_tosses = (np.random.rand(10000, 10) < heads_proba).astype(np.int32)
cumulative_heads_ratio = np.cumsum(coin_tosses, axis=0) / np.arange(1, 10001).reshape(-1, 1)

plt.figure(figsize=(8, 3.5))
plt.plot(cumulative_heads_ratio)  # running heads ratio of 10 coin-toss series
plt.plot([0, 10000], [0.51, 0.51], "k--", linewidth=2, label="51%")
plt.plot([0, 10000], [0.5, 0.5], "k-", label="50%")
plt.xlabel("Number of coin tosses")
plt.ylabel("Heads ratio")
plt.legend(loc="lower right")
plt.axis([0, 10000, 0.42, 0.58])
plt.savefig("law_of_large_numbers_plot.png")
plt.show()
```

The plot above shows what happens when you toss a slightly biased coin that has a 51 per cent chance of showing heads and a 49 per cent chance of showing tails: over 10,000 tosses you will get roughly 510 heads for every 490 tails, and the ratio of heads steadily converges towards 51 per cent.

The same logic applies to classifiers. If you build an ensemble of 1,000 classification models that are each correct only 51 per cent of the time, and you predict the majority-voted class, the ensemble can reach roughly 75 per cent accuracy, provided the individual models make independent errors.
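You can check that figure with the binomial distribution rather than a simulation. Under the (strong) assumption that the 1,000 classifiers err independently, the probability that a majority of them is correct is the probability of at least 501 successes out of 1,000 trials with success probability 0.51:

```python
from scipy.stats import binom

# P(at least 501 of 1,000 independent classifiers are correct),
# with each classifier correct 51% of the time.
# sf(k) gives P(X > k), so sf(500) = P(X >= 501).
p_majority_correct = binom.sf(500, 1000, 0.51)
print(p_majority_correct)  # roughly 0.73
```

In practice classifiers trained on the same data are far from independent, so the real gain is smaller, but the direction of the effect is the same.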

## Training a Voting Classifier

A voting classifier works best when the predictions of its members are as independent of one another as possible. One simple way to get diverse classification models is to train them using different algorithms.


Now let’s create and train a voting classifier in Machine Learning using Scikit-Learn, which will include three classification models.

```python
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
```

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

log_clf = LogisticRegression(solver="lbfgs", random_state=42)
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
svm_clf = SVC(gamma="scale", random_state=42)

voting_clf = VotingClassifier(
    estimators=[('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)],
    voting='hard')
voting_clf.fit(X_train, y_train)
```
The `fit` call returns the trained voting classifier (notebook output, truncated):

```
VotingClassifier(estimators=[('lr',
                              LogisticRegression(C=1.0, class_weight=None,
                                                 dual=False, fit_intercept=True,
                                                 intercept_scaling=1,
                                                 l1_ratio=None, max_iter=100,
                                                 multi_class='auto',
                                                 n_jobs=None, penalty='l2',
                                                 random_state=42,
                                                 solver='lbfgs', tol=0.0001,
                                                 verbose=0, warm_start=False)),
                             ('rf',
                              RandomForestClassifier(bootstrap=True,
                                                     ccp_alpha=0.0,
                                                     class_weight=None,
                                                     crit...
                                                     oob_score=False,
                                                     random_state=42, verbose=0,
                                                     warm_start=False)),
                             ('svc',
                              SVC(C=1.0, break_ties=False, cache_size=200,
                                  class_weight=None, coef0=0.0,
                                  decision_function_shape='ovr', degree=3,
                                  gamma='scale', kernel='rbf', max_iter=-1,
                                  probability=False, random_state=42,
                                  shrinking=True, tol=0.001, verbose=False))],
                 flatten_transform=True, n_jobs=None, voting='hard',
                 weights=None)
```
```python
from sklearn.metrics import accuracy_score

for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(clf.__class__.__name__, accuracy_score(y_test, y_pred))
```
```
LogisticRegression 0.864
RandomForestClassifier 0.896
SVC 0.896
VotingClassifier 0.912
```

In the output, we can see that each individual classifier reached an accuracy above 85 per cent, while the voting classifier, which aggregates the predictions of all three models, reached over 91 per cent.
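The example above used hard voting, i.e. counting class labels. Scikit-Learn also supports soft voting, which averages the predicted class probabilities and often performs slightly better, because confident predictions get more weight. The sketch below repeats the same setup with `voting='soft'`; note that every estimator must then support `predict_proba`, which for `SVC` requires `probability=True`:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

soft_voting_clf = VotingClassifier(
    estimators=[
        ('lr', LogisticRegression(solver="lbfgs", random_state=42)),
        ('rf', RandomForestClassifier(n_estimators=100, random_state=42)),
        # probability=True enables predict_proba on the SVC (slower to train)
        ('svc', SVC(gamma="scale", probability=True, random_state=42)),
    ],
    voting='soft')
soft_voting_clf.fit(X_train, y_train)
print(accuracy_score(y_test, soft_voting_clf.predict(X_test)))
```

On this dataset soft voting typically scores at or slightly above the hard-voting result, though the exact number depends on the random state.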

So this is how you can use a voting classifier in Machine Learning. I hope you liked this article. Feel free to ask your valuable questions in the comments section below. You can also follow me on Medium to read more amazing articles.