Here’s How Decision Tree Algorithm Works

In Machine Learning, Decision Tree is an algorithm that works like a flowchart or tree structure to make decisions based on input data. It starts with a question or condition and then follows different branches based on the answers to subsequent questions until a prediction is made. So, if you are new to machine learning and want to know how the decision tree algorithm works, this article is for you. In this article, I will introduce how the decision tree algorithm works and how to implement it using Python.

Here’s How Decision Tree Algorithm Works

In Machine Learning, Decision Tree is an algorithm used to solve problems that require making decisions based on multiple criteria or characteristics. Let’s understand how the decision tree algorithm works by taking an example of a real-time business problem.

Suppose you are a bank that wants to determine if a person is likely to be approved for a loan based on their credit score. You collected data on past loan applicants, including their credit scores and whether their loan applications were approved.

A decision tree algorithm is like a flowchart that can help the bank make decisions based on a person’s credit score. In this problem, the decision tree algorithm will start with a “root” node representing the first question, such as whether a person’s credit score is above or below a certain threshold.

Depending on the answer, the algorithm follows branches to subsequent questions, such as income level or employment status, until a “leaf” node is reached.

The leaf node represents a decision, such as predicting whether to approve or deny a loan based on input characteristics.

So, a decision tree algorithm is like a flowchart that makes decisions based on a series of questions and answers. It starts at a “root” node with a question, branches to subsequent questions based on the answers, and finally reaches a “leaf” node representing a decision.

Implementation of Decision Tree Algorithm using Python

Now let’s see how to implement the Decision Tree algorithm using Python. To implement it using Python, we can use the scikit-learn library in Python, which provides the functionality of implementing all Machine Learning algorithms and concepts using Python.

Let’s first import the necessary Python libraries and create a sample data based on the example we discussed above:

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
import pandas as pd

credit_scores = [650, 720, 580, 800, 690, 750, 600, 670, 710, 680]
loan_approval = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0] 

data = {"Credit_Score": credit_scores, "Loan_Approval": loan_approval}
df = pd.DataFrame(data)
print(df.head())
   Credit_Score  Loan_Approval
0           650              1
1           720              1
2           580              0
3           800              1
4           690              0

Now here’s how to train a Machine Learning model using the Decision Tree algorithm:

X = df[['Credit_Score']]
y = df['Loan_Approval']

clf = DecisionTreeClassifier(random_state=42)
clf.fit(X, y)

Below is how we can visualize the decision-making process of the decision tree algorithm:

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# Plot the decision tree
plt.figure(figsize=(10, 6))
plot_tree(clf, filled=True, rounded=True, 
          feature_names=['Credit Score'], 
          class_names=['Denied', 'Approved'])
plt.show()
decision tree algorithm visualization

Now here’s how we can make predictions using the Decision Tree algorithm:

user_credit_score = float(input("Enter your credit score: "))

prediction = clf.predict([[user_credit_score]])

if prediction[0] == 1:
    print("Congratulations! Your loan application is likely to be approved.")
else:
    print("We regret to inform you that your loan application is likely to be denied.")
Enter your credit score: 720
Congratulations! Your loan application is likely to be approved.

So this is how the Decision Tree algorithm works.

Advantages and Disadvantages of Decision Tree Algorithm

Here are some advantages and disadvantages of the Decision Tree algorithm that you should know:

Advantages:
  • Decision trees produce easy-to-interpret decision rules, allowing stakeholders to understand and trust the decision-making process.
  • Decision trees can effectively handle missing data by making decisions based on the data available for each division.
Disadvantages:
  • Decision trees are prone to overfitting, especially when the tree becomes too deep or complex.
  • Decision trees are sensitive to small changes in input data, which can result in different tree structures and decision rules.

Summary

So, a decision tree algorithm is like a flowchart that makes decisions based on a series of questions and answers. It starts at a “root” node with a question, branches to subsequent questions based on the answers, and finally reaches a “leaf” node representing a decision. I hope you liked this article on how the Decision Tree algorithm works and how to implement it using Python. Feel free to ask valuable questions in the comments section below.

Aman Kharwal
Aman Kharwal

I'm a writer and data scientist on a mission to educate others about the incredible power of data📈.

Articles: 1498

Leave a Reply