In machine learning, decision trees are versatile supervised learning models with the important property of being easy to interpret. In this article, I will explain what the decision tree algorithm in machine learning is and how it works.
What are Decision Trees?
A decision tree is, as the name suggests, a tree data structure (in most implementations, a binary tree) used to make a decision. Trees are a very intuitive way to display and analyze data and are commonly used even outside the realm of machine learning.
With the ability to predict both categorical values (classification trees) and continuous values (regression trees), and to handle numeric and categorical data without any normalization or creation of dummy variables, it is not hard to see why they are popular choices for machine learning.
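As a quick sketch of both modes, the snippet below fits a classification tree and a regression tree with scikit-learn. The feature values and targets are made-up toy data for illustration only; note that the raw numeric features are used directly, with no scaling:

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Toy features: [age, is_student]. No normalization or dummy variables needed.
X = [[25, 1], [40, 0], [17, 1], [60, 0], [15, 1], [35, 0]]

# Classification tree: predicts a categorical label.
y_class = ["adult", "adult", "minor", "adult", "minor", "adult"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
print(clf.predict([[30, 1]]))  # predicts a class label for a new sample

# Regression tree: predicts a continuous value.
y_reg = [50.0, 80.0, 20.0, 90.0, 15.0, 70.0]
reg = DecisionTreeRegressor(random_state=0).fit(X, y_reg)
print(reg.predict([[30, 1]]))  # predicts a numeric value for the same sample
```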
How do Decision Trees work?
Now I will walk you through how a typical decision tree algorithm works. The overall process is the same across tasks, though details such as the split criterion can vary depending on the problem you are working on, which you will get a feel for with practice.
Starting at the root of the tree, the entire dataset is divided based on a binary condition into two child subsets. For example, if the condition is age ≥ 18, all data points for which this condition is true go to the left child and all data points for which it is false go to the right child.
The child subsets are then recursively partitioned into smaller subsets based on further conditions. The split condition at each step is selected automatically: the algorithm picks the condition that best separates the data, typically the one that maximizes the purity of the resulting child subsets as measured by a criterion such as Gini impurity or entropy.
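The split-scoring idea can be sketched in plain Python. The criterion here is Gini impurity, which is the scikit-learn default for classification (other criteria, such as entropy, work the same way); the data and the `split_score` helper are illustrative:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_score(rows, labels, feature, threshold):
    """Weighted Gini impurity of the two child subsets after a binary split."""
    left = [l for r, l in zip(rows, labels) if r[feature] >= threshold]
    right = [l for r, l in zip(rows, labels) if r[feature] < threshold]
    n = len(labels)
    return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

ages = [[15], [17], [25], [40]]
labels = ["minor", "minor", "adult", "adult"]
# The age >= 18 split separates the classes perfectly, so its score is 0.
print(split_score(ages, labels, 0, 18))  # 0.0
```

At each node, the training algorithm evaluates candidate thresholds for each feature and keeps the split with the lowest weighted impurity.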
Why are Decision Trees Used?
An important quality of decision trees is the relative ease of explaining the results of classification or regression since each prediction can be expressed in a series of Boolean conditions that trace a path from the root of the tree to a leaf node.
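With scikit-learn, the Boolean conditions along each root-to-leaf path can be printed directly. `export_text` is a real scikit-learn helper; the toy data below is invented for illustration:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[15], [17], [25], [40]]
y = ["minor", "minor", "adult", "adult"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned tree as nested if/else rules, one line per condition,
# making each prediction traceable as a sequence of threshold tests.
print(export_text(clf, feature_names=["age"]))
```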
For example, if a decision tree model predicts that a sample of malware belongs to malware family A, we know this is because the binary was signed before 2015, does not connect to the frame windows manager, makes calls from multiple networks to Russian IP addresses, and so on.
Since classifying a sample requires traversing at most the height of the tree (O(log n) comparisons for a balanced tree), decision trees are also efficient for both training and making predictions. As a result, they work favourably for large datasets.
Decision trees often suffer from the problem of overfitting, where the trees are too complex and do not generalize well beyond the training set. Pruning is introduced as a regularization method to reduce the complexity of trees.
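scikit-learn supports this through minimal cost-complexity pruning, controlled by the `ccp_alpha` parameter. The sketch below uses the built-in iris dataset purely for convenience, and the alpha value 0.02 is an arbitrary choice for demonstration:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# An unpruned tree grows until every leaf is pure, risking overfitting.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Increasing ccp_alpha prunes subtrees whose complexity is not worth their
# impurity reduction, yielding a smaller tree that generalizes better.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

print(full.get_n_leaves(), pruned.get_n_leaves())
```

In practice, a good `ccp_alpha` is usually chosen by cross-validation over the candidate values returned by `cost_complexity_pruning_path`.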
Decision trees tend to be less accurate and robust than other supervised learning techniques. Small changes to the training dataset can result in large changes to the tree structure, which in turn change the model's predictions. This also means that decision trees (and most related models) are not well suited to online or incremental learning.
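This instability is easy to demonstrate: moving a single training point can change the threshold the tree learns at its root. The synthetic data below is constructed for illustration:

```python
from sklearn.tree import DecisionTreeClassifier

def root_threshold(X, y):
    """Fit a depth-1 tree (a single split) and return its root threshold."""
    clf = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)
    return clf.tree_.threshold[0]

y = [0, 0, 0, 1, 1, 1]
X1 = [[1], [2], [3], [10], [11], [12]]
X2 = [[1], [2], [9], [10], [11], [12]]  # one training point moved: 3 -> 9

# A single changed point shifts the learned decision boundary.
print(root_threshold(X1, y), root_threshold(X2, y))
```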
The greedy training of decision trees (which is how they are almost always trained) does not guarantee an optimal tree, because a locally optimal rather than globally optimal decision is made at each split point. Finding a globally optimal decision tree is an NP-complete problem.
So this was an introduction to decision trees: how the algorithm works and when to use it. You can see the practical implementation of Decision Trees from here. I hope you liked this article; feel free to ask your valuable questions in the comments section below.