When a machine learning model performs very well on the training data but poorly on the test data, it is said to be overfitting. Conversely, if the model performs poorly on both the training and test datasets, it is said to be underfitting. In this article, I will introduce you to the concepts of overfitting and underfitting in machine learning and how to avoid them.
Overfitting and Underfitting in Machine Learning
Overfitting:
Overfitting means the machine learning model performs very well on the training data but does not generalize well to new data. This happens when the model is too complex relative to the amount and noisiness of the training data.
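Here is a minimal sketch of what overfitting looks like in practice, assuming scikit-learn and NumPy are available (the sine-plus-noise dataset and the decision tree are illustrative choices, not part of the discussion above): an unconstrained decision tree memorizes a small noisy dataset, scoring almost perfectly on the training set but much worse on the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Small, noisy synthetic dataset: a sine curve plus random noise.
rng = np.random.RandomState(42)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 80)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# An unconstrained tree is far more complex than 60 noisy training
# points justify, so it memorizes them, noise and all.
tree = DecisionTreeRegressor(random_state=42).fit(X_train, y_train)
print("train R^2:", tree.score(X_train, y_train))  # ~1.0: near-perfect memorization
print("test R^2: ", tree.score(X_test, y_test))    # noticeably lower
```

The large gap between the two scores is the telltale sign of overfitting.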
Here are some of the steps you can take to avoid overfitting:
- Simplify the machine learning model by selecting one with fewer parameters, by reducing the number of features in the training dataset, or by constraining (regularizing) the model, as the sketch after this list shows.
- Collect more training data, so the model has more examples from which to learn the underlying pattern rather than the noise.
- Remove outliers and explore your data further to correct other data errors, thereby reducing noise in the training set.
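To make the first point concrete, here is a minimal sketch, again assuming scikit-learn and an illustrative sine-plus-noise dataset: capping a decision tree's depth is one simple way to constrain the model, and it typically trades a little training accuracy for noticeably better test accuracy.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 80)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Compare an unconstrained tree with one whose depth is capped at 3.
for depth in (None, 3):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_tr, y_tr)
    print(f"max_depth={depth}: "
          f"train R^2={tree.score(X_tr, y_tr):.2f}, "
          f"test R^2={tree.score(X_te, y_te):.2f}")
```

The constrained tree scores lower on the training set but generalizes better, which is exactly the trade-off you want when a model is overfitting.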
Underfitting:
Underfitting is the opposite of overfitting. It usually happens when the machine learning model is too simple to learn the underlying structure of the dataset. Here are some of the steps you can take to avoid underfitting:
- Select a more powerful model with more parameters.
- Make sure you are using the most relevant features to train your model; otherwise, spend more time on feature engineering. The sketch after this list illustrates both points.
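Here is a minimal sketch of underfitting and one way to fix it, assuming scikit-learn (the quadratic dataset is an illustrative choice): a plain linear model is too simple for quadratic data and scores poorly on both splits, while adding polynomial features, a basic form of feature engineering that also gives the model more parameters, fits the data well.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data with a quadratic relationship.
rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.2, 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A straight line cannot capture the curve: poor R^2 on BOTH splits.
linear = LinearRegression().fit(X_tr, y_tr)
print(f"linear:     train R^2={linear.score(X_tr, y_tr):.2f}, "
      f"test R^2={linear.score(X_te, y_te):.2f}")

# Adding squared features gives the model enough capacity to fit the data.
poly = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly.fit(X_tr, y_tr)
print(f"polynomial: train R^2={poly.score(X_tr, y_tr):.2f}, "
      f"test R^2={poly.score(X_te, y_te):.2f}")
```

Poor scores on both the training and test sets are the signature of underfitting; once the model has enough capacity, both scores improve together.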
Summary
I hope you now understand the concepts of overfitting and underfitting in machine learning and how to avoid them. When you see near-perfect results on the training data and poor results on the test data, the model is overfitting; if you see poor results on both the training and test datasets, it is underfitting. I hope you enjoyed this article. Please feel free to ask your valuable questions in the comments section below.