
Striking the Right Balance: Understanding Underfitting and Overfitting in Machine Learning Models

This article explains the basic concepts of overfitting and underfitting from a machine learning and deep learning perspective.




Seeing underfitting and overfitting as problems


Everyone working on a machine learning problem wants their model to perform as well as possible. But sometimes the model does not: its accuracy may be worse than ideal, or, surprisingly, better than ideal. In machine learning, both of these are considered a problem.


Most people will readily accept that less-than-ideal accuracy is a problem, but why should better-than-ideal accuracy be considered a problem too?

Sometimes our model finds relationships in meaningless information, i.e., unnecessary features or noise in the data, and this is where the extra accuracy comes from. Let’s understand this with an example.




Suppose we are training a model that predicts a person’s salary. Our data has four features: the person’s name, education, experience, and skill set. Common sense tells us that a person’s name has no effect on their salary. But if we use the name as one of the features anyway, the model may find some spurious relation between name and salary, and that relation can add some extra accuracy on the training data. This is where the more-than-ideal accuracy comes from, and a model trained this way is trained incorrectly.
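
To make this concrete, here is a minimal sketch with synthetic salary data (the dataset, the numbers, and the model choice are all assumptions for illustration). Every person gets a unique name, so a linear model with one-hot encoded names has more parameters than training samples and can "explain" salary through the names on the training set, while the test score stays much lower. Note that sparse_output=False assumes scikit-learn 1.2 or newer; older versions use sparse=False.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 200
experience = rng.uniform(0, 20, size=n)                     # a genuinely useful feature
salary = 30_000 + 2_500 * experience + rng.normal(0, 5_000, size=n)

# Give every person a unique "name"; one-hot encoding it adds 200 useless columns.
names = np.arange(n).astype(str).reshape(-1, 1)
name_cols = OneHotEncoder(sparse_output=False).fit_transform(names)
X = np.column_stack([experience, name_cols])

X_tr, X_te, y_tr, y_te = train_test_split(X, salary, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)
print("train R^2:", model.score(X_tr, y_tr))  # ~1.0: salary "explained" via names
print("test  R^2:", model.score(X_te, y_te))  # much lower: that relation was noise
```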


Basic terminologies


Before diving in, let’s look at two kinds of errors that are needed to understand underfitting and overfitting.


  1. Bias error: the error we measure using the training data and the trained model. In other words, we compute the error on the same data that was used to train the model. The error can be of any kind, such as mean squared error, mean absolute error, etc.

  2. Variance error: the error we measure using the test data and the trained model. Again, this can be any type of error, but we use the same metric that we used for the bias error so that the two values can be compared.

Note that the ideal condition of our trained model is having low bias and low variance.
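
Here is a small sketch of these two working definitions, assuming a synthetic regression dataset and mean squared error as the common metric:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)
bias_error = mean_squared_error(y_tr, model.predict(X_tr))      # error on the training data
variance_error = mean_squared_error(y_te, model.predict(X_te))  # error on the test data
print(f"bias error (train MSE): {bias_error:.2f}")
print(f"variance error (test MSE): {variance_error:.2f}")
```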


What do overfitting and underfitting mean in everyday life?


Let’s say you are visiting a foreign country and a taxi driver rips you off. You might be tempted to say that all the taxi drivers in that country are greedy. This is what we call over-generalization.


The same over-generalization can happen to our trained machine learning and deep learning models. In machine learning and deep learning, over-generalization is known as overfitting of the model.


Similarly, under-generalization is known as underfitting of the model.


What does overfitting mean from a machine learning perspective?


We say our model is suffering from overfitting if it has low bias and high variance, that is, low error on the training data but high error on the test data.


Overfitting happens when the model is too complex relative to the amount and noisiness of the training data.
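
As a quick illustration of "too complex relative to the data", the sketch below (synthetic data; the degrees and sample sizes are assumptions) fits polynomials of degree 1, 3, and 15 to 20 noisy points. The degree-15 model drives the training error toward zero while its test error grows large; the degree-1 model, by contrast, has high error on both sets, which previews the underfitting section below.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X_tr = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y_tr = np.sin(2 * np.pi * X_tr).ravel() + rng.normal(0, 0.2, 20)
X_te = rng.uniform(0, 1, 100).reshape(-1, 1)
y_te = np.sin(2 * np.pi * X_te).ravel() + rng.normal(0, 0.2, 100)

for degree in (1, 3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))
    test_mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```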


Possible solutions to the overfitting issue


  1. Simplify the model in one of the following ways:

Select a machine learning model with fewer parameters

Reduce the number of features or columns used for training the model

Constrain the model using regularization methods (see the sketch after this list)

  2. Gather more training data.

  3. Reduce the noise in the data. The noise could be errors in the data, the presence of outliers, etc.

  4. Use early stopping.
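
As promised in the list above, here is a minimal sketch of constraining the model with regularization. It reuses the noisy polynomial setup from the earlier snippet and swaps plain least squares for Ridge (L2) regression; the alpha value is only an illustrative choice and would normally be tuned, for example with cross-validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X_tr = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y_tr = np.sin(2 * np.pi * X_tr).ravel() + rng.normal(0, 0.2, 20)
X_te = rng.uniform(0, 1, 100).reshape(-1, 1)
y_te = np.sin(2 * np.pi * X_te).ravel() + rng.normal(0, 0.2, 100)

# Same degree-15 polynomial, with and without an L2 penalty on the coefficients.
for name, reg in [("unconstrained", LinearRegression()), ("ridge", Ridge(alpha=1e-3))]:
    model = make_pipeline(PolynomialFeatures(15), reg).fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, model.predict(X_tr))
    test_mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name:13s}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```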

 

What is underfitting?


Underfitting happens when a machine learning model is not able to capture the relationship between our independent and dependent features. In other words, in the case of underfitting, our model gives us high bias and high variance, that is, high error on both the training and the test data. There can be several reasons for this.


Possible solutions to an underfitting issue


  1. Use a more complex model that can capture the relationship between the independent and dependent features (see the sketch after this list).

  2. Relax the constraints on the model i.e., reduce the regularization.

  3. Try to obtain more training data.

  4. Try to increase the duration of model training, for example by training the model for more epochs.

  5. Try to clean the data to reduce the noise.
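
As a sketch of the first fix above, the snippet below replaces a model that is too simple, a depth-1 decision tree, with a deeper and therefore more flexible one. The synthetic data and the depth values are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 400)
X_tr, y_tr, X_te, y_te = X[:300], y[:300], X[300:], y[300:]

# A depth-1 tree (a single split) underfits this nonlinear target;
# a depth-6 tree is flexible enough to capture it.
for depth in (1, 6):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    train_mse = mean_squared_error(y_tr, tree.predict(X_tr))
    test_mse = mean_squared_error(y_te, tree.predict(X_te))
    print(f"depth={depth}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```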

 

Let’s see what overfitting and underfitting look like using some plots


Let’s use the red-wine-quality dataset to understand the concepts of underfitting and overfitting.

Underfitting:



[Plot: training and test accuracy of a simple model over the training epochs, with both curves staying below 55%]

Observe the plot above. The accuracy of the trained model on both the training data and the test data is below 55%, which is quite low. Our model in this case is suffering from underfitting, and the cause is the simplicity of the model.



Overfitting:

[Plot: training and test accuracy of a more complex model over the training epochs, with a widening gap between the two curves]

Observe the plot above. The gap between the two curves grows as the number of training epochs increases: the training accuracy keeps improving while the test accuracy does not. This situation is what we call overfitting. Such a model does not generalize well to the test data or to new data.


We need to train the model in such a way that it gives good enough accuracy on both the training data and the test data. Such a model sits in the middle ground between underfitting and overfitting.
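
The article’s plots are not reproduced here, but curves like the ones described can be drawn roughly as follows. This is a hedged sketch, not the exact code behind the plots: the UCI download URL for the red-wine-quality data, the binarization of the quality column, and the small Keras classifier are all assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from tensorflow import keras

# Assumed UCI mirror of the red-wine-quality data; the file is ';'-separated.
url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "wine-quality/winequality-red.csv")
df = pd.read_csv(url, sep=";")
X = df.drop(columns="quality").values
y = (df["quality"] >= 6).astype(int).values  # assumed binarization: "good" wine or not

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

model = keras.Sequential([
    keras.Input(shape=(X_tr.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
history = model.fit(X_tr, y_tr, validation_data=(X_te, y_te), epochs=100, verbose=0)

# Two low, flat curves signal underfitting; a training curve that keeps climbing
# while the test curve stalls signals overfitting.
plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="test accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
```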


 

I hope you liked the article. If you have any thoughts on it, please let me know. Any constructive feedback is highly appreciated.

Have a great day!


