In this article, you will learn intuitively how the R2 and Adjusted-R2 metrics work.
R2 is widely used as an evaluation metric for regression machine learning tasks. It measures how much of the variance of the target feature (the dependent feature) can be explained by the machine learning model (the model is essentially a function of the independent features).
Now, you might be wondering what good will come from knowing the variance of the target feature. To answer this, we need to see how variance can be perceived as a measure of information. Basically, the higher the variance of a quantity, the more information we have about it.
To understand this concept, let’s take an example. Say we are playing a game where three of our friends have covered their faces, and we need to recognize who is who based on their heights alone. If the differences in height (roughly, the variance of the heights) between the three friends are quite large, it will be very easy to recognize each friend. On the other hand, if the friends have comparable heights, it will be quite challenging to recognize them by height alone, and we would need to look at some other criterion such as weight.
So, when there was a larger difference in height, we could easily recognize the three friends. This example shows how variance can be conceived as a measure of information.
R2 metric
R2 compares our trained model with the model that always outputs the mean of the data points (how good the yellow line is compared to the green line).
To compute the R2 metric, we need two quantities:
Variance of the target feature values around the mean of the data (average variance), i.e., the variance of the gray dots with respect to the green line.
Variance of the target feature values around the best-fit line (model variance), i.e., the variance of the gray dots with respect to the yellow line.
The average variance can also be interpreted as the variance of the target feature around the predictions of the model that outputs the mean of the data for every input. This corresponds to a horizontal line that cuts the y-axis at the mean of all the y coordinates of our data points (the green line in the figure).
The model variance can be conceived as the variance of the target feature around the predictions of our trained model for the given data (the yellow line in the figure).
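To make these two quantities concrete, here is a minimal sketch in Python. The synthetic data and the use of scikit-learn's LinearRegression are my own illustrative assumptions, not from the article: it computes the squared deviations around the mean (the green line) and around the fitted line (the yellow line), and combines them into R2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Illustrative data: y roughly follows 2*x plus noise (assumed for this sketch)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30).reshape(-1, 1)
y = 2 * x.ravel() + rng.normal(scale=2.0, size=30)

# Fit a simple linear model (the "yellow line")
model = LinearRegression().fit(x, y)
y_pred = model.predict(x)

# Squared deviations of the target around the mean (the "green line")
ss_mean = np.sum((y - y.mean()) ** 2)

# Squared deviations of the target around the fitted line (the "yellow line")
ss_model = np.sum((y - y_pred) ** 2)

# R2 compares the two: 1 means the model removed all the variance around
# the mean, 0 means it did no better than the mean model.
r2_manual = 1 - ss_model / ss_mean
print(r2_manual, r2_score(y, y_pred))  # the two values should match
```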
How to interpret R2
The R2 value denotes the proportion of the variance of the target feature that can be explained by your model. The larger the proportion of variance explained, the better your model. So, an R2 value close to 1 corresponds to a good model, and a value close to 0 corresponds to a poor model.
Let’s say the R2 value of our model is 0.85. This statement means that our trained model explains 85% of the variance in the target feature.
Possible values of R2
R2 lies between 0 and 1 (both inclusive) when the model is evaluated on the data it was trained on. However, R2 can also be negative. This typically happens when we train a model on training data and then evaluate it on new data: there is no guarantee that the model's predictions on the new data will be closer to the true values than the predictions of the average model. For a standard least-squares model with an intercept, evaluating on the training data itself always gives a non-negative R2. A small numeric example of the negative case is shown below.
R2 = 0 => Trained model is equivalent to the average model (very poor performing model)
The maximum value of R2 is 1.
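As a tiny illustration of the negative case, here is a hand-made example (the numbers are my own, chosen only to show the arithmetic): predictions that are further from the truth than the plain mean yield a negative R2.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([1.0, 2.0, 3.0, 4.0])

# The mean model predicts 2.5 everywhere; its squared error is the baseline.
baseline_pred = np.full_like(y_true, y_true.mean())

# A model whose predictions are worse than that baseline (e.g. one badly
# overfit to different data) gets a negative R2 on this data.
bad_pred = np.array([10.0, 10.0, 10.0, 10.0])

print(r2_score(y_true, baseline_pred))  # 0.0 -> same as the average model
print(r2_score(y_true, bad_pred))       # large negative value
```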
Problem with R2
R2 increases every time we add a new independent feature to the training data. This happens even when we add a useless or random feature, because it is easy to find at least a small spurious correlation in random data, and the model will exploit it. These small spurious correlations can add up and make our model overfit. So, we need a performance measure that does not reward such spurious correlations. This problem is solved by another performance measure known as Adjusted-R2.
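A quick way to see this effect (a sketch under my own assumptions about the data, not code from the article) is to keep appending purely random columns to the training data and watch the training R2 creep upward even though the new columns carry no signal.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 100

# One genuinely informative feature plus noise in the target.
x_useful = rng.normal(size=(n, 1))
y = 3 * x_useful.ravel() + rng.normal(size=n)

X = x_useful
for _ in range(5):
    # Append a column of pure noise, unrelated to y.
    X = np.hstack([X, rng.normal(size=(n, 1))])
    r2 = LinearRegression().fit(X, y).score(X, y)
    print(X.shape[1], "features -> training R2 =", round(r2, 4))
# The printed R2 never decreases, even though the extra features are useless.
```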
Adjusted R2 metric
The basic idea behind the adjusted R2 is to penalize the score as we keep adding new features to the model.
The adjusted R2 is computed as: Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - m - 1), where n is the number of data points and m is the number of independent features. The denominator (n - m - 1) decreases as we increase m, which makes the penalty term (1 - R2) * (n - 1) / (n - m - 1) grow. So, unless adding a feature brings a significant increase in R2, the adjusted R2 will not increase, and it may even decrease.
In short: a slight increase in the R2 value (due to adding an unimportant feature) => the adjusted R2 stays almost the same or may even decrease. A significant increase in the R2 value (due to adding an important feature) => the adjusted R2 increases significantly.
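Here is a minimal sketch of the formula above (the helper name adjusted_r2 and the example numbers are my own assumptions): it shows how a tiny R2 bump from one extra feature leaves the adjusted score flat or lower, while a real improvement lifts it.

```python
def adjusted_r2(r2: float, n: int, m: int) -> float:
    """Adjusted R2 for n data points and m independent features."""
    return 1 - (1 - r2) * (n - 1) / (n - m - 1)

n = 100  # number of data points (assumed)

# A useless feature nudges R2 from 0.850 to 0.851: adjusted R2 goes down.
print(adjusted_r2(0.850, n, m=5))   # ~0.8420
print(adjusted_r2(0.851, n, m=6))   # ~0.8414

# An important feature lifts R2 from 0.850 to 0.900: adjusted R2 goes up.
print(adjusted_r2(0.900, n, m=6))   # ~0.8935
```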
I hope you liked the article. Have a great day!