Python scikit-learn (metrics): difference between r2_score and explained_variance_score?

I noticed that ‘r2_score’ and ‘explained_variance_score’ are both built-in sklearn.metrics methods for regression problems.

I was always under the impression that r2_score is the percent variance explained by the model. How is it different from ‘explained_variance_score’?

When would you choose one over the other?


OK, look at this example:

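In [123]:
import numpy as np
from sklearn import metrics

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
In [124]:
# what explained_variance_score really is: 1 - Var(residuals) / Var(y_true)
residuals = np.array(y_true) - np.array(y_pred)
print(1 - residuals.var() / np.var(y_true))
In [125]:
# what r^2 really is: 1 - mean(residuals^2) / Var(y_true)
print(1 - (residuals ** 2).mean() / np.var(y_true))
In [126]:
# Notice that the mean residual is not 0
print(residuals.mean())
In [127]:
# if the predicted values are different, such that the mean residual IS 0:
y_pred = [2.5, 0.0, 2, 7]
In [128]:
# They become the same
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))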

So, when the mean residual is 0, the two scores are the same. Which one to choose depends on your needs, that is, is the mean residual supposed to be 0?

Most of the answers I found (including here) emphasize the difference between R^2 and Explained Variance Score, that is: the Mean Residual (i.e. the Mean Error).

However, an important question is left unanswered: why on earth do I need to consider the Mean Error?


R^2 is the Coefficient of Determination, which measures the proportion of variation explained by the (least-squares) linear regression.

You can look at it from a different angle, for the purpose of evaluating the predicted values of y, like this:

Variance(actual_y) × R^2 = Variance(predicted_y)

So intuitively, the closer R^2 is to 1, the closer actual_y and predicted_y are to having the same variance (i.e. the same spread).
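As a rough check of the variance identity above, here is a minimal sketch that uses only the standard library. The toy data and the names x, y_actual and y_predicted are made up for illustration, and the identity is only expected to hold for a least-squares line with an intercept, evaluated on the same data it was fitted to.

```python
from statistics import fmean, pvariance

# Hypothetical toy data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_actual = [1.2, 2.9, 3.1, 4.8, 5.1, 7.0]

# Closed-form simple least-squares fit with an intercept.
x_mean, y_mean = fmean(x), fmean(y_actual)
slope = (sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y_actual))
         / sum((xi - x_mean) ** 2 for xi in x))
intercept = y_mean - slope * x_mean
y_predicted = [slope * xi + intercept for xi in x]

# R^2 = 1 - (SSR / n) / Var(y_actual), with population variance (divide by n).
n = len(y_actual)
ssr = sum((ya - yp) ** 2 for ya, yp in zip(y_actual, y_predicted))
r2 = 1 - (ssr / n) / pvariance(y_actual)

lhs = pvariance(y_actual) * r2   # Variance(actual_y) x R^2
rhs = pvariance(y_predicted)     # Variance(predicted_y)
print(lhs, rhs)  # the two agree up to floating-point rounding
```

For predictions that do not come from an in-sample least-squares fit (for example, predictions on held-out data), the two sides can differ.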

As previously mentioned, the main difference is the Mean Error; and if we look at the formulas, we find that’s true:

R^2 = 1 - [(Sum of Squared Residuals / n) / Variance(y_actual)]

Explained Variance Score = 1 - [Variance(y_predicted - y_actual) / Variance(y_actual)]

in which:

Variance(y_predicted - y_actual) = (Sum of Squared Residuals / n) - (Mean Error)^2

So the only difference is that the Explained Variance Score subtracts the squared Mean Error from the numerator of the first formula! …But why?
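To make the relationship between the two formulas concrete, here is a small sketch that evaluates both by hand on the numbers from the example earlier in this thread. It uses only the standard library; r2 and explained_variance are hand-written stand-ins for the formulas, not the sklearn functions themselves, and population variance (divide by n) is assumed throughout.

```python
from statistics import fmean, pvariance

y_actual = [3.0, -0.5, 2.0, 7.0]
y_predicted = [2.5, 0.0, 2.0, 8.0]

# Residuals and their mean (the Mean Error).
residuals = [ya - yp for ya, yp in zip(y_actual, y_predicted)]
mean_error = fmean(residuals)
mean_squared_residual = fmean([r * r for r in residuals])  # SSR / n

var_y = pvariance(y_actual)  # population variance (divide by n)

# Hand-written versions of the two formulas (not the sklearn functions).
r2 = 1 - mean_squared_residual / var_y
explained_variance = 1 - pvariance(residuals) / var_y

# The gap between the two scores is the squared Mean Error over Var(y_actual).
gap = explained_variance - r2
print(r2, explained_variance)
print(gap, mean_error ** 2 / var_y)
```

Here the Mean Error is -0.25, so the two scores differ (roughly 0.9486 versus 0.9572), and the gap equals the squared Mean Error divided by Variance(y_actual).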

When we compare the R^2 score with the Explained Variance Score, we are basically checking the Mean Error; so if R^2 = Explained Variance Score, that means the Mean Error = zero!

The Mean Error reflects the systematic tendency of our estimator, that is, whether the estimation is biased or unbiased.
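One way to see why bias matters is a deliberately biased toy "model" whose predictions are all shifted by the same constant. The sketch below is only an illustration under that assumption; scores is a hypothetical helper built from the plain formulas (population variance, divide by n), not an sklearn function.

```python
from statistics import fmean, pvariance

def scores(y_actual, y_predicted):
    """Return (r2, explained_variance) from the plain formulas; hypothetical helper."""
    residuals = [ya - yp for ya, yp in zip(y_actual, y_predicted)]
    var_y = pvariance(y_actual)
    r2 = 1 - fmean([r * r for r in residuals]) / var_y
    explained_variance = 1 - pvariance(residuals) / var_y
    return r2, explained_variance

y_actual = [3.0, -0.5, 2.0, 7.0]

# A deliberately biased "model": every prediction is exactly 1.0 too high.
y_biased = [ya + 1.0 for ya in y_actual]

r2_biased, ev_biased = scores(y_actual, y_biased)
print(r2_biased, ev_biased)
```

Because every residual is the same constant, the residual variance is zero and the explained variance comes out as a perfect 1.0, while R^2 is pulled below 1 by the offset. If a systematic offset in the predictions matters to you, R^2 is the score that will show it.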


