I noticed that `r2_score` and `explained_variance_score` are both built-in `sklearn.metrics` methods for regression problems.

I was always under the impression that r2_score is the percent variance explained by the model. How is it different from ‘explained_variance_score’?

When would you choose one over the other?

Thanks!

OK, look at this example:

```
In [123]:
# data
import numpy as np
from sklearn import metrics
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.957173447537
0.948608137045
In [124]:
# what explained_variance_score really is
1-np.cov(np.array(y_true)-np.array(y_pred))/np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
# what r^2 really is
1-((np.array(y_true)-np.array(y_pred))**2).sum()/(4*np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
# notice that the mean residual is not 0
(np.array(y_true)-np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
# if the predicted values are different, such that the mean residual IS 0:
y_pred=[2.5, 0.0, 2, 7]
(np.array(y_true)-np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
# they become the same thing
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.982869379015
0.982869379015
```

So, when the mean residual is 0, they are the same. Which one to choose depends on your needs, that is, is the mean residual **supposed** to be 0?

Most of the answers I found (including here) emphasize the difference between R² and Explained Variance Score, that is: **The Mean Residual** (i.e. the mean of the error).

However, an important question is left behind: why on earth do I need to consider the mean of the error?

**Refresher:**

R²: the Coefficient of Determination, which measures the amount of variation explained by the (least-squares) linear regression.

You can look at it from a different angle for the purpose of evaluating the predicted values of `y`, like this:

`Variance_actual_y × R²_actual_y = Variance_predicted_y`

So intuitively, the closer R² is to `1`, the closer actual_y and predicted_y are to having the same variance (i.e. the same spread).
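The relation above can be checked numerically. The following is a minimal sketch, assuming a simple one-variable least-squares fit with an intercept (the relation relies on that; it need not hold for arbitrary predictions). The data points are hypothetical and the fit is hand-rolled rather than taken from sklearn.

```python
from math import isclose
from statistics import mean, pvariance

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
actual_y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Ordinary least-squares fit of y = slope * x + intercept.
mean_x, mean_y = mean(x), mean(actual_y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, actual_y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x
predicted_y = [slope * xi + intercept for xi in x]

# R^2 = 1 - (sum of squared residuals) / (total sum of squares)
ss_res = sum((yi - pi) ** 2 for yi, pi in zip(actual_y, predicted_y))
ss_tot = sum((yi - mean_y) ** 2 for yi in actual_y)
r2 = 1 - ss_res / ss_tot

# Both sides of: Variance(actual_y) * R^2 = Variance(predicted_y)
lhs = pvariance(actual_y) * r2
rhs = pvariance(predicted_y)
print(lhs, rhs, isclose(lhs, rhs, rel_tol=1e-9))
```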

As previously mentioned, the main difference is the **Mean of Error**; and if we look at the formulas, we find that’s true:

```
R2 = 1 - [(Sum of Squared Residuals / n) / Variance(Yactual)]
Explained Variance Score = 1 - [Variance(Ypredicted - Yactual) / Variance(Yactual)]
```

in which:

`Variance(Ypredicted - Yactual) = (Sum of Squared Residuals / n) - (Mean Error)^2`

So, obviously the only difference is that the Explained Variance Score subtracts the squared **Mean Error** from the mean squared residual used in the first formula! … **But Why?**

When we compare the **R² Score** with the **Explained Variance Score**, we are basically checking the **Mean Error**; so if R² = Explained Variance Score, that means: The Mean Error = **Zero**!
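To make the link between the two scores and the mean error concrete, here is a small sketch that recomputes both by hand on the `y_true` / `y_pred` values from the earlier example. It uses only the standard library (no sklearn), and it assumes population variance (dividing by n) throughout. The variable names are illustrative.

```python
from math import isclose
from statistics import mean, pvariance

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
n = len(y_true)

residuals = [t - p for t, p in zip(y_true, y_pred)]
mean_error = mean(residuals)
mean_sq_residual = sum(e * e for e in residuals) / n

var_y = pvariance(y_true)
r2 = 1 - mean_sq_residual / var_y
evs = 1 - pvariance(residuals) / var_y

# Variance of the residuals = mean squared residual - (mean error)^2
print(isclose(pvariance(residuals), mean_sq_residual - mean_error ** 2))

# The gap between the two scores is the squared mean error, scaled by Var(y)
print(isclose(evs - r2, mean_error ** 2 / var_y))
```

Both checks should print `True`; when `mean_error` is zero the gap vanishes and the two scores coincide.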

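The Mean Error reflects the tendency of our estimator, that is: **Biased vs. Unbiased Estimation**. A nonzero mean error means the predictions are systematically off in one direction, which R² penalizes and the Explained Variance Score ignores.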
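One way to see the effect of bias is to shift every prediction by a constant. The sketch below uses hand-rolled versions of both scores (assumed equivalents of the sklearn functions for a single output with no sample weights) and a hypothetical `offset`; the helper names are illustrative.

```python
from math import isclose
from statistics import pvariance

def r2(y_true, y_pred):
    # 1 - (mean squared residual) / (population variance of y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - (sum(e * e for e in residuals) / len(y_true)) / pvariance(y_true)

def explained_variance(y_true, y_pred):
    # 1 - (population variance of the residuals) / (population variance of y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - pvariance(residuals) / pvariance(y_true)

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 7]             # mean error is 0 here
offset = 1.0                          # hypothetical systematic bias
biased_pred = [p + offset for p in y_pred]

print(r2(y_true, y_pred), explained_variance(y_true, y_pred))
print(r2(y_true, biased_pred), explained_variance(y_true, biased_pred))
```

The explained variance is the same for both sets of predictions, while R² drops for the shifted ones. If a constant bias should count against the model, R² is the score that reflects it.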
