justk | 1 year ago | on: Well-known paradox of R-squared is still buggin me
The n/(n-1) correction factor used by R's var() is what explains my paradox about the law of total variance, Var(Y) = E(Var(Y|X)) + Var(E(Y|X)). I was obtaining results that didn't match this formula because I corrected all the variances with the factor 20/19, but the total variance should have the factor 40/39, just as you pointed out.
Thanks for the comments and the correction.
I just added another comment that relates analysis of variance to this post to show that there is no real paradox here.
Finally, the formula for the total variance above is related to my intuition that having some information (the data for each state) should make the mean of the within-group variances smaller than the total variance, because variance is related to lack of information. But analysis of variance suggests (see my other comment) that the state factor is not informative, because of the high variance within each group (each state) and the small difference between the group means and the overall mean.
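A minimal sketch of the correction-factor point, using Python's statistics module (variance divides by n - 1 like R's var(); pvariance divides by n) and hypothetical 0/1 data with 20 voters per state and group means 0.45 and 0.55 (an assumption, not the OP's data frame). The law of total variance holds exactly with population variances, breaks when the n - 1 variances are plugged in, and holds again once each variance is rescaled by its own factor.

```python
from statistics import mean, pvariance, variance

# Hypothetical 0/1 preferences (not the OP's data): 20 voters per state,
# group means 0.45 and 0.55, overall mean 0.50.
groups = [[1] * 9 + [0] * 11, [1] * 11 + [0] * 9]
ys = [y for g in groups for y in g]
n = len(ys)
grand = mean(ys)

# Law of total variance with population (divide-by-n) variances:
# Var(Y) = E[Var(Y|X)] + Var(E[Y|X])
within = sum(len(g) / n * pvariance(g) for g in groups)
between = sum(len(g) / n * (mean(g) - grand) ** 2 for g in groups)
print(within + between, pvariance(ys))  # the two sides agree

# Plugging in n - 1 variances (what R's var() returns) breaks the identity...
naive = sum(len(g) / n * variance(g) for g in groups) + between
print(naive, variance(ys))  # these differ

# ...unless each variance is first rescaled by its own (n_i - 1)/n_i factor:
# 19/20 for each state and 39/40 for the whole sample.
fixed = sum(len(g) / n * variance(g) * (len(g) - 1) / len(g) for g in groups) + between
print(fixed, variance(ys) * (n - 1) / n)  # agree again
```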
justk | 1 year ago | on: Well-known paradox of R-squared is still buggin me
From (1): "On the other hand, if the variation between the group means and the grand mean is small, and the variation within groups is large, this suggests there are no real differences in the group means i.e. the variations we observe is just sampling variation."
The above is in the context of analysis of variance. In our example the means in the two states are 0.55 and 0.45 and the overall mean is 0.50, so the between-group term is small, while the variances in the red and blue states are both about 0.247, so the within-group term is large. The variation we observe is therefore just sampling variation; hence the state factor is not important, and that explains the low R^2 value. Note that in each state the model's predicted value is the mean of that group. So analysis of variance shows that the OP's result is not a paradox or anything strange.
(1) https://saestatsteaching.tech/analysis-of-variance
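A minimal sketch of the ANOVA bookkeeping behind this argument, assuming hypothetical 0/1 preferences with 20 voters per state (11 of 20 in one state and 9 of 20 in the other, which reproduces the 0.55 / 0.45 / 0.50 means above). For a model whose prediction is the state mean, the residual sum of squares is the within-group sum of squares, so R^2 = 1 - SS_within/SS_total = SS_between/SS_total.

```python
from statistics import mean

# Hypothetical 0/1 preferences: state means 0.55 and 0.45, overall mean 0.50
red = [1] * 11 + [0] * 9
blue = [1] * 9 + [0] * 11
groups = [red, blue]
ys = red + blue
grand = mean(ys)

# One-way ANOVA sums of squares
ss_total = sum((y - grand) ** 2 for y in ys)
ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
ss_within = sum((y - mean(g)) ** 2 for g in groups for y in g)

# Predicting each voter by the mean of their state leaves exactly ss_within unexplained
r_squared = 1 - ss_within / ss_total
print(ss_between, ss_within, ss_total)
print(r_squared, ss_between / ss_total)  # both approximately 0.01
```

The between-group sum of squares is tiny compared with the within-group one, which is the "small first term, large second term" situation the quoted passage describes.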
justk | 1 year ago | on: Well-known paradox of R-squared is still buggin me
Let d1 = data[data$state == 0, ] and d2 = data[data$state == 1, ]. Then var(d1$pref) = 0.26, var(d2$pref) = 0.26 and var(data$pref) = 0.256 (using R and one of your data frames). So the intuition is that knowing the state gives almost no information about the voters' preferences, which suggests that any model based on state should give poor results, so a low R^2 is not a big paradox in this case.
There must be a formula to compute R^2 from the variances between states and within states, but in any case, when the variance within each state is bigger than the total variance, that should imply that the feature that divides the population into groups is of little value for prediction, so it should have a small R^2 value.
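A sketch reproducing the reported figures with hypothetical data (20 voters per state, 9 and 11 of them with pref = 1; this is an assumption, not the OP's data frame). Python's statistics.variance uses the n - 1 denominator like R's var(), while pvariance divides by n, so the comparison between the within-state and total variances depends on which of the two is used.

```python
from statistics import pvariance, variance

# Hypothetical stand-ins for d1$pref and d2$pref (not the OP's data)
d1 = [1] * 9 + [0] * 11
d2 = [1] * 11 + [0] * 9
everyone = d1 + d2

# Sample variances (n - 1 denominator, as in R's var())
sample = (variance(d1), variance(d2), variance(everyone))
# Population variances (n denominator)
population = (pvariance(d1), pvariance(d2), pvariance(everyone))

print([round(v, 3) for v in sample])
print(population)
```

With the n - 1 denominator each state looks slightly more variable than the whole sample; with the n denominator each state is slightly less variable than the whole population.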
justk | 1 year ago | on: Well-known paradox of R-squared is still buggin me
The math is correct; I am referring to your comment:
> R² is a measure like any other. In this case it measures the relative reduction in MSE - which is low because the prediction of individual votes remains quite bad even if the state is taken into account.
I may be reading too much into your comment, but it seems that you relate R^2 to the reduction in the prediction error within each state, as if R^2 were computed from the ratio (average variance in each state)/(total variance). I think that is not correct in general, since at the very least it would require the total variance to be the sum of the variances in each state. If you based your ideas on that formula, then your intuition is not correct; that is my point. When I apply R^2 I am thinking of a multivariable linear model with continuous variables, and that is not the case here. I would rather measure this problem by how the entropy changes when we use the information about the state, something like the cross entropy between the total distribution and the distribution by state.
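The comment suggests measuring how much the entropy changes once the state is known. One concrete version of that idea, swapped in here for illustration rather than the cross entropy the comment mentions, is the mutual information H(Y) - H(Y | state). The sketch assumes two equally sized states with preference rates 0.55 and 0.45 (the figures used elsewhere in this thread); the dictionaries and helper name are hypothetical.

```python
from math import log2

def entropy(p):
    """Binary entropy, in bits, of a Bernoulli(p) variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))

# Hypothetical setup: two equally sized states
weights = {"red": 0.5, "blue": 0.5}
rates = {"red": 0.55, "blue": 0.45}

overall = sum(weights[s] * rates[s] for s in rates)                    # P(Y = 1)
h_y = entropy(overall)                                                 # H(Y)
h_y_given_state = sum(weights[s] * entropy(rates[s]) for s in rates)  # H(Y | state)
mutual_info = h_y - h_y_given_state  # bits of uncertainty removed by knowing the state
print(h_y, h_y_given_state, mutual_info)
```

Knowing the state removes only a few thousandths of a bit out of roughly one bit of uncertainty, which matches the intuition that the state carries little information about individual preferences.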
justk | 1 year ago | on: Well-known paradox of R-squared is still buggin me
The most general definition of R^2 can produce a negative result, and we are talking about a paradox concerning the values of R^2 one should expect, so the usual setting is a linear model fitted by linear regression. I don't know whether the variance of the total population can be computed as the sum of the variances in each state, and state is not a continuous variable.
The population variance is the sum of the between-group variance and the within-group variance, both weighted by the number of elements in each group.
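Written out with population (divide-by-n) variances, for groups g of size n_g with means ȳ_g, within-group variances σ_g², overall mean ȳ and n = Σ n_g, the decomposition reads as follows. The second line plugs in the red/blue figures from this thread, assuming two equally sized states and 0/1 preferences, so that each state's variance is p(1 - p) = 0.55 × 0.45 = 0.2475.

$$
\operatorname{Var}(Y) = \underbrace{\sum_g \frac{n_g}{n}\,\sigma_g^2}_{\text{within}} + \underbrace{\sum_g \frac{n_g}{n}\,(\bar y_g - \bar y)^2}_{\text{between}}
$$

$$
0.25 = \tfrac12(0.2475) + \tfrac12(0.2475) + \tfrac12(0.55 - 0.50)^2 + \tfrac12(0.45 - 0.50)^2 = 0.2475 + 0.0025
$$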
justk | 1 year ago | on: Well-known paradox of R-squared is still buggin me
This is a little strange: you are using a data.frame with only two points, so any linear model with two free parameters (intercept and slope) will fit them exactly, as long as the two x values differ. It is just the line that connects two points.
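A minimal sketch of why two points always give a perfect fit: ordinary least squares for y = a + b·x through two points with distinct x values reproduces both points, so the residual sum of squares is zero and R^2 = 1. The two rows below are hypothetical, not the data frame from the thread, and the helper names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def r_squared(xs, ys, a, b):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot measured around the mean of y."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical two-row data set: state 0 -> 0.45, state 1 -> 0.55
xs, ys = [0.0, 1.0], [0.45, 0.55]
a, b = fit_line(xs, ys)
print(a, b, r_squared(xs, ys, a, b))
```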
justk | 1 year ago | on: Well-known paradox of R-squared is still buggin me
I think you are using a different definition of R^2 here. For example, the way you are thinking of R^2 doesn't account for the constant term, which the linear model must include for the usual R^2 formula to hold. What you have in mind is R^2 = 1 - mean(variance in each state)/(total variance), but that is not the definition of R^2 for a linear model.
As the user fskfsk.... says in another comment, here the constant term explains a lot of the variance, so the slope term contains less information; that is not captured by your definition or idea of R^2.
justk | 1 year ago | on: Well-known paradox of R-squared is still buggin me
The math is correct, but I think the model used is not, since it doesn't reflect that the variable s is dichotomous; a mixed model should be used instead. If we keep treating s as continuous, consider this example: s = state is encoded as a continuous variable between -1 and 1, people change state frequently, s = -1 means the person will vote in the blue state with probability 1, s = 1 means the person will vote in the red state with probability 1, and s = 0 means the person is equally likely to vote in the red or the blue state. When s is near zero the model cannot predict the voter's preferences, and this is the reason for the low predictive power of this model with a continuous s. The extreme cases s = -1 and s = 1 could be rare in populations that move from one state to the other frequently, so the initial intuition is misleading and leads to this paradox.
justk | 1 year ago | on: Well-known paradox of R-squared is still buggin me
Sorry, I edited my post several times and finally chose a short form with links to other sources. If you fix state = 1 then there are no more random variables, so R^2 doesn't have any meaning. Just for fun, what should the model predict for state = 0.5? That corresponds to a person who is 50% in the red state and 50% in the blue state. I think a mixed model is appropriate here, when the state variable is discrete, so that each value of the state variable represents a different part of the population. The other model should be used when people move a lot and frequently change the state they vote in, but in that case you would have to consider the fluctuations in the total population of each state at the time of voting.
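For the "just for fun" question, a sketch of what an ordinary linear regression on a 0/1 state dummy does at state = 0.5, assuming hypothetical individual-level data with preference rates 0.45 (state 0) and 0.55 (state 1). With an intercept and a single binary regressor, the least-squares line passes through the two state means, so at 0.5 it simply returns their midpoint. This is the plain fixed-effects linear model, not the mixed model the comment recommends; `predict` is an illustrative helper.

```python
from statistics import mean

# Hypothetical individual-level data: state coded 0 / 1,
# pref = 1 if the voter prefers party A (rates 0.45 and 0.55 assumed).
state = [0] * 20 + [1] * 20
pref = [1] * 9 + [0] * 11 + [1] * 11 + [0] * 9

# Least-squares slope and intercept of pref on state
mx, my = mean(state), mean(pref)
slope = sum((x - mx) * (y - my) for x, y in zip(state, pref)) / sum((x - mx) ** 2 for x in state)
intercept = my - slope * mx

def predict(s):
    """Prediction of the fitted line at state value s."""
    return intercept + slope * s

print(predict(0), predict(1), predict(0.5))  # approximately 0.45, 0.55, 0.5
```

The model answers state = 0.5 without complaint, even though no voter in the data has that value, which is one way to see that a continuous encoding of a nominal variable says nothing about such "half-state" voters.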
justk | 1 year ago | on: Well-known paradox of R-squared is still buggin me
The reasons for the apparent paradox:
a) in this case the model is a mixed model;
b) the variables are nominal, so you have to choose one of the pseudo-R^2 measures;
c) the R^2 used with a linear model requires a constant term, and in this case the constant term (or bias) explains a lot about the preferences (almost 50/50), so there is less information available for the slope term.
For more information:
(1) Pseudo R-squared:
https://en.wikipedia.org/wiki/Pseudo-R-squared
(2) R squared for mixed models – the easy way
https://ecologyforacrowdedplanet.wordpress.com/2013/08/27/r-...
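As an illustration of item b), a minimal sketch of McFadden's pseudo-R^2, 1 - LL_model/LL_null, a common likelihood-based choice, for a logistic regression of a 0/1 preference on the state. It uses hypothetical counts of 9/20 and 11/20 per state, and the helper name is illustrative. It is an ordinary fixed-effects logistic model, not the mixed model of item a). With a single categorical predictor, the maximum-likelihood fit reproduces each state's observed rate, so the log-likelihoods can be written down directly without running an optimizer.

```python
from math import log

# Hypothetical grouped data: (voters, voters preferring party A) per state
groups = {"blue": (20, 9), "red": (20, 11)}

def bernoulli_loglik(n, k, p):
    """Log-likelihood of k successes in n Bernoulli(p) trials (binomial coefficient omitted)."""
    return k * log(p) + (n - k) * log(1 - p)

n_total = sum(n for n, _ in groups.values())
k_total = sum(k for _, k in groups.values())

# Null model: intercept only, one common probability for everyone
ll_null = bernoulli_loglik(n_total, k_total, k_total / n_total)
# Model with state: fitted probability in each state equals its observed rate
ll_model = sum(bernoulli_loglik(n, k, k / n) for n, k in groups.values())

mcfadden = 1 - ll_model / ll_null
print(ll_null, ll_model, mcfadden)
```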
Hope this helps.