tcskeptic | 4 years ago

The confidence interval is for the slope of the regression line. Are you objecting to something the interval is not trying to show, or am I missing something (entirely possible)?

jrd79 | 4 years ago

A one-dimensional affine fit (usually called a linear fit) contains two parameters: a slope and an offset. Both have error bounds, and the offset error bounds on this data would be huge. Data presentation that is not intended to deceive would have shown the vertical spread of the estimate too. But that spread would have been so wide that it would reveal that the fit is terrible and that reasonable conclusions cannot be drawn from these model fits. This is not scientific work. It is ideological policy advocacy dressed up as data science.
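
For readers following along, here is a minimal sketch of the quantities being argued about: the standard errors of both the offset and the slope, and the residual standard deviation (the vertical spread of points around the fitted line). It uses synthetic data, not the article's data; the seed, sample size, and coefficients are all made-up assumptions, and the 1.96 multiplier is the usual normal approximation for a 95% interval.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; synthetic data, NOT the article's

n = 200
x = rng.normal(0.0, 1.0, n)
# Hypothetical noisy affine relation: offset 0.5, slope 0.2, unit noise.
y = 0.5 + 0.2 * x + rng.normal(0.0, 1.0, n)

# Design matrix with a column of ones for the offset.
A = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # beta = [offset, slope]

residuals = y - A @ beta
s2 = residuals @ residuals / (n - 2)          # residual variance estimate
cov = s2 * np.linalg.inv(A.T @ A)             # covariance of the estimates
se_offset, se_slope = np.sqrt(np.diag(cov))
resid_sd = np.sqrt(s2)                        # vertical spread around the line

print(f"offset = {beta[0]:.3f} +/- {1.96 * se_offset:.3f}")
print(f"slope  = {beta[1]:.3f} +/- {1.96 * se_slope:.3f}")
print(f"residual SD = {resid_sd:.3f}")
```

The parameter standard errors shrink as n grows, while the residual SD does not, so the two kinds of "error bar" answer different questions.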

hervature | 4 years ago

I think you are conflating two different things. The R^2 is incredibly poor for all the plots. That's essentially what you are complaining about. However, you can still have a very low R^2 but a statistically significant slope. While I agree that the article needs more support due to only doing univariate analysis and potentially missing huge confounders, your complaint is not valid. Here is code for basically random points that have a tight slope coefficient:

    import numpy as np
    import statsmodels.api as sm

    n = 1000
    desired_R2 = 0.05
    mu = 0
    sigma_noise = 0.1
    # For y = X + noise, R^2 = sigma^2 / (sigma^2 + sigma_noise^2); solve for sigma.
    sigma = np.sqrt(sigma_noise**2 * (desired_R2 / (1 - desired_R2)))

    X = np.random.normal(mu, sigma, n)
    noise = np.random.normal(0, sigma_noise, n)
    y = X + noise

    X = sm.add_constant(X)  # add the intercept column
    model = sm.OLS(y, X).fit()
    print(model.summary())
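
A quick way to check the low-R^2-but-significant-slope claim without reading a full summary table is to compute the two numbers directly. The sketch below uses NumPy only so the arithmetic is visible; the seed and variable names are my own assumptions, and it relies on the fact that for y = x + noise the population R^2 is sigma_x^2 / (sigma_x^2 + sigma_noise^2).

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, chosen only for repeatability

n = 1000
desired_r2 = 0.05
sigma_noise = 0.1
# Solve R^2 = sigma_x^2 / (sigma_x^2 + sigma_noise^2) for sigma_x.
sigma_x = np.sqrt(sigma_noise**2 * desired_r2 / (1 - desired_r2))

x = rng.normal(0.0, sigma_x, n)
y = x + rng.normal(0.0, sigma_noise, n)

# Closed-form simple linear regression.
x_c = x - x.mean()
slope = (x_c @ (y - y.mean())) / (x_c @ x_c)
intercept = y.mean() - slope * x.mean()

resid = y - (intercept + slope * x)
ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot

# Standard error and t-statistic of the slope.
se_slope = np.sqrt(ss_res / (n - 2) / (x_c @ x_c))
t_stat = slope / se_slope

print(f"R^2 = {r2:.3f}, slope = {slope:.3f}, SE = {se_slope:.3f}, t = {t_stat:.2f}")
```

In simple regression t^2 = (n - 2) * R^2 / (1 - R^2), so with n = 1000 and R^2 near 0.05 the t-statistic should land somewhere around 7: a poor fit by R^2, yet a slope clearly distinguishable from zero.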