Agreed. I was curious enough to run the model myself, so I used a tool to extract the data. The slope estimate (b=17.24) is not significantly different from zero, p=.437.
In case anyone is interested, below is R code to read these data and compute the regression. The summary() reveals the p value for the slope to be 0.437, and that for the intercept to be 0.32.
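# Read the extracted scatter-plot data (first row holds the column names)
d <- read.table("https://pastebin.com/raw/HhWTKZRb", header=TRUE)
# Regress cumulative COVID-19 cases per 100,000 on the proportion of binge drinkers
m <- lm(cumulative_covid19_per100000~proportion_binge_drinkers, data=d)
# Print coefficient estimates, standard errors, and p values
summary(m)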
The problem is that the author is essentially claiming that running the regression for data not passing his eyeball test is, in itself, a misuse of regression...which is nonsense.
I'm not sure I understand your point. Did you actually look at the regression line through the data? It looks crazy off. I'm not a statistician, but that line looks like it doesn't represent the data very well at all. People are also making nuanced comments above, but the underlying fact seems to be that this is not a good use of linear regression, and there is no strong correlation between the two axes.
What are some examples of data sets with high(ish) r with high p (low confidence), and low p (high confidence) with low r?
I guess it would be a very tall, "sharp cornered" parallelogram of data points (clear slope at the average, but high error variation), vs a very short, wide rectangle?
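One way to see both cases is a tiny made-up example. This is only a sketch: the numbers below are invented for illustration and have nothing to do with the binge-drinking data. It leans on the fact that, for a fixed r, the p value shrinks as the number of points grows. Four points can give r = 0.8 with p = 0.2, while a weak pattern with r = 0.2 repeated over 400 points gives a p value well under 0.001.

```r
# Case 1: few points, strong correlation, but not significant.
x_small <- 1:4
y_small <- c(1, 3, 2, 4)
small <- cor.test(x_small, y_small)  # Pearson correlation test

# Case 2: many points, weak correlation, clearly significant.
# The same four-point pattern (r = 0.2) is repeated 100 times,
# which leaves r unchanged but raises n from 4 to 400.
x_large <- rep(1:4, times = 100)
y_large <- rep(c(1, 4, 3, 2), times = 100)
large <- cor.test(x_large, y_large)

# Collect r and p for both cases side by side.
results <- data.frame(
  case = c("small n, high r", "large n, low r"),
  n = c(length(x_small), length(x_large)),
  r = unname(c(small$estimate, large$estimate)),
  p = c(small$p.value, large$p.value)
)
print(results)
```

In a simple regression with one predictor, the p value that cor.test() reports is the same as the slope's p value from summary(lm(y ~ x)), so this maps directly onto the regression question.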
bluenose69|5 years ago
SubiculumCode|5 years ago
gleenn|5 years ago
gowld|5 years ago
ivansavz|5 years ago
You mean you have a tool for extracting tabular data from a scatter plot like http://www.goodmath.org/blog/wp-content/uploads/2020/07/EcCq... ? That's very cool and I would love to hear more about it.
xioxox|5 years ago
gowld|5 years ago
That would be a cool explorable demo.