As any AI researcher knows, if you have a model that does 4x better than the naive baseline (the humans, in this case), you are likely looking at overfitting, not real-world performance. This study is just slop, and you can tell from the mere fact that they did not submit a paper but only published a PR article.
LargoLasskhyfv|8 months ago
https://arxiv.org/abs/2506.22405
This is what you get when you click 'View Publication' near the end of the article, right before the Q&A.
brandonb|8 months ago
miraculixx|8 months ago