Yeah, the needle-in-a-haystack tests are so stupid. It seems clear that LLM performance degrades massively with context size, yet those tests claim the model performs perfectly.
As someone who regularly abuses Gemini with a 90% full context, I can say the model's performance does degrade for sure, but I wouldn't call it massive.
I can't show any evidence as I don't have such tests, but it's like coding normally vs coding after a beer or two.
For the massive effect, fill it to 95% and we're talking vodka shots. 99%? A zombie who can code. But perhaps that's not fair when you have a 1M-token context size.
patates|2 months ago
oceansweep|2 months ago