bigmadshoe | 2 months ago

Yeah, the needle-in-a-haystack tests are so stupid. It seems clear that LLM performance degrades massively as the context fills up, yet those tests claim the model performs perfectly.

patates|2 months ago

As someone who regularly abuses Gemini with a 90% full context, the model's performance does degrade for sure, but I wouldn't call it massive.

I can't show any evidence as I don't have such tests, but it's like coding normally vs coding after a beer or two.

For the massive effect, fill it to 95% and we're talking vodka shots. 99%? A zombie who can code. But perhaps that's not a fair comparison when you have a 1M-token context window.

oceansweep|2 months ago

https://fiction.live/stories/Fiction-liveBench-Mar-25-2025/o...