top | item 44682280

nonhaver | 7 months ago

impressive evals. i wonder how much of that can be attributed to the enhanced context understanding. i feel like context understanding and context length are the bottlenecks for the majority of commercial models.

Eisenstein | 7 months ago

I don't know, I think that extending context windows is actually detrimental because people assume they can just dump things in there until it fills up. You still have to deal with the limited attention that the models have, and only filling the context with things relevant to the particular thing you are trying to solve is going to be the most effective approach. If you have too much information for it to fit into a 128K window, I think you just have too much information. The entirety of Don Quixote at over 1000 pages is less than 64,000 tokens.

CamperBob2 | 7 months ago

That sounds low by about 10x, assuming Don Quixote has 430k words (per Google).

Still, yes, I don't know of a single model that doesn't go off the rails if you actually try to take advantage of its context length specification.
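The "low by about 10x" estimate can be sanity-checked with a quick back-of-envelope calculation, assuming the common rule of thumb of roughly 4/3 tokens per English word for BPE-style tokenizers (actual counts vary by tokenizer and text; the 430k word count is the figure cited above):

```python
# Back-of-envelope check of the token count for Don Quixote.
# Assumption: ~4/3 tokens per word, a rough heuristic for BPE tokenizers.
words = 430_000           # word count cited above (per Google)
tokens_per_word = 4 / 3   # heuristic, not an exact measurement
estimated_tokens = int(words * tokens_per_word)

claimed_tokens = 64_000   # figure from the parent comment
ratio = estimated_tokens / claimed_tokens

print(f"estimated tokens: {estimated_tokens:,}")   # estimated tokens: 573,333
print(f"ratio vs claimed: {ratio:.1f}x")           # ratio vs claimed: 9.0x
```

So the book plausibly runs to well over 500k tokens, far beyond a 128K window, which is consistent with the ~10x correction.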