item 45863082

ggeorgovassilis | 3 months ago

I came here just to complain about that :-) All LLMs I used seem to give more weight to things at the beginning of the context window and omit many details. Eg. I tried this simple thing: pasted a friend's and my CV into Gemini and asked it to recommend topics for a joint conference presentation. Results depended greatly on the order of CVs pasted in.


TheOtherHobbes | 3 months ago

The middle tends to be underweighted. The beginning and end get more attention.

otabdeveloper4 | 3 months ago

That's because when they say "long context window" they're lying and they actually mean that they support a long input prompt that is still compressed into a small context window. (Typically by throwing out tokens in the middle.)

An actually large context window is impossible due to how LLM attention works under the hood.
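For context on the scaling this comment alludes to: naive self-attention materializes an n×n score matrix per head, so memory grows quadratically with sequence length. A rough back-of-the-envelope in fp16, ignoring batching and memory-efficient kernels such as FlashAttention (which avoid materializing the full matrix):

```python
def attn_matrix_bytes(seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes for one n x n attention score matrix (one head, one layer, fp16)."""
    return seq_len * seq_len * bytes_per_elem

# Quadratic growth: 8x the sequence length -> 64x the memory.
assert attn_matrix_bytes(8_192) == 134_217_728           # 128 MiB per head
assert attn_matrix_bytes(65_536) == 64 * attn_matrix_bytes(8_192)
```

Multiplied across heads and layers, this is why very long contexts are expensive to serve with vanilla attention, whatever one makes of the "lying" claim above.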

acuozzo | 3 months ago

Mamba-2 enters the chat.
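Mamba-2 is a state-space model: instead of attending over all previous tokens, it folds the sequence into a fixed-size recurrent state, so per-token cost stays constant regardless of context length. A toy scalar recurrence illustrating the idea (the coefficients here are made up; real SSMs use structured matrices and input-dependent parameters):

```python
def ssm_scan(xs, a=0.9, b=0.1):
    """Toy linear recurrence h_t = a*h_{t-1} + b*x_t: O(1) state, O(n) time."""
    h = 0.0
    for x in xs:
        h = a * h + b * x
    return h

# The state is a single number no matter how long the input is,
# unlike attention, whose cost grows with the full history.
short_state = ssm_scan([1.0] * 10)
long_state = ssm_scan([1.0] * 10_000)
```

The trade-off is that the fixed-size state is a lossy summary of the history, which is its own flavor of "forgetting the middle".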