top | item 44730167

grej | 7 months ago

Related to this, is anyone aware of a benchmark for this kind of thing, maybe broadly the category of "context rot"? Something that tracks both how content not germane to the current question adversely affects the responses, and how a large volume of germane but deep context leaves models unable to follow the conversation? I've definitely experienced the latter with coding models.

energy123 | 7 months ago

In computer vision they add noise to the picture when training. Maybe LLM providers should do the same during RL.
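A rough text analogue of CV-style noise augmentation would be splicing irrelevant "distractor" sentences into training prompts, so the model is rewarded for ignoring off-topic context. A minimal sketch of the idea (the distractor pool and splice function here are hypothetical illustrations, not any provider's actual pipeline):

```python
import random

def add_gaussian_noise(pixels, sigma=0.1, rng=None):
    """CV-style augmentation: perturb each pixel value with Gaussian
    noise, clamping back to the valid [0, 1] range."""
    rng = rng or random.Random(0)
    return [min(1.0, max(0.0, p + rng.gauss(0, sigma))) for p in pixels]

def add_context_noise(prompt_sentences, distractors, k=2, rng=None):
    """Hypothetical text analogue: splice k irrelevant sentences into
    the prompt at random positions, preserving the original order of
    the germane sentences."""
    rng = rng or random.Random(0)
    noisy = list(prompt_sentences)
    for d in rng.sample(distractors, k):
        noisy.insert(rng.randrange(len(noisy) + 1), d)
    return noisy

prompt = ["The user asks about sorting.", "Explain quicksort."]
distractors = ["Unrelated log line.", "Stale TODO from last week.", "Off-topic aside."]
noisy = add_context_noise(prompt, distractors)
```

Whether this actually helps during RL is an open question; the CV intuition is just that training on clean-only context may leave models brittle to clutter at inference time.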

nijave | 7 months ago

Not sure, but it sounds like a very similar problem to prompt injection.