therealpygon | 1 month ago

Attention performance quite literally degrades as context size grows, in almost every model and to varying degrees, for anything other than needle-in-a-haystack lookups. So, to answer the question: the “waste” is unnecessarily making the model dumber in an attempt to make it smarter.
