top | item 39855518


NLPaep | 1 year ago

Mamba is bad with long context. It doesn't remember phone numbers

https://www.harvard.edu/kempner-institute/2024/02/05/repeat-...


a_wild_dandan|1 year ago

Good! DNNs unlock semantics (parsing, transforming, producing). That's the basis of general intelligence, not encyclopedic random string recall. Models shouldn't burn ungodly quantities of compute emulating DDR5 with their working memory. We need machines that think better, not memorize well. We already have plenty of those.

Massive context windows, and their needle tests, are misguided. We won't reach human-level AGI by basically inventing a natural language RDBMS. Our resources should primarily target better reasoning systems for our models, reinforcement learning, etc.

If we can build a GPT-4-level problem-solving system that coincidentally also can't remember telephone numbers, I'll consider it major progress.

6gvONxR4sf7o|1 year ago

Memorization usually refers to training data. It's often useful to have a model that can use its instructions losslessly, and that's the distinction between these models.

Rodeoclash|1 year ago

I can't remember phone numbers either, but I can use a device suited to remembering them to look them up.

orra|1 year ago

Hell, it looks like you forgot you already said that (-:

imtringued|1 year ago

What if your field of vision were infinite and you were looking at an unrolled telephone book?

Would you need a device to remember the phone number? You wouldn't. You would need a method or algorithm to find the number, but there is no reason that algorithm couldn't be part of the attention mechanism. Attention is akin to rereading the entire phone book before every word you are about to say, so it would be unreasonable to expect it not to find the right phone number eventually.
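A minimal sketch of the lookup the comment describes: scaled dot-product attention over a toy "phone book," where each entry's key vector stands in for a name embedding. The names, vectors, and numbers here are all hypothetical, and real attention mixes value vectors rather than returning an argmax; this only illustrates that with sharp enough scores the softmax weights become near one-hot, making the read effectively an exact lookup.

```python
import math

# Hypothetical phone book: key vectors stand in for name embeddings,
# values are the phone numbers those names map to.
keys = [
    [1.0, 0.0, 0.0],  # "alice"
    [0.0, 1.0, 0.0],  # "bob"
    [0.0, 0.0, 1.0],  # "carol"
]
values = ["555-0101", "555-0102", "555-0103"]

def attend(query, keys, values, scale=10.0):
    # Score every entry in the book against the query (dot product),
    # then softmax the scores into attention weights.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # With a sharp scale the weights are near one-hot, so reading out
    # the highest-weighted value behaves like an exact lookup.
    best = max(range(len(weights)), key=lambda i: weights[i])
    return values[best], weights

# Query with "bob"'s embedding: attention reads his number back out.
number, weights = attend([0.0, 1.0, 0.0], keys, values)
print(number)  # 555-0102
```

The catch the thread is circling: this lookup touches every entry for every query, which is exactly the quadratic cost long-context transformers pay and recurrent models like Mamba avoid by compressing the book into fixed-size state.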

Rodeoclash|1 year ago

I can't remember phone numbers either but I can use a device suited to remembering them to look them up.