Good! DNNs unlock semantics (parsing, transforming, producing). That's the basis of general intelligence, not encyclopedic random string recall. Models shouldn't burn ungodly quantities of compute emulating DDR5 with their working memory. We need machines that think better, not memorize well. We already have plenty of those.
Massive context windows, and their needle tests, are misguided. We won't reach human-level AGI by basically inventing a natural language RDBMS. Our resources should primarily target better reasoning systems for our models, reinforcement learning, etc.
If we can build a GPT4-level problem solving system that coincidentally also can't remember telephone numbers, I'll consider it major progress.
Memorization usually refers to training data. It's often useful to have something that can utilize instructions losslessly, which is the distinction between these models.
What if your field of vision was infinite and you are looking at a unrolled telephone book?
Would you need a device to remember the phone number? You wouldn't. You would need a method or algorithm to find the number, but there is no reason why that algorithm couldn't be part of the attention mechanism. The attention mechanism is akin to reading the entire phone book for every word you are about to say. It would be unreasonable to expect you to not find the right phone number eventually.
a_wild_dandan|1 year ago
Massive context windows, and their needle tests, are misguided. We won't reach human-level AGI by basically inventing a natural language RDBMS. Our resources should primarily target better reasoning systems for our models, reinforcement learning, etc.
If we can build a GPT4-level problem solving system that coincidentally also can't remember telephone numbers, I'll consider it major progress.
6gvONxR4sf7o|1 year ago
Rodeoclash|1 year ago
orra|1 year ago
imtringued|1 year ago
Would you need a device to remember the phone number? You wouldn't. You would need a method or algorithm to find the number, but there is no reason why that algorithm couldn't be part of the attention mechanism. The attention mechanism is akin to reading the entire phone book for every word you are about to say. It would be unreasonable to expect you to not find the right phone number eventually.
Rodeoclash|1 year ago