What I don't understand is where the memory is. How does GPT-3 or ChatGPT remember so much information with just that architecture? It would seem that the maximum it could remember is 2048 tokens.
EDIT: Maybe it's 2048 x 96? Still seems low for what it can do.
Yes, but how does it remember the stuff you told it earlier in the conversation? That 1.2TB is the trained model, and I assume those weights are not changed by the conversation?
mjburgess|3 years ago
Epa095|3 years ago