top | item 30849321

durovo | 3 years ago

I believe GPT-3 has a transformer-based architecture, so it doesn't recursively ingest its own output at each iteration. I believe attention-based transformer models have enough capacity to learn what you are talking about on their own.
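To make the contrast concrete, here is a minimal sketch (my own toy example, not GPT-3's actual implementation) of single-head self-attention with numpy: every output token is computed in one fixed-depth pass over the whole context, with no hidden state carried from token to token. Learned query/key/value projections are omitted for brevity.

```python
import numpy as np

def self_attention(X):
    """Toy single-head scaled dot-product self-attention (no learned
    query/key/value projections). Each output row is a weighted mix of
    ALL input rows, computed in one pass -- no recurrent hidden state."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # pairwise similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # row-wise softmax
    return w @ X

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))    # 5 tokens, 8-dim embeddings
Y = self_attention(X)
print(Y.shape)                 # same shape as the input: (5, 8)
```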


ravi-delia | 3 years ago

GPT-3's transformer stack only "recurs" a fixed, finite number of times (its layer count). Attention does a lot compared to a bog-standard RNN, and if the numbers were tokenized sensibly it would probably be enough for most reasonable computations, but eventually you would definitely hit a cap. That's probably a good thing, of course: the network and the training loop are Turing complete together, but it would suck if the network itself could fail to terminate.
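The depth difference can be sketched as follows (a toy comparison of my own; a plain linear map stands in for a real attention + MLP block, purely to show where the depth comes from): an RNN applies its cell once per input token, so its sequential depth grows with the sequence, while a transformer applies a fixed stack of layers however long the input is.

```python
import numpy as np

def rnn_forward(X, Wh, Wx):
    """Vanilla RNN (toy): sequential depth grows with the sequence,
    because each step's hidden state feeds into the next step."""
    h = np.zeros(Wh.shape[0])
    for x in X:                          # one recurrent step per token
        h = np.tanh(Wh @ h + Wx @ x)
    return h

def transformer_forward(X, layers):
    """Toy stand-in for a transformer: a fixed stack of layers applied
    once, regardless of sequence length."""
    H = X
    for W in layers:                     # depth == len(layers), not len(X)
        H = np.tanh(H @ W)
    return H

rng = np.random.default_rng(0)
d = 4
Wh, Wx = rng.normal(size=(d, d)), rng.normal(size=(d, d))
layers = [rng.normal(size=(d, d)) for _ in range(3)]   # depth fixed at 3

short, long = rng.normal(size=(2, d)), rng.normal(size=(100, d))
# RNN does 2 vs 100 sequential steps; the "transformer" always does 3 layers.
print(rnn_forward(long, Wh, Wx).shape)           # (4,)
print(transformer_forward(long, layers).shape)   # (100, 4)
```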

thrtythreeforty | 3 years ago

Thank you for pointing out the difference. I went back and reread about transformers; I had previously thought they were a kind of RNN. (I am not an ML engineer.)