hnben | 11 months ago

> if you assume all the computational pathways happen in parallel on a GPU, that doesn't necessarily increase the time the model spends thinking about the question

The layout of the NN is actually quite complex: a large amount of information is computed besides the tokens themselves and the weights (think "latent vectors").
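A minimal sketch of the point (toy numbers, not a real transformer): besides the token IDs, the model carries a hidden-state vector per token (the "latent vectors") through every layer, and one forward pass does the same fixed amount of computation no matter how hard the question is.

```python
import numpy as np

# Toy illustration -- all sizes are made up for demonstration.
d_model, n_layers = 8, 4

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, d_model))           # token id -> latent vector
layer_weights = rng.normal(size=(n_layers, d_model, d_model))

def forward(token_ids):
    # One pass: the same number of matrix multiplies happens
    # regardless of how "hard" the encoded question is.
    h = embeddings[token_ids]                          # (n_tokens, d_model) latents
    for w in layer_weights:
        h = np.tanh(h @ w)                             # per-layer mixing of latents
    return h

hidden = forward([1, 42, 7, 99, 3])
print(hidden.shape)                                    # one latent vector per token
```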

I recommend the 3Blue1Brown YouTube series on the topic.
