valine | 9 months ago

Attention computes a weighted average of all previous latents. So yes, it's a new token as input to the forward pass, but after it feeds through an attention head it contains a little bit of every previous latent.
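A minimal NumPy sketch of that weighted-average view, with made-up dimensions and random stand-in weights (the names W_q, W_k, W_v and the sizes are illustrative, not from any particular model). After the causal mask and softmax, each output row is a convex combination of the value vectors at its own and all earlier positions:

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 5, 8                      # sequence length, latent width (illustrative)
    X = rng.normal(size=(T, d))      # latents entering the attention head

    # Random projections stand in for learned parameters.
    W_q, W_k, W_v = (rng.normal(size=(d, d)) for _ in range(3))
    Q, K, V = X @ W_q, X @ W_k, X @ W_v

    scores = Q @ K.T / np.sqrt(d)              # pairwise attention logits
    scores[np.triu_indices(T, k=1)] = -np.inf  # causal mask: no future positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1

    out = weights @ V  # out[t] is a weighted average of V[0..t]

    # The last token's output mixes in every previous latent:
    print(np.round(weights[-1], 3))  # nonzero weight on all positions 0..T-1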