valine | 9 months ago

That’s true, yeah. The model can do that because computing latents doesn’t depend on next-token prediction. You run a forward pass over your sequence and keep the hidden state at each token position, skipping the final projection to logits.
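To make that concrete, here's a rough sketch in NumPy. It's a toy single-layer causal attention model with random weights, not any real model's architecture; the names `W_embed`, `W_unembed`, `latents`, and `logits` are all made up for illustration. The point is only that `latents` never touches the unembedding matrix, and `logits` is just one extra matmul on top.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, D_MODEL = 50, 16

# Toy, randomly initialised weights (hypothetical stand-ins for a trained model).
W_embed = rng.normal(size=(VOCAB, D_MODEL))
W_q, W_k, W_v = (rng.normal(size=(D_MODEL, D_MODEL)) / np.sqrt(D_MODEL)
                 for _ in range(3))
W_unembed = rng.normal(size=(D_MODEL, VOCAB))


def latents(token_ids):
    """Return one hidden vector per token position; no logits are computed."""
    x = W_embed[token_ids]                                  # (T, D)
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(D_MODEL)                     # (T, T)
    # Causal mask: position i may only attend to positions <= i.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ v                                  # residual add, (T, D)


def logits(token_ids):
    """Next-token logits: the same latents plus the final projection."""
    return latents(token_ids) @ W_unembed                   # (T, VOCAB)


tokens = np.array([3, 14, 15, 9, 2])
h = latents(tokens)       # one latent per token, shape (5, 16)
z = logits(tokens)        # only needed if you actually want to predict tokens
```

Because of the causal mask, the latent at a given position depends only on the tokens up to that position, so `latents(tokens[:3])` matches the first three rows of `h`.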
