valine | 9 months ago

That's true, yeah. The model can do that because computing the latents (the per-token hidden states) is independent of next-token prediction: you run the forward pass over each token in your sequence and just skip the final projection to logits.
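To make that concrete, here's a toy sketch in pure Python. The `ToyCausalLM` class, its sizes, and its random weights are made up for illustration and are not any real model's code. It also simplifies a real transformer down to one causal attention layer with a residual connection, with no MLP or layer norm. `hidden_states` produces a latent for every position without ever touching the output matrix (`unembed`). `logits` is just that same computation followed by the final projection.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


class ToyCausalLM:
    """Hypothetical single-layer, single-head causal model with random weights."""

    def __init__(self, vocab_size=10, d_model=4, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.embed = rand(vocab_size, d_model)    # token id -> input vector
        self.Wq = rand(d_model, d_model)
        self.Wk = rand(d_model, d_model)
        self.Wv = rand(d_model, d_model)
        self.unembed = rand(vocab_size, d_model)  # final projection to logits

    def hidden_states(self, tokens):
        """Latents for every position. Never uses self.unembed."""
        xs = [self.embed[t] for t in tokens]
        qs = [matvec(self.Wq, x) for x in xs]
        ks = [matvec(self.Wk, x) for x in xs]
        vs = [matvec(self.Wv, x) for x in xs]
        d = len(xs[0])
        out = []
        for i, q in enumerate(qs):
            # Causal mask: position i only attends to positions 0..i.
            scores = [sum(a * b for a, b in zip(q, ks[j])) / math.sqrt(d)
                      for j in range(i + 1)]
            weights = softmax(scores)
            attn = [sum(w * vs[j][k] for j, w in enumerate(weights))
                    for k in range(d)]
            # Residual connection: input vector plus attention output.
            out.append([x + a for x, a in zip(xs[i], attn)])
        return out

    def logits(self, tokens):
        """Next-token logits: the same hidden states, then the final projection."""
        return [matvec(self.unembed, h) for h in self.hidden_states(tokens)]


model = ToyCausalLM()
tokens = [3, 1, 4, 1, 5]
latents = model.hidden_states(tokens)  # 5 vectors of size d_model, no logits computed
```

Because of the causal mask, the latents for a prefix are exactly the first rows of the latents for the full sequence, so `model.hidden_states(tokens[:3]) == latents[:3]` holds. In Hugging Face `transformers` the split is roughly analogous: a base model loaded with `AutoModel` returns `last_hidden_state` without the LM head, while `AutoModelForCausalLM` adds the projection to vocabulary logits.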