top | item 44082250 (no title) valine | 9 months ago You’re thinking about this like the final layer of the model is all that exists. It’s highly likely reasoning is happening at a lower layer, in a different latent space that can’t natively be projected into logits. discuss order hn newest No comments yet.
No comments yet.