(no title)
valine | 9 months ago
But yeah, the LLM doesn’t even know the sampler exists. I used the last layer as an example, but it’s likely that reasoning traces exist in the latent space of every layer not just the final one, with the most complex reasoning concentrated in the middle layers.
jacob019|9 months ago
valine|9 months ago
Reasoning is definitely not happening in the linear projection to logits if that’s what you mean.
bcoates|9 months ago
valine|9 months ago
unknown|9 months ago
[deleted]