(no title)
minimaltom | 1 month ago
We force this residual stream to project to the logprobs of all tokens, just as a human in the act of speaking a sentence is forced to produce words. But could this residual stream represent thoughts which don't map to words?
Its plausible, we already have evidence that things like glitch-token representations trend towards the centroid of the high-dimensional latent space, and logprobs for tokens that represent wildly-branching trajectories in output space (i.e. "but" vs "exactly" for specific questions) represent a kind of cautious uncertainty.
No comments yet.