(no title)
miven | 1 year ago
There might me a lot of other hidden computations happening within the model's latents which may not immediately influence the predicted tokens but be relevant for the model's internal processing. And even disregarding that, the model is under no formal obligation to stick to the chain of thought it produced for its final decisions.
No comments yet.