(no title)
miven | 1 year ago
I'm not sure whether it wouldn't be more reliable to let the model run on latents and try to train a separate latent-reading explainer module that has at least some approximation of what we want as an explicit optimization objective.
Assuming it actually is or has the potential to be better than CoT, from what I gathered from the paper the current results are mostly just more efficient token-wise.
No comments yet.