item 47169669 (no title)
euclaise | 4 days ago

Maybe RL (reinforcement learning)? Just like the similar corrections seen in reasoning traces. You can train non-"thinking" models the same way (though if you're naive about it, you might end up with responses that are similarly rambly), and I'd expect it to have been discuss…

No comments yet.