top | item 47133796

dist-epoch | 5 days ago

Because that's where the compute happens: in those "verbose" tokens. A transformer has a fixed size, so it can only do so many math operations in one forward pass. If your problem is hard, you need more passes.

Asking it to be shorter is like doing fewer iterations of a numerical integration algorithm.
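A minimal sketch of that analogy (my own illustration, not anything from the thread): with the trapezoidal rule, each iteration adds a fixed amount of work, and cutting the iteration count makes the answer worse in a measurable way, just as fewer tokens means fewer forward passes of fixed-size compute.

```python
import math

def trapezoid(f, a, b, n):
    """Trapezoidal-rule estimate of the integral of f over [a, b] using n steps."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Integral of sin(x) on [0, pi] is exactly 2. Each step is a fixed unit of
# "work"; fewer steps (shorter output) means a less accurate answer.
for n in (4, 64, 1024):
    approx = trapezoid(math.sin, 0, math.pi, n)
    print(f"{n:5d} iterations -> error {abs(2 - approx):.2e}")
```

The error shrinks as the iteration count grows; there is no way to get the high-iteration accuracy while demanding the low-iteration budget, which is the commenter's point about verbose tokens.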

sambaumann|5 days ago

Yeah, but all the models that are live in ChatGPT have reasoning (iirc): they could use reasoning tokens to do the 'compute' and still show the user a succinct response that directly answers the query.