pseudo_meta | 6 months ago
Upon some digging, it seems that part of the slowdown is due to the gpt-5 models doing some reasoning by default (reasoning effort "medium"), even the nano and mini models. Setting the reasoning effort to "minimal" improves the speed a lot.
However, to be able to set the reasoning effort you have to switch to the new Responses API, which wasn't a lot of work, but was more than just changing a URL.
Tiberium | 6 months ago
That's not true: you can set the reasoning effort in the Chat Completions API as well (https://platform.openai.com/docs/api-reference/chat/create). The difference is that the Chat Completions API takes a top-level parameter called "reasoning_effort", while the Responses API takes a "reasoning" parameter (an object) with an "effort" field inside it.
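To make the difference concrete, here is a minimal sketch of the two request-payload shapes described above, written as plain Python dicts. No request is actually sent; the field names follow the linked API reference, and the model name is just an illustrative choice from the thread.

```python
# Chat Completions API: "reasoning_effort" is a flat, top-level string parameter.
chat_completions_payload = {
    "model": "gpt-5-mini",
    "messages": [{"role": "user", "content": "Hello"}],
    "reasoning_effort": "minimal",
}

# Responses API: "reasoning" is an object, with the effort level nested
# inside it as the "effort" field.
responses_payload = {
    "model": "gpt-5-mini",
    "input": "Hello",
    "reasoning": {"effort": "minimal"},
}
```

So switching APIs only changes where the setting lives in the payload, not the set of allowed effort values.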
pseudo_meta | 6 months ago