tedsanders|24 days ago
The following are true:
- In our API, we don't change model weights or model behavior over time (e.g., by time of day, or weeks/months after release)
- Tiny caveats include: there is a bit of non-determinism in batched non-associative math that can vary by batch / hardware, bugs or API downtime can obviously change behavior, heavy load can slow down speeds, and this of course doesn't apply to the 'unpinned' models that are clearly supposed to change over time (e.g., xxx-latest). But we don't do any quantization or routing gimmicks that would change model weights.
- In ChatGPT and Codex CLI, model behavior can change over time (e.g., we might change a tool, update a system prompt, tweak default thinking time, run an A/B test, or ship other updates); we try to be transparent with our changelogs (listed below) but to be honest not every small change gets logged here. But even here we're not doing any gimmicks to cut quality by time of day or intentionally dumb down models after launch. Model behavior can change though, as can the product / prompt / harness.
ChatGPT release notes: https://help.openai.com/en/articles/6825453-chatgpt-release-...
Codex changelog: https://developers.openai.com/codex/changelog/
Codex CLI commit history: https://github.com/openai/codex/commits/main/
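To make the "batched non-associative math" caveat concrete: floating-point addition is not associative, so the same numbers summed in a different grouping can round differently. The snippet below is only a toy sketch in plain Python. The function names and the chunking scheme are made up for illustration and say nothing about how any real inference stack batches work.

```python
def sequential_sum(values):
    """Add floats strictly left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Reduce fixed-size chunks first, then add the partial sums."""
    partials = [
        sequential_sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sequential_sum(partials)


# Hypothetical inputs chosen to exaggerate rounding; the exact sum is 2.0.
values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))   # 1.0
print(chunked_sum(values, 2))   # 0.0
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

Same inputs, different grouping, different result. In real workloads the differences are usually far smaller than in this contrived case, but they are why identical requests are not guaranteed to be bit-for-bit identical across batch sizes or hardware.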
Trufa|24 days ago
I've had this perceived experience so many times, and while of course it's almost impossible to be objective about this, it just seems so in your face.
I don't rule out that it's the novelty wearing off plus getting used to it, plus psychological factors. Do you have any takes on this?
jason_oster|24 days ago
Once the honeymoon wears off, the tool is the same, but you get less satisfaction from it.
Just a guess! Not trying to psychoanalyze anyone.
jychang|24 days ago
https://www.reddit.com/r/OpenAI/comments/1qv77lq/chatgpt_low...
tedsanders|24 days ago
The intention was purely to make the product experience better, based on common feedback from people (including myself) that wait times were too long. Cost was not a goal here.
If you still want the higher reliability of longer thinking times, that option is not gone. You can manually select Extended (or Heavy, if you're a Pro user). It's the same as at launch (though we did inadvertently drop it last month and restored it yesterday after Tibor and others pointed it out).
tgrowazay|24 days ago
newswasboring|23 days ago
Maybe a dumb question but does this mean model quality may vary based on which hardware your request gets routed to?
qingcharles|23 days ago
I feel like you need to be making a bigger statement about this. If you go onto various parts of the Net (Reddit, the bird site etc) half the posts about AI are seemingly conspiracy theories that AI companies are watering down their products after release week.
ComplexSystems|24 days ago
tedsanders|23 days ago
That said, there are definitely cases where we intentionally trade off intelligence for greater efficiency. For example, we never made GPT-4.5 the default model in ChatGPT, even though it was an awesome model at writing and other tasks, because it was quite costly to serve and the juice wasn't worth the squeeze for the average person (no one wants to get rate limited after 10 messages). A second example: in our API, we intentionally serve dumber mini and nano models for developers who prioritize speed and cost. A third example: we recently reduced the default thinking times in ChatGPT to cut how long people were waiting for answers, which in a sense is a bit of a nerf, though this decision was purely about listening to feedback to make ChatGPT better and had nothing to do with cost (and people who want longer thinking times can still manually select Extended/Heavy).
I'm not going to comment on the specific techniques used to make GPT-5 so much more efficient than GPT-4, but I will say that we don't do any gimmicks like nerfing by time of day or nerfing after launch. And when we do make newer models more efficient than older models, it mostly gets returned to people in the form of better speeds, rate limits, context windows, and new features.
jghn|24 days ago