This idea comes around every few months, but nobody can document it in tests (apart from the actually broken services, which get fixed within a couple of days). Do you have repeatable cases where it can be shown?
I've had a 100% benchable case in the past, though it wasn't really a degradation in output quality per se; it was an undocumented and unacknowledged permanent degradation in maximum output length. From one day to the next, 100% of prompts where we'd ask for e.g. 15 sections would only produce 10 sections and then ask "Would you like me to continue?". Which is in a way quality, but generally not something that shows up on coding benchmarks and the like. This was Anthropic.
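A regression like that is easy to detect mechanically. A minimal sketch (hypothetical helpers, not anything from Anthropic's SDK): count the numbered sections in a response and flag it when it delivers fewer than requested or ends by offering to continue:

```python
import re


def count_sections(text: str) -> int:
    """Count numbered section headings ("1.", "2)", ...) at line starts."""
    return len(re.findall(r"^\s*\d+[.)]\s", text, flags=re.MULTILINE))


def looks_truncated(text: str, expected_sections: int) -> bool:
    """Flag a response that produced fewer sections than requested,
    or that trailed off by asking to continue instead of finishing."""
    return (count_sections(text) < expected_sections
            or text.rstrip().endswith("Would you like me to continue?"))
```

Run that over every stored response and the day the model starts stopping at 10 of 15 sections shows up immediately as a step change in the flag rate.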
Also seen a two-week period where latency on fine-tuned (i.e. enterprise) Gemini 2.5 Flash model endpoints increased more than 10x, again undocumented and unacknowledged, because they shifted all of their GPU capacity towards going "viral" on people generating slop artwork around the Nano Banana Pro release.
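Latency shifts of that size are also cheap to catch from the client side. A minimal sketch (the `call` argument stands in for whatever endpoint invocation you use; it is not a real Gemini API call): time repeated invocations and track p50/p95 in milliseconds:

```python
import statistics
import time
from typing import Callable, Dict, List


def latency_percentiles(call: Callable[[], object], n: int = 20) -> Dict[str, float]:
    """Time n invocations of an endpoint call; report p50/p95 in ms."""
    samples: List[float] = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=20)  # 19 cut points, 5% apart
    return {"p50": statistics.median(samples), "p95": cuts[18]}
```

Logging these per day makes a silent 10x regression unmissable, even when the provider's status page says nothing.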
So plenty of silent shenanigans do happen, including by the Big 3 on API endpoints. At the same time, I agree with you that all those rumors of "degradation in code quality" are very much unproven.
viraptor|1 month ago
deaux|1 month ago