vohk | 6 months ago
There's a long history of that sort of behaviour: ISPs gaming bandwidth tests when they detect one is being run, software recognizing that it's running in a VM or on a particular configuration. I don't think it's a stretch to assume some of the money at OpenAI and others has gone into spotting likely benchmark queries and either throwing a little more compute at them or tagging them for future training.
I would be outright shocked if most of these benchmarks were even attempting serious countermeasures.