(no title)
aresant | 6 months ago
eg - GPT-5 beats GPT-4 on factual recall + reasoning (HeadQA, Medbullets, MedCalc).
But then slips on structured queries (EHRSQL), fairness (RaceBias), evidence QA (PubMedQA).
Hallucination resistance better but only modestly.
Latency seems uneven (maybe more testing?) faster on long tasks, slower on short ones.
TrainedMonkey|6 months ago
narrator|6 months ago
I wonder if part of the degraded performance is where they think you're going into a dangerous area and they get more and more vague, for example like they demoed on launch day with the fireworks example. It gets very vague when talking about non-abusable prescription drugs for example. I wonder if that sort of nerfing gradient is affecting medical queries.
After seeing some painfully bad results, I'm currently using Grok4 for medical queries with a lot of success.
RestartKernel|6 months ago
JimDabell|6 months ago
Things like:
Me: Is this thing you claim documented? Where in the documentation does it say this?
GPT: Here’s a long-winded assertion that what I said before was correct, plus a link to an unofficial source that doesn’t back me up.
Me: That’s not official documentation and it doesn’t say what you claim. Find me the official word on the matter.
GPT: Exact same response, word-for-word.
Me: You are repeating yourself. Do not repeat what you said before. Here’s the official documentation: [link]. Find me the part where it says this. Do not consider any other source.
GPT: Exact same response, word-for-word.
Me: Here are some random words to test if you are listening to me: foo, bar, baz.
GPT: Exact same response, word-for-word.
It’s so repetitive I wonder if it’s an engineering fault, because it’s weird that the model would be so consistent in its responses regardless of the input. Once it gets stuck, it doesn’t matter what I enter, it just keeps saying the same thing over and over.
UltraSane|6 months ago
yieldcrv|6 months ago
Its impressive but a regression for now, in direct comparison to just high parameter model
woeirua|6 months ago
p1esk|6 months ago
fertrevino|6 months ago