You're saying roughly "you can't trust the first answer from an LLM, but if you run it through enough times, the results will converge on something good". This, plus all the hoo-hah about prompt engineering, seems like a clear signal that the "AI" in LLMs is not actually very intelligent (yet). It confirms the criticism.
sysmax|8 months ago
* Maybe it's because this pointer is garbage.
* Maybe it's because that function doesn't work the way its name suggests.
* HANG ON! This code doesn't check the input size; that's very fishy. It's probably the cause.
So, once you get that "hang on" moment, here comes the boring part of setting breakpoints, verifying values, rechecking observations, and finally fixing the thing.
LLMs won't get the "hang on" part right, but once you shove it right in their face, they will cut through the boring routine like there's no tomorrow. And you can also spin up 3 instances to investigate 3 hypotheses and hand you the readings on a silver platter. But you-the-human need to be calling the shots.
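The "spin up 3 instances" idea can be sketched in a few lines. This is just a minimal illustration; `ask_llm` is a hypothetical stand-in for whatever LLM API you actually call, and the hypotheses are the three from the list above:

```python
from concurrent.futures import ThreadPoolExecutor

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    return f"analysis of: {prompt}"

# The three hypotheses from the debugging session above.
hypotheses = [
    "this pointer is garbage",
    "that function doesn't work the way its name suggests",
    "the code doesn't check the input size",
]

def investigate(hypothesis: str) -> str:
    # Each instance is told to dig into exactly one hypothesis.
    prompt = (f"Assume the bug is because {hypothesis}. "
              "Find evidence for or against this in the code.")
    return ask_llm(prompt)

# Run the three investigations concurrently; the human reviews the readings.
with ThreadPoolExecutor(max_workers=3) as pool:
    readings = list(pool.map(investigate, hypotheses))

for hypothesis, reading in zip(hypotheses, readings):
    print(f"[{hypothesis}] -> {reading}")
```

The key point is in the last loop: the instances only produce readings; deciding which hypothesis actually holds stays with the human.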
majormajor|8 months ago
But it all makes it very hard to tell how much of the underlying "intelligence" is improving vs how much of the human scaffolding around it is improving.
abraxas|8 months ago