top | item 44273348

thinkling | 8 months ago

You're saying roughly "you can't trust the first answer from an LLM but if you run it through enough times, the results will converge on something good". This, plus all the hoo-hah about prompt engineering, seem like clear signals that the "AI" in LLMs is not actually very intelligent (yet). It confirms the criticism.

sysmax | 8 months ago

Not exactly. Let's say you-the-human are trying to fix a crash in a program knowing just the source location. You would look at the code and start hypothesizing:

* Maybe, it's because this pointer is garbage.

* Maybe, it's because that function doesn't work as the name suggests.

* HANG ON! This code doesn't check the input size, that's very fishy. It's probably the cause.

So, once you get that "hang on" moment, here comes the boring part of setting breakpoints, verifying values, rechecking observations, and finally fixing that thing.

LLMs won't get the "hang on" part right, but once you shove it right in their face, they will cut through the boring routine like there's no tomorrow. And you can also spin up 3 instances to investigate 3 hypotheses and hand you some readings on a silver platter. But you-the-human need to be calling the shots.
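The "spin up 3 instances" idea can be sketched as a simple fan-out, where the human supplies the hypotheses and reviews the reports. This is a minimal sketch; `ask_llm` is a hypothetical stand-in for whatever model API you actually use:

```python
from concurrent.futures import ThreadPoolExecutor

def ask_llm(prompt: str) -> str:
    # Hypothetical stand-in: a real version would call an LLM API here.
    return f"Findings for: {prompt}"

def investigate(hypotheses: list[str]) -> dict[str, str]:
    """Fan out one LLM instance per hypothesis; the human reads the reports."""
    with ThreadPoolExecutor(max_workers=len(hypotheses)) as pool:
        reports = pool.map(ask_llm, hypotheses)
    return dict(zip(hypotheses, reports))

results = investigate([
    "This pointer may be garbage",
    "That function may not do what its name suggests",
    "The code doesn't check the input size",
])
```

The human still picks the hypotheses and judges the answers; the parallelism only buys back the boring legwork.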

majormajor | 8 months ago

You can make a better tool by training the service (some of which involves training the model, some of which involves iterating on the prompt(s) behind the scenes) to get a lot of the iteration out of the way. Instead of users having to fill in a detailed prompt, we now have "reasoning" models which, as their first step, dump out a bunch of probably-relevant background info to try to push the next tokens in the right direction. A logical next step, if enough people run into the OP's issue here, is to have it run that "criticize this and adjust" loop internally.
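That internal "criticize this and adjust" loop could look something like the sketch below. The `llm` function is a hypothetical stand-in (here a canned stub so the example runs); nothing about it reflects any vendor's actual API:

```python
def llm(prompt: str) -> str:
    # Hypothetical stub simulating a model: critiques a first draft,
    # approves a revised one. A real version would call a model API.
    if prompt.startswith("Criticize"):
        return "ok" if "v2" in prompt else "too vague"
    return "draft v2" if "too vague" in prompt else "draft v1"

def refine(task: str, max_rounds: int = 3) -> str:
    """Draft, self-critique, revise: the loop a service could run internally."""
    draft = llm(f"Answer: {task}")
    for _ in range(max_rounds):
        critique = llm(f"Criticize this: {draft}")
        if critique == "ok":
            break
        draft = llm(f"Revise given the critique: {critique}")
    return draft
```

Run internally, this would hide the iterate-until-good workflow from the user, which is exactly what makes the scaffolding hard to distinguish from model improvement.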

But it all makes it very hard to tell how much of the underlying "intelligence" is improving vs how much of the human scaffolding around it is improving.

abraxas | 8 months ago

Yeah, given the stochastic nature of LLM outputs, this approach and the whole field of prompt engineering feel like a classic case of cargo cult science.