
The rate at which LLMs are "routinely producing incorrect results" is trending downward as we get more advanced reasoning models that use test-time compute.

I don't know whether you've used some of the more recent models like Claude 3.5 Sonnet and o1, but to me it is very clear where the trajectory is headed. o3 is just around the corner, and o4 is currently in training.

People found value even in a model like GPT-3.5 Turbo, and that thing was really bad. But hey, at least it could write some short scripts and boilerplate code.

You are also comparing mathematical computation, which has only one correct answer, with programming, where the solution space is much broader. There are multiple valid solutions, and some are better than others. It is up to the human to evaluate the solution, as I've said in the post. Today, you may still need to fix the LLM's output, but in my experience I need to do this far less often than before.
