top | item 47133516

(no title)

stevage | 7 days ago

>OpenAI's flagship model fails this 30% of the time. When it gets it right, the reasoning is concise: "You need the car at the car wash to wash it, so drive the short 50 meters." When it gets it wrong, it writes about fuel efficiency.

It's interesting to me how variable each model is. Many people talk about LLMs as if they were deterministic: "ChatGPT answers this question this way". Whereas clearly we should talk in more probabilistic terms.

discuss

No comments yet.