top | item 46678902

(no title)

I'm a believer that LLMs will keep getting better. But even today (which might or might not be "sufficient" training) they can easily run `rm -rf ~`.

Not that humans can't make these mistakes (in fact, I have nuked my home directory myself before), but I don't think it's a specific problem some guardrails can solve currently. I'm looking for innovations (either model-wise or engineering-wise) that'd do better than letting an agent run code until a goal is seemingly achieved.

discuss

No comments yet.