(no title)
jakozaur | 2 months ago
It looks like the training data has plenty of those examples, but the models don’t have enough grounding or warnings before doing them. I wish there were a PleaseDontDoAnythingStupidEval for software engineering.
jakozaur | 2 months ago
It looks like the training data has plenty of those examples, but the models don’t have enough grounding or warnings before doing them. I wish there were a PleaseDontDoAnythingStupidEval for software engineering.
eugmill|2 months ago
Having said that, a PleaseDontDoAnythingStupidEval would probably slow down agentic coding quite a bit and make it less effective given how reliant agents are on making and recovering from mistakes. The solution is probably sandboxes and permission controls to not let them do something overly stupid, no different from an intern.