rybosworld | 8 days ago
The point is this:
> we as end-users cannot with any certainty know if the model used python, or didn't
These tools can and do operate contrary to their explicit instructions all the time. I've had models edit files when I wasn't in agent mode (just chat mode), and chat mode is supposedly a sandboxed environment. So how does that happen? And I'm sure we've all seen a model plainly disregard an instruction for one reason or another.
The models, like any other software tool, have undocumented features.
You as an end-user cannot falsify the use of a Python tool, regardless of what the API docs say.
TL;DR: Is this enough to falsify it? No.
simianwords | 7 days ago