p337 | 1 month ago

You end up wasting tokens on implementation, debugging, execution, and parsing when you could just use the tool (tool description gets used instead).

Also, once you give it this general access, it opens up essentially infinite directions for the model to go in. Repeatability and testing become very difficult in that situation. One time it may write a bash script to solve the problem. The next, it may want to use Python and pip install a few libraries to solve that same problem. Yes, both are valid, but if you want a particular flow, you need to write a prompt for it and hope the model complies. It's about shifting certain decisions away from the model so that it has more room for the stuff you need it to do, while keeping performance somewhat consistent.

For now, managing the context window still matters, even if you don't care about efficient token usage. Burning 5-10% of it on rewriting the same API calls makes the model dumber.

the_mitsuhiko|1 month ago

> You end up wasting tokens on implementation, debugging, execution, and parsing when you could just use the tool (tool description gets used instead).

The tokens are not wasted, because I rewind to before it started building the tool. To me, the fact that it can build and manipulate its own tools is the benefit, not the downside. The internal work to manipulate the tools does not waste any context because it's a side adventure that does not affect my context.

p337|1 month ago

Maybe I'm not understanding the scenario well. I'm imagining an autonomous agent as a sort of baseline. Are you saying the agent says "I need to write a tool", it takes a snapshot, and once it's done, it rewinds to the snapshot but this time, it has the tool it desired? That's actually a really cool idea to do autonomously!
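To make the snapshot-and-rewind scenario concrete, here is a minimal sketch. Everything in it is hypothetical (the `AgentSession` class, the `count_rows` tool, the message strings); it only illustrates the idea that the context is restored while the newly built tool is kept, and is not how any particular agent implements it.

```python
from copy import deepcopy


class AgentSession:
    """Hypothetical agent state: a message history plus a registry of tools."""

    def __init__(self):
        self.messages = []  # conversation context the model sees
        self.tools = {}     # name -> callable; deliberately survives a rewind

    def snapshot(self):
        # Copy only the context; tools are not part of the snapshot.
        return deepcopy(self.messages)

    def rewind(self, saved):
        # Restore the context, keeping whatever tools were built in between.
        self.messages = deepcopy(saved)


session = AgentSession()
session.messages.append("user: summarize the CSV files")

saved = session.snapshot()

# Side adventure: these messages are discarded by the rewind below.
session.messages.append("agent: I need a tool, writing count_rows...")
session.messages.append("agent: debugging count_rows...")
session.tools["count_rows"] = lambda text: len(text.splitlines())

session.rewind(saved)

print(session.messages)       # only the original user message remains
print(sorted(session.tools))  # the tool built during the detour is still there
```

After the rewind, the context holds one message again, yet `count_rows` is available for the rest of the session.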

If you mean manually, that's still interesting, but that kind of feels like the same thing to me. The idea is: don't let the agent burn context writing tools, it should just use them. Isn't that exactly what yours is doing? Instead of rewinding to a snapshot, I have a separate code base for it. As tools get more complex, it seems nice to have them well-tested with standardized input and output. Generating tools on the fly, rewinding, and then using them amounts to the same thing. You would even need to provide some context that says what the tool is and how to use it, which is basically what the MCP server is doing.
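As a rough illustration of "standardized input and output" plus a short description, here is a sketch of a pre-built tool registry. This is not the MCP protocol itself; the `Tool` dataclass, `REGISTRY`, `describe_tools`, and `call_tool` are made-up names, and JSON in/JSON out is just one assumed convention.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str             # the only part the model needs in its context
    run: Callable[[dict], dict]  # standardized: dict in, dict out


def count_rows(args: dict) -> dict:
    # Tested once, outside the agent loop, instead of being rewritten per run.
    return {"rows": len(args["text"].splitlines())}


REGISTRY = {
    "count_rows": Tool(
        name="count_rows",
        description="Count the lines in a block of text. "
                    "Input: {text: str}. Output: {rows: int}.",
        run=count_rows,
    ),
}


def describe_tools() -> str:
    # What would be placed in the prompt: descriptions, not implementations.
    return "\n".join(f"{t.name}: {t.description}" for t in REGISTRY.values())


def call_tool(request_json: str) -> str:
    # Same request and response shape for every tool.
    request = json.loads(request_json)
    tool = REGISTRY[request["tool"]]
    return json.dumps(tool.run(request["args"]))


result = call_tool('{"tool": "count_rows", "args": {"text": "a\\nb\\nc"}}')
print(describe_tools())
print(result)
```

Whether the tool came from a separate code base or from an earlier rewound session, the model only ever sees the output of `describe_tools()` and the JSON result.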