3s|5 months ago
This is really neat! Didn’t realize it could be this simple to run RL on models. Quick question: how would I specify the reward function for tool use? Or is that something you handle automatically once I specify the available tools and their uses?
ag8|5 months ago
The way tools currently work in the beta: you add tools via MCP to the configuration, and they get passed in as additional context for the model. The model might then choose to use a tool during inference, in which case the tool is called automatically and its output is returned as a tool message. If you really want to, you could parse the tool output as part of the reward calculation, but I expect you'd usually base the reward just on the model's completion. I could give more details if there's a specific tool setup you're envisioning!
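To make that concrete, here is a rough sketch of what such a reward could look like. Everything in it is an assumption for illustration, not the actual API: the function name `reward`, the `expected_answer` argument, and the idea that the rollout arrives as a list of `{"role", "content"}` dicts with tool results under the `"tool"` role. It scores mainly on the final completion and only optionally peeks at tool outputs.

```python
import json


def reward(messages, expected_answer):
    """Hypothetical reward function; the message format is assumed, not the real API."""
    # Treat the last assistant message as the model's completion.
    completion = next(
        ((m.get("content") or "") for m in reversed(messages)
         if m.get("role") == "assistant"),
        "",
    )

    # Main signal: does the completion contain the expected answer?
    target = expected_answer.strip().lower()
    score = 1.0 if target in completion.lower() else 0.0

    # Optional: parse tool outputs and penalize a rollout once if any tool
    # call returned a JSON object with a truthy "error" field (assumed shape).
    for m in messages:
        if m.get("role") != "tool":
            continue
        try:
            payload = json.loads(m.get("content") or "")
        except (json.JSONDecodeError, TypeError):
            continue  # non-JSON tool output is simply ignored
        if isinstance(payload, dict) and payload.get("error"):
            score -= 0.5
            break

    return score
```

In most setups you could drop the tool-output loop entirely and keep only the completion check.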
-_-|5 months ago