qin | 1 year ago

> you can see how different models interpret tool descriptions.

How's this done? I saw the creator of MCP recommended¹ "investing heavily in tool descriptions" but it wasn't clear exactly how to

¹ — https://x.com/dsp_/status/1897599702859645345

evalstate | 1 year ago

The Messages API contains a special section for Tool information. That section is added to the Context Window, and it's this information that the Model uses to decide whether to attempt a Tool Call.

In practice, we configure the MCP Server, and the Host application (here, fast-agent) populates that section through the Anthropic or OpenAI API. The provider then injects it into the Context Window[1] in the format that works best for its model.
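To make the "populate it" step concrete, here is a rough sketch of how a host might reshape one MCP tool definition for each provider. The `read_file` tool and both helper functions are made up for illustration. The field names follow the public MCP, Anthropic Messages, and OpenAI Chat Completions tool formats as I understand them, and this is not how fast-agent necessarily does it.

```python
from typing import Any

# Hypothetical tool as an MCP Server might list it
# (MCP uses name / description / inputSchema).
mcp_tool: dict[str, Any] = {
    "name": "read_file",
    "description": "Read the complete contents of a text file at the given path.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Path of the file to read."}
        },
        "required": ["path"],
    },
}


def to_anthropic_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Reshape an MCP tool into an entry for the Anthropic `tools` list."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool["inputSchema"],
    }


def to_openai_tool(tool: dict[str, Any]) -> dict[str, Any]:
    """Reshape an MCP tool into an entry for the OpenAI Chat Completions `tools` list."""
    return {
        "type": "function",
        "function": {
            "name": tool["name"],
            "description": tool.get("description", ""),
            "parameters": tool["inputSchema"],
        },
    }


anthropic_tool = to_anthropic_tool(mcp_tool)
openai_tool = to_openai_tool(mcp_tool)
```

Either way the description text travels unchanged, which is why the wording of the description is the main lever you have over how each model interprets the tool.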

So for fast-agent, we can set the model when we define the agent with `model="o3-mini.medium"`, or from a command line switch. Depending on the type of eval you are doing, you could, for example, use a Parallel workflow to compare how the different models perform. Quite often, given a failing tool call, the model will attempt to recover (the @modelcontextprotocol/server-filesystem is... an interesting example).

Another fun one is to use Opus 3 tool calling, where it emits <thinking> tags showing how and why it's calling the tool.

One final point is that different combinations of tools will give different behaviours: if two MCP Servers expose similar tool definitions, performance degrades. One of the motivations for fast-agent is precisely that it allows dividing tasks up amongst different context windows to get the sharpest performance.
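If you want a crude way to spot overlapping definitions before handing several servers to one agent, something like the sketch below can help. It is only a string-similarity heuristic using Python's `difflib`. The tool names, descriptions, and the 0.75 threshold are all assumptions for illustration, and a high score is no proof that a model will actually confuse the two tools.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical tool descriptions gathered from several MCP Servers.
tools = {
    "fs_a.read_file": "Read the contents of a file from disk",
    "fs_b.read": "Read the contents of a file from the filesystem",
    "weather.get": "Fetch the current weather for a city",
}


def similar_descriptions(tools: dict[str, str], threshold: float = 0.75):
    """Return (name_a, name_b, score) for description pairs at or above threshold."""
    flagged = []
    for (name_a, desc_a), (name_b, desc_b) in combinations(tools.items(), 2):
        # Character-level similarity ratio between 0.0 and 1.0.
        score = SequenceMatcher(None, desc_a.lower(), desc_b.lower()).ratio()
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return flagged


flagged = similar_descriptions(tools)
for name_a, name_b, score in flagged:
    print(f"{name_a} and {name_b} look similar (score {score})")
```

Pairs that get flagged are candidates for rewording, or for moving into separate agents so they never share a context window.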

I've linked the Anthropic docs because they're my preferred explanation. The Messaging APIs grab the JSON and present it as Tool Call types; other models will simply emit JSON and let the Client handle it.

[1] https://docs.anthropic.com/en/docs/build-with-claude/tool-us...