Custom tool calling formats are iffy in my experience. The models are all trained with reinforcement learning to follow specific formats, so it’s always a battle, and it feels to me like using the tool wrong.
Have you had good results with the other frontier models?
Not the parent commenter, but in my testing, all recent Claudes (4.5 onward) and the Gemini 3 series have been pretty much flawless with custom tool call formats.
thegeomaster|18 days ago
data-ottawa|18 days ago
I’ve tested local models from Qwen, GLM, and Devstral families.
pcwelder|17 days ago
GPT models follow the tool format correctly but tend to stop early instead of continuing.
Grok-4+ models are decent but have issues in longer chats.
Kimi 2.5 tends to revert to the tool format it was RL-trained on.