m_w_ | 3 months ago
> "The house that looks like a ripe tomato!"
that was transformed into a "user prompt" in a more instructional format:
> "Go to the tomato house"
And both were used in the agent output. At least the Y-axes on the graphs look more reasonable than some other recent benchmarks.