top | item 46708701

Show HN: Why single agents suck at math proofs

4 points| austinbaggio | 1 month ago |ensue.dev

Cursor published a great post last week on scaling long-running autonomous coding agents to build a browser.

We’ve been exploring a similar idea, but applied to math proofs

We show how a swarm of agents, coordinated via shared memory, can generate a Lean proof for a non-trivial multi-step math problem-Putnan A2-that a single agent struggled with. We were feeling fancy so we wrote a full post and captured the runtime.

Harness will be public soon(tm) for you to run it yourself.

1 comment

saidcooldude|1 month ago

i thought this work was fun for white box theorem proving. it is interesting to pass the structure to an llm to better understand the problem solving strategy.

using the tree shape as context for other questions was also interesting