What was your prompt to get it to run the test suite and heal tests at every step? I didn’t see that mentioned in your write-up. Also, any specific reason you went with Codex over Claude Code?
For me (original author of JustHTML), it was enough to put the instructions on how to run tests in the AGENTS.md. It knows enough about coding to run tests by itself.
simonw|2 months ago
I used Codex for a few reasons:
1. Claude was down on Sunday when I kicked off this project
2. Claude Code is my daily driver and I didn't want to burn through my token allowance on an experiment
3. I wanted to see how well the new GPT-5.2 could handle a long-running project
EmilStenstrom|2 months ago