Thank you for the reply! Good luck with your app too! I heard of Claude Code filling out PRs but in my experience I haven't been able to successfully pull that off, as it creates errors and doesn't see it themselves. I am trying to experiment with a pipeline which it can find the feature it created by writing a frontend integration test and take a screenshot, using Playwright MCP, to verify whether it successfully or did not successfully execute the task. If it did not, then it loops until it does. This removes the human-in-the-loop and probably (I need to want internal evals to prove this out) increases its correctness per run. The bottleneck then becomes code review and making sure the code it did write isn't hot garbage.
No comments yet.