EastLondonCoder | 9 days ago

I don’t use plan.md docs either, but I recognise the underlying idea: you need a way to keep agent output constrained by reality.

My workflow is more like scaffold -> thin vertical slices -> machine-checkable semantics -> repeat.

Concrete example: I built and shipped a live ticketing system for my club (Kolibri Tickets). It’s not a toy: real payments (Stripe), email delivery, ticket verification at the door, frontend + backend, migrations, idempotency edges, etc. It’s running and taking money.

The reason this works with AI isn’t that the model “codes fast”. It’s that the workflow moves the bottleneck from “typing” to “verification”, and then engineers the verification loop:

  - keep the spine runnable early (end-to-end scaffold)

  - add one thin slice at a time (don’t let it touch 15 files speculatively)

  - force checkable artifacts (tests/fixtures/types/state-machine semantics where it matters)

  - treat refactors as normal, because the harness makes them safe

If you run it open-loop (prompt -> giant diff -> read/debug), you get the “illusion of velocity” people complain about. If you run it closed-loop (scaffold + constraints + verifiers), you can actually ship faster because you’re not paying the integration cost repeatedly.

Plan docs are one way to create shared state and prevent drift. A runnable scaffold + verification harness is another.
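To make the “checkable artifacts” point concrete, here’s a minimal sketch of what I mean by state-machine semantics. The states and transitions below are invented for illustration (not the actual Kolibri model) — the point is that an agent’s refactor can’t silently introduce an illegal transition, because the verifier fails loudly:

```python
# Hypothetical ticket lifecycle, encoded explicitly so the harness
# can check it on every change. States/transitions are illustrative.
ALLOWED = {
    "created":    {"paid", "cancelled"},
    "paid":       {"checked_in", "refunded"},
    "checked_in": set(),   # terminal: no double check-in at the door
    "refunded":   set(),   # terminal
    "cancelled":  set(),   # terminal
}

def transition(state: str, new_state: str) -> str:
    """Return the new state, or raise if the transition is illegal."""
    if new_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {new_state}")
    return new_state

def test_no_double_check_in():
    # A verifier the loop runs automatically: scanning a ticket twice
    # must be rejected, whatever else got refactored.
    s = transition("created", "paid")
    s = transition(s, "checked_in")
    try:
        transition(s, "checked_in")
    except ValueError:
        pass
    else:
        raise AssertionError("double check-in should be rejected")

test_no_double_check_in()
```

Ten lines of table, and a whole class of concurrency/ordering bugs becomes machine-checkable instead of review-checkable.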


aitchnyu | 9 days ago

Now that code is cheap, I ensured my side project has unit/integration tests (will enforce 100% coverage), Playwright tests, static typing (it’s in Python), and scripts for all tasks. Will learn mutation testing too (yes, it’s overkill). Now my agent works up to 1 hour in loops and emits concise code I don’t have to edit much.

EastLondonCoder | 8 days ago

Totally get it, and I think we’re describing the same control loop from different angles.

Where I differ slightly is: “100% coverage” can turn into productivity theatre. It’s a metric that’s easy to optimize while missing the thing you actually care about: do we have machine-checkable invariants at the points where drift is expensive?

The harness that’s paid off for me (on a live payments system) is:

  - thin vertical slice first (end-to-end runnable, even if ugly)

  - tests at the seams (payments, emails, ticket verification / idempotency)

  - state-machine semantics where concurrency/ordering matters

  - unit tests as supporting beams, not wallpaper

Then refactors become routine, because the tests will make breakage explicit.
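For “tests at the seams”, the highest-value one in a payments flow is idempotent webhook handling, since Stripe delivers events at-least-once and can retry. A minimal sketch with invented names (`handle_webhook`, `evt_123` — in production the dedupe set would be a database table, not memory):

```python
# Hypothetical seam test: redelivered webhook events must not
# double-record a charge. All names here are illustrative.
processed_events: set[str] = set()
charges: list[dict] = []

def handle_webhook(event: dict) -> None:
    """Record a charge at most once per event id (the idempotency key)."""
    if event["id"] in processed_events:
        return  # duplicate delivery: no-op
    processed_events.add(event["id"])
    charges.append({"event": event["id"], "amount": event["amount"]})

event = {"id": "evt_123", "amount": 2500}
handle_webhook(event)
handle_webhook(event)  # simulated redelivery

assert len(charges) == 1  # the invariant the harness checks
```

Once that assertion exists, an agent can refactor the whole payment path and the seam still holds or fails loudly.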

So yes: “code is cheap” -> increase verification. Just be careful not to replace engineering judgement with an easily gamed proxy.