This is exactly right, and it's why acceptanceCriteria is a first-class field in the task schema rather than something buried in the description. Every task has an explicit acceptanceCriteria: string[] array that defines what "done" actually means:

  acceptanceCriteria: [
    "All tests pass (pnpm test)",
    "No TypeScript errors (pnpm tsc --noEmit)",
    "File written to src/components/NewFeature.tsx",
    "Completion report posted to inbox"
  ]

When a task launches, those criteria get injected into the agent's prompt context alongside the task description, subtasks, and agent instructions. The agent sees exactly what "done" means before it starts working.
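To make the injection step concrete, here is a rough sketch of what it could look like. Only acceptanceCriteria: string[] is taken from the schema described above; the Task interface, the other field names, and buildPrompt are made up for illustration and are not the project's actual code.

```typescript
// Hypothetical task shape. Only `acceptanceCriteria: string[]` is from the
// schema described above; the other field names are assumptions.
interface Task {
  description: string;
  subtasks: string[];
  agentInstructions: string;
  acceptanceCriteria: string[];
}

// Render a list of strings as "- item" lines.
function bullets(items: string[]): string {
  return items.map((item) => `- ${item}`).join("\n");
}

// Build the prompt context so the criteria appear as an explicit checklist
// next to the description, subtasks, and instructions.
function buildPrompt(task: Task): string {
  return [
    `Task: ${task.description}`,
    `Subtasks:\n${bullets(task.subtasks)}`,
    `Instructions: ${task.agentInstructions}`,
    `Acceptance criteria (all must hold before reporting done):\n${bullets(task.acceptanceCriteria)}`,
  ].join("\n\n");
}
```

Putting the criteria in their own labeled section, instead of folding them into the description, is what lets the agent (and anything checking its output later) treat them as a checklist.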

You're also right that the deeper problem is "successfully completed the wrong thing." Retry logic assumes failure is obvious (exit code ≠ 0), but a task that exits cleanly after silently drifting from the request is harder to catch. The /ship-feature command enforces a verification step: it runs tests, lints, and typechecks before marking anything complete, which catches a lot of the "it wrote code but nothing actually works" cases.
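A verification gate of this kind can be sketched in a few lines. This is not the actual /ship-feature implementation, just a minimal version under assumptions: verify, Runner, and shellRunner are hypothetical names, and "pnpm lint" in the usage below is an assumed script name (pnpm test and pnpm tsc --noEmit come from the criteria example above).

```typescript
import { spawnSync } from "node:child_process";

// A runner executes one command and returns its exit code.
type Runner = (command: string) => number;

// Default runner: execute through the shell and report the exit code.
// A null status (process killed by a signal) is treated as a failure.
const shellRunner: Runner = (command) =>
  spawnSync(command, { shell: true, stdio: "inherit" }).status ?? 1;

interface VerifyResult {
  passed: boolean;
  failures: string[]; // commands that exited non-zero
}

// Run every check and collect the ones that failed. The task should only be
// marked complete when `passed` is true.
function verify(commands: string[], run: Runner = shellRunner): VerifyResult {
  const failures = commands.filter((command) => run(command) !== 0);
  return { passed: failures.length === 0, failures };
}
```

Usage would be something like verify(["pnpm test", "pnpm lint", "pnpm tsc --noEmit"]). The runner is injectable so the gate can be exercised without spawning real processes.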

That said, there's still a gap between "tests pass" and "this actually does what I asked." That's where the human-in-the-loop decisions queue helps: agents can post a decision request like "I implemented X, but the acceptance criteria mention Y. Should I continue?" Making agents reliably self-evaluate against the criteria is still an open problem, though.
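One way to picture the decision-request path is a check that escalates whenever a criterion has no evidence behind it. Everything in this sketch is hypothetical (DecisionRequest, CriterionStatus, reviewCriteria, the field names), and it does not solve the hard part, which is deciding whether a criterion really is verified.

```typescript
// Whether there is evidence (a passing check, a file on disk, and so on)
// that a given criterion holds. How that is determined is out of scope here.
type CriterionStatus = "verified" | "unverified";

// Hypothetical shape of an item posted to the decisions queue.
interface DecisionRequest {
  taskId: string;
  question: string;
  unverified: string[];
}

// Return null when every criterion is verified; otherwise build a request
// asking a human how to proceed.
function reviewCriteria(
  taskId: string,
  statuses: Record<string, CriterionStatus>,
): DecisionRequest | null {
  const unverified = Object.entries(statuses)
    .filter(([, status]) => status !== "verified")
    .map(([criterion]) => criterion);
  if (unverified.length === 0) return null;
  return {
    taskId,
    question: `I could not verify ${unverified.length} acceptance criteria. Should I continue?`,
    unverified,
  };
}
```

The point of the design is that an unverified criterion defaults to a question for a human instead of being treated as done.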