canterburry | 3 months ago

I vibe coded for months but switched to spec-driven development in the last 6 months.

I'm also old enough to have started my career learning the Rational Unified Process and then progressed through XP, agile, Scrum, etc.

My process is I spend 2-3 hours writing a "spec" focusing on acceptance criteria and then by the end of the day I have a working, tested next version of a feature that I push to production.

I don't see how using a spec has made me less agile. My iteration takes 8 hours.

However, I see tons of useless specs. A spec is not a prompt. It's an actual definition of how to tell if something is behaving as intended or not.

People are notoriously bad at thinking about correctness in each scenario which is why vibe coding is so big.

People defer thinking about what correct and incorrect actually looks like for a whole wide scope of scenarios and instead choose to discover through trial and error.

I get 20x ROI on well defined, comprehensive, end to end acceptance tests that the AI can run. They fix everything from big picture functionality to minor logic errors.
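As a rough illustration of acceptance criteria written so that a tool (or an AI agent) can run them, here is a minimal sketch. The feature (`apply_discount`), the `SAVE10` code, and the criteria themselves are hypothetical examples, not taken from the comment above.

```python
# Hypothetical feature: apply a discount code to a cart total (in cents).
# The rules and criteria are illustrative assumptions, not a real spec.

def apply_discount(total_cents: int, code: str) -> int:
    """Return the total after applying a discount code."""
    if total_cents < 0:
        raise ValueError("total must be non-negative")
    if code == "SAVE10":
        return total_cents - total_cents // 10  # 10% off, rounded in the shop's favour
    return total_cents  # unknown codes leave the total unchanged


def rejects_negative_total() -> bool:
    """Criterion helper: a negative total must raise ValueError."""
    try:
        apply_discount(-1, "SAVE10")
    except ValueError:
        return True
    return False


# Each acceptance criterion is a (description, check) pair that states
# what "correct" looks like for one scenario.
ACCEPTANCE_CRITERIA = [
    ("valid code takes 10% off", lambda: apply_discount(1000, "SAVE10") == 900),
    ("unknown code changes nothing", lambda: apply_discount(1000, "BOGUS") == 1000),
    ("empty cart stays at zero", lambda: apply_discount(0, "SAVE10") == 0),
    ("negative total is rejected", rejects_negative_total),
]


def run_acceptance() -> list[str]:
    """Return the descriptions of all criteria that currently fail."""
    return [desc for desc, check in ACCEPTANCE_CRITERIA if not check()]


if __name__ == "__main__":
    failures = run_acceptance()
    print("all criteria pass" if not failures else f"failing: {failures}")
```

The point of the shape is that every scenario has an explicit pass/fail definition up front, so an implementation can be iterated on until `run_acceptance()` returns an empty list.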

mattmanser|3 months ago

Seems like you are all just redefining what spec and waterfall mean.

A spec was from a customer where it would detail every feature. They would be huge, but usually lack enough detail or be ambiguous. They would be signed off by the customer and then you'd deliver to the spec.

It would contain months, if not years, worth of work. Then after all this work the end product would not meet the actual customer needs.

A day's work is not a spec. It's a ticket's worth of work, which is agile.

Agile is an iterative process where you deliver small chunks of work and the customer course corrects at regular intervals. Commonly that means 3-4 week sprints, each made up of many tickets that take hours or days, with one course correction per sprint.

Generally each sprint had a spec, and each ticket had a spec. But it sounds like until now you've just been winging it, with vague definitions per feature. It's very common, especially where the PO or PM are bad at their job. Or the developer is informally acting as PO.

Now that you're making specs per ticket, you're just doing what many development teams already do. You're just bizarrely calling it a new process.

It's like watching someone point at a bicycle and insist it's a rocketship.

hgomersall|3 months ago

A customer generally provides requirements (the system should do...) which are translated into a spec (the module/function/method should do...). The set of specs maps to the requirements. Requirements may be derived from or represented by user stories, and specs may or may not be developed in an agile way or written down ahead of time. Whether you have or derive requirements and specs is entirely orthogonal to development methodology. People need to get away from the idea that having specs is any more than a formal description of what the code should do.

The approach we take is that the specs are developed from the tests, and the tests exercise each spec point in its entirety. That is, a test and a spec are semantically synonymous within the code base. An interesting thing we're playing with is using the specs alongside the signatures to have an LLM determine when the spec is incomplete.
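One way to picture a one-to-one link between spec points and tests is the sketch below. It is an assumption about what such a setup could look like, not the commenter's actual tooling: the `SPEC` table, the `covers` decorator, and the `EVEN-*` identifiers are all hypothetical, and the completeness check here is a plain set difference with no LLM involved.

```python
from typing import Callable, Dict

# Hypothetical spec: each point has an ID and a one-line statement.
SPEC: Dict[str, str] = {
    "EVEN-1": "is_even returns True for even integers",
    "EVEN-2": "is_even returns False for odd integers",
    "EVEN-3": "is_even handles negative integers",
}

# Registry mapping a spec ID to the test that exercises it.
TESTS: Dict[str, Callable[[], None]] = {}


def covers(spec_id: str):
    """Decorator that binds a test function to exactly one spec point."""
    if spec_id not in SPEC:
        raise KeyError(f"unknown spec point: {spec_id}")

    def register(fn: Callable[[], None]) -> Callable[[], None]:
        TESTS[spec_id] = fn
        return fn

    return register


def is_even(n: int) -> bool:
    return n % 2 == 0


@covers("EVEN-1")
def test_even() -> None:
    assert is_even(4)


@covers("EVEN-2")
def test_odd() -> None:
    assert not is_even(7)


def uncovered_spec_points() -> list[str]:
    """Spec points that no registered test exercises."""
    return sorted(set(SPEC) - set(TESTS))


if __name__ == "__main__":
    for spec_id, test in TESTS.items():
        test()  # raises AssertionError if the spec point is violated
    print("uncovered:", uncovered_spec_points())
```

In this sketch `EVEN-3` is deliberately left without a test, so the check reports it as the gap between what the spec says and what the tests actually pin down.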

pipes|3 months ago

I'll probably be proven wrong eventually, but my main thought about spec driven dev with LLMs is that it introduces an unreliable compiler. It will produce different results every time it is run, and it's up to the developer to review the changes. Which just seems like a laborious, error-prone task.

mexicocitinluez|3 months ago

You don't need this type of work to be deterministic. It doesn't really matter if the LLM names a function "IsEven" vs "IsNumberEven".

Have you ever written the EXACT same code twice?

> it introduces an unreliable compiler.

So then by definition so are humans. If compiling is "taking text and converting it to code" that's literally us.

> it's up to the developer to review the changes. Which just seems like a laborious error prone task.

There are trade-offs to everything. Have you ever worked with an off-shore team? They tend to produce worse code and have 1% of the context the LLM does. I'd much rather review LLM-written code than "I'm not even the person you hired because we're scamming the system" developers.

Kiro|3 months ago

Why would you want to rerun it? In that context a human is also an unreliable compiler. Put two humans on the task and you will get two different results. Even putting the same human on the same task again will yield something different. LLMs producing unreliable output that can't be reproduced is definitely a problem but not in this case.

CuriouslyC|3 months ago

No, this is the right take. Spec driven development is good, but having loose markdown "specs" that leave a bunch up to the discretion of the LLM is bad. The right approach is a project spec DSL that agents write, which can be compiled via codegen in a more controlled way.

dakinitribe|3 months ago

Could I see one of your specs as an example?

spacecadet|3 months ago

Same. I fancy myself a decent technical communicator and architect. I write specs which consist of giant lists of acceptance criteria, on my phone, lying in bed...

Kick that over to some agents to bash on, check in and review here and there, maybe a little mix of vibe and careful corrections by me, and it's done!

Usually in less time, but! Any time an agent is working on work shit, I'm working on my race car... so it's a win win win to me. I'm still using my brain, no longer slogging through awful "human centered" programming languages, and I have more time for my hobbies.

Isn't that the dream?

Now, to crack this research around generative gibber-lang programming... 90% of our generative code problems are related to the programming languages themselves, which are intended for humans and optimized for human interaction, speed, and parsing. Let the AIs design, speak, write, and run the code. All I care about is that the program passes my tests and does what I intended. I do not care if it has indents or any of the other dogmatic details that make one language about as usable as any other. No more "my programming language is better!" Who cares? Loving this era.

noosphr|3 months ago

> People defer thinking about what correct and incorrect actually looks like for a whole wide scope of scenarios and instead choose to discover through trial and error.

LLMs are _still_ terrible at deriving even the simplest logical entailments. I've had the latest and greatest Claude and GPT derive 'B instead of '(not B) from '(and A (not B)) when 'A and 'B are anything but the simplest of English sentences.

I shudder to think what they decide the correct interpretation of a spec written in prose is.
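For reference, the entailment in question can be checked mechanically with a truth table. The sketch below is only an illustration of the logic; the `entails` helper is a hypothetical name, and A and B are treated as plain booleans rather than English sentences.

```python
from itertools import product
from typing import Callable

Formula = Callable[[bool, bool], bool]


def entails(premise: Formula, conclusion: Formula) -> bool:
    """True if the conclusion holds in every assignment where the premise holds."""
    return all(
        conclusion(a, b)
        for a, b in product([False, True], repeat=2)
        if premise(a, b)
    )


def premise(a: bool, b: bool) -> bool:
    return a and (not b)  # (and A (not B))


def not_b(a: bool, b: bool) -> bool:
    return not b  # (not B)


def just_b(a: bool, b: bool) -> bool:
    return b  # B


if __name__ == "__main__":
    print(entails(premise, not_b))   # True: (not B) follows from the premise
    print(entails(premise, just_b))  # False: B does not follow
```

The premise is only satisfied when A is true and B is false, so `(not B)` is entailed and `B` is not.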

Kiro|3 months ago

I would love to see a prompt where it fails such a thing. Do you have an example?

layer8|3 months ago

Lisp quotes are confusing in prose.

0x696C6961|3 months ago

Still better than my coworkers ...