didericis | 4 months ago
Inferring intent from plain English prompts and context is a powerful way for computers to guess what you want from underspecified requirements, but defining what you want specifically always requires you to convey some irreducible amount of information. Whether it’s code, highly specific plain English, or detailed tests, if you care about correctness they all basically converge to the same thing and the same amount of work.
crazygringo | 4 months ago
That's the part I'd push back on. They're not the same amount of work.
When I'm writing the code myself, it's basically a ton of "plumbing" of loops and ifs and keeping track of counters and making sure I'm not making off-by-one errors and not making punctuation mistakes and all the rest. It actually takes quite a lot of brain energy and time to get that all perfect.
It saves a lot of time to write the function definition in plain English, have the LLM generate a bunch of tests that you verify are the correct definition... and then let the LLM take care of all the loops and indexing and punctuation and plumbing.
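To make that workflow concrete, here is a minimal sketch. The `chunk` helper and its spec are hypothetical, made up for illustration rather than taken from any real project. The plain-English definition sits in the comment, the test cases are the part you verify by hand, and the loop in between is the plumbing you would hand off.

```javascript
/**
 * Spec (plain English): split `items` into consecutive groups of `size`.
 * The last group may be shorter. `size` must be a positive integer.
 * An empty input returns an empty array.
 */
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const groups = [];
  // The loop bounds and slicing are the "plumbing" being delegated.
  for (let start = 0; start < items.length; start += size) {
    groups.push(items.slice(start, start + size));
  }
  return groups;
}

// The cases you read carefully: each one pins down a sentence of the spec.
const chunkCases = [
  { args: [[1, 2, 3, 4], 2], expected: [[1, 2], [3, 4]] }, // even split
  { args: [[1, 2, 3], 2], expected: [[1, 2], [3]] },       // short last group
  { args: [[], 3], expected: [] },                         // empty input
];

for (const { args, expected } of chunkCases) {
  const actual = chunk(...args);
  if (JSON.stringify(actual) !== JSON.stringify(expected)) {
    throw new Error(`chunk(${JSON.stringify(args)}) returned ${JSON.stringify(actual)}`);
  }
}
```

Reviewing three table rows against three sentences of spec is the quick part; whether three rows are enough is a separate question.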
I regularly cut what used to be an entire afternoon or day's worth of work down into 30 minutes. I spend 10 minutes writing the design for what will be 500-1,000 lines of code, 5 minutes answering the LLM's questions about it, 5 minutes skimming the code to make sure it all looks vaguely plausible (no obvious red flags), 5 minutes ensuring the unit tests cover everything I can think of (almost always, the LLM has thought of a bunch of edge cases I never would have bothered to test), and another 5 minutes telling it to fix things, like when its unit tests make me suddenly realize there's an edge case that should be defined differently.
The idea that it's the "same amount of work" is crazy to me. It's so much more efficient. And in all honesty, the code is more reliable too because it tests things that I usually wouldn't bother with, because writing all the tests is so boring.
didericis | 4 months ago
All of that "plumbing" affects behavior. My argument is that all of the brain energy you spend getting that plumbing right is the same energy you need in order to check it. Do you have a test for an off-by-one error? Do you have a test to make sure your counter behaves correctly when there are multiple components on the same page? Do you have a test to make sure errors don't cause the component to crash? Do you have a test to ensure non-UTF-8 text or binary data in a text input throws a validation error? Etc. etc. If you're checking all the details for correct behavior, the effort involved converges to roughly the same thing.
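For a sense of what a few of those checks look like once written down, here is a rough sketch. The helpers (`pageCount`, `createCounter`, `validateTextInput`) are hypothetical stand-ins for the plumbing, not code from any real component, and the checks only cover the off-by-one, multiple-instance, and encoding questions above.

```javascript
// Hypothetical helpers standing in for the plumbing under test.
function pageCount(totalItems, pageSize) {
  return Math.ceil(totalItems / pageSize);
}

function createCounter() {
  let count = 0; // state lives per instance, not at module level
  return { increment: () => ++count, value: () => count };
}

function validateTextInput(bytes) {
  // With fatal: true, decode() throws a TypeError on malformed UTF-8
  // instead of silently substituting replacement characters.
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

function check(condition, label) {
  if (!condition) throw new Error(`failed: ${label}`);
}

// Off-by-one: probe both sides of the boundary, plus the empty case.
check(pageCount(10, 10) === 1, "exactly one full page");
check(pageCount(11, 10) === 2, "one item past the boundary");
check(pageCount(0, 10) === 0, "no items, no pages");

// Multiple counters on the same page must not share state.
const counterA = createCounter();
const counterB = createCounter();
counterA.increment();
counterA.increment();
counterB.increment();
check(counterA.value() === 2 && counterB.value() === 1, "counters are independent");

// Non-UTF-8 bytes must be rejected; valid text must pass through.
let rejected = false;
try {
  validateTextInput(new Uint8Array([0xff, 0xfe])); // 0xff is never valid in UTF-8
} catch (err) {
  rejected = err instanceof TypeError;
}
check(rejected, "invalid UTF-8 is rejected");
check(validateTextInput(new Uint8Array([0x68, 0x69])) === "hi", "valid UTF-8 is accepted");
```

Each check is short, but knowing which boundaries to probe takes the same attention to the loops and state as writing them.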
If you're not checking all of that plumbing, you don't know whether or not the behavior is correct. And the level of abstraction used when working with agents and LLMs is not the same as when working with a higher level language, because LLMs make no guarantees about the correspondence between input and output. Compilers and programming languages are meticulously designed to ensure that output is exactly what is specified. There are bugs and edge cases in compilers and quirks based on different hardware, so it's not always 100% perfect, but it's 99.9999% perfect.
When you use an LLM, you have no guarantees about what it's doing, and in a way that's categorically different from not knowing what a compiler does. Very few people know all of the steps that break down `console.log("hello world")` into the electrical signals that get sent to the pixels on a screen on a modern OS using modern hardware, given the complexity of the stack, but they do know with as close as is humanly possible to 100% certainty that a correctly configured environment will result in that statement outputting the text "hello world" to a console. They do not need to know the implementation because the contract is deterministic and well defined. Prompts are neither deterministic nor well defined, so if you want to verify the LLM is doing what you want it to do, you have to check what it's doing in detail.
Your basic argument here is that you can save a lot of time by trusting that the LLM will faithfully wire the code as you want, and that you can write tests to sanity check behavior and verify that. That's a valid argument, if you're OK tolerating a certain level of uncertainty about behavior that you haven't meticulously checked or tested. The more you want to meticulously check behavior, the more effort it takes, and the more it converges to the effort involved in just writing the code normally.