top | item 42625774

rudiksz | 1 year ago

> "seasoned programmers are using LLMs better".

I do not remember a single instance when code provided to me by an LLM worked at all. Even if I ask for something small that can be done in 4-5 lines of code, it's always broken.

From one "seasoned" programmer to another: how the hell do you write the prompts to get back correct, working code?

HappMacDonald | 1 year ago

I'd ask things like "which LLM are you using", and "what language or APIs are you asking it to write for".

For the standard answers of "GPT-4 or above", "Claude Sonnet or Haiku", or models of similar power, well-known languages like Python, JavaScript, Java, or C, and assuming no particularly niche or unheard-of APIs or project contexts, the failure rate of 4-5-line scripts in my experience is less than 1%.

rudiksz | 1 year ago

It's mostly Go, some Python, and I'm not asking anything niche. I'm asking for basic utility functions that I could implement in 10-20 lines of code. There's something broken every single time and I spend more time debugging the generated code than actually writing it out.

I'm pretty sure everybody measures "failure rate" differently and grossly exaggerates the success rate. There are a lot of suggestions below about "tweaking", but if I have to "tweak" generated code in any way, then that is a failure for me. So the failure rate of generated code is about 99%.

mordymoop | 1 year ago

I write the prompt as if I’m writing an email to a subordinate that clearly specifies what the code needs to do.

If I’m requesting an improvement to existing code, I paste the whole file if practical, or if not, as much of the code as possible, as context before making the request for additional functionality.

Often these days I add something like “preserve all currently existing functionality.” Weirdly, as the models have gotten smarter, they have also gotten more prone to delete stuff they view as unnecessary to the task at hand.

If what I’m doing is complex (a subjective judgement), I ask it to lay out a plan for the intended code before starting, giving me a chance to give it a thumbs up or clarify its understanding of what I’m asking for if its plan is off base.

throwaway4aday | 1 year ago

Step 1: https://claude.ai

Step 2: Write out your description of the thing you want to the best of your ability but phrase it as "I would like X, could you please help me better define X by asking me a series of clarifying questions and probing areas of uncertainty."

Step 3: Once both Claude and you are satisfied that X is defined, say "Please go ahead and implement X."

Step 4a: If feature Y is incorrect, go to Step 2 and repeat the process for Y.

Step 4b: If there is a bug, describe what happened and ask Claude to fix it.

That's the basics of it, should work most of the time.
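The loop above boils down to three prompt templates. A minimal Python sketch of those templates (the function names and wording details are invented for illustration; the actual calls to Claude are left out):

```python
# Sketch of the clarify-then-implement loop described in Steps 2-4.
# Only the prompt text is real content; how you send it to the model
# (web UI, API client, etc.) is up to you.

def clarify_prompt(feature: str) -> str:
    # Step 2: ask the model to interrogate the spec before writing code.
    return (
        f"I would like {feature}. Could you please help me better define it "
        "by asking me a series of clarifying questions and probing areas "
        "of uncertainty?"
    )

def implement_prompt(feature: str) -> str:
    # Step 3: only send this once the definition is settled.
    return f"Please go ahead and implement {feature}."

def bugfix_prompt(observed_behavior: str) -> str:
    # Step 4b: describe the symptom, not your guess at the cause.
    return f"The code has a bug: {observed_behavior}. Please fix it."
```

The design point is that Step 2 front-loads the ambiguity: the model surfaces the unknowns as questions before any code exists, instead of guessing at them silently mid-implementation.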

antirez | 1 year ago

Check my YouTube channel if you have a few minutes. I just published a video about adding a complex feature (UTF-8 support) to the Kilo editor using Claude.

numpad0 | 1 year ago

dc: not a seasoned dev, with <b> and <h1> tags on "not".

They can't think for you. All the intelligent thinking you have to do yourself.

First, give them a high-level requirement that can be clarified into indented bullet points that look like code. Or give them such a list directly. Don't give them the half-open questions usually favored by talented and autonomous individuals.

Then let them further decompress those pseudocode bullet points into code. They'll give you back code that resembles a digitized paper-test answer. Fix the obvious errors and you get B-grade compiling code.
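One illustration of that decompression step, in Python (the `normalize_user` utility is an invented example, not from the thread): the indented bullets are what you hand the model, and the function below is the kind of answer it should hand back.

```python
# Requirement, written as indented bullets that already look like code:
#   - normalize_user(record):
#       - lowercase the email
#       - strip whitespace from the name
#       - drop keys whose value is None
# The "decompressed" version a model should return:

def normalize_user(record: dict) -> dict:
    out = {}
    for key, value in record.items():
        if value is None:
            continue  # drop empty fields
        if key == "email":
            value = value.lower()
        elif key == "name":
            value = value.strip()
        out[key] = value
    return out
```

Because each bullet maps onto one or two lines of the result, checking the output is mostly a matter of reading the bullets against the branches.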

They can't do non-conventional structures, Quake-style performance-optimized code, realtime robotics, cooperative multithreading, etc., just good old it-takes-what-it-takes GUI app, API, and data-manipulation code.

For those use cases, with these points in mind, it's a lot faster to let an LLM generate tokens than to type `int this_mandatory_function_does_obvious (obvious *obvious){ ...` manually on a keyboard. That should arguably be a productivity boost, in the sense that the user of the LLM is effectively typing faster.

jkaptur | 1 year ago

The story from the article matches my experience. The LLM's first answer is often a little broken, so I tweak it until it's actually correct.

wvenable | 1 year ago

I rarely get back non-working code, but I've also internalized its limitations, so I no longer ask it for things it's not going to be able to do.

As other commenters have pointed out, there's also a lot of variation between different models, and some are quite dumb.

I've had no issues with 10-20 line coding problems. I've also had it build a lot of complete shell scripts and had no problem there either.