item 35613796

romland | 2 years ago

It was far easier to get big chunks of work done in the beginning, but that is pretty much how it works for a human too (at least for me). The thing that limits you is the context length of the LLM, so you have to be rather picky about which existing code you feed back in. With this comes the issue of all the glue between the prompts, so I can see that the more polished things need to become, the more human intervention is required -- a trend I already very much see.
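Being "picky about which existing code you feed back in" can be sketched as a simple budgeting problem. A minimal illustration, assuming a crude 4-characters-per-token heuristic and made-up file names (neither is from the project):

```python
# Sketch of context-budget triage when re-feeding code to an LLM.
# The 4-chars-per-token heuristic and the file list below are
# illustrative assumptions, not part of the original project.

def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token for English/code."""
    return len(text) // 4

def select_for_context(files: dict[str, str], budget: int) -> list[str]:
    """Greedily pick files (smallest first) that fit within the token budget."""
    chosen, used = [], 0
    for name, src in sorted(files.items(), key=lambda kv: len(kv[1])):
        cost = estimate_tokens(src)
        if used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen

# Hypothetical sources, sized so only the two smallest fit a 3000-token budget.
files = {
    "lemming.js": "x" * 8000,   # ~2000 tokens
    "builder.js": "x" * 2000,   # ~500 tokens
    "physics.js": "x" * 20000,  # ~5000 tokens
}
print(select_for_context(files, budget=3000))  # ['builder.js', 'lemming.js']
```

A real workflow would rank by relevance to the prompt rather than by size, but the budget constraint is the same.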

If there is time saved, it is mostly because I don't dread the upcoming grunt work. Take, for instance, creating the "Builder" lemming: you know pretty much exactly how to do it, but you also know there will be a lot of off-by-one errors and subtle issues. It's easier to start by throwing together a somewhat half-hearted prompt and seeing where it goes.

On some prompts, several hours were spent, mostly reading and debugging the LLM's output. This is where it eventually gets a bit dubious -- by now I know pretty much exactly how I want the code to look, since I have seen so many variants. I might find myself massaging the prompt to home in on my exact solution instead of making the LLM "understand the problem".

Much of this is due to the contrived situation (the human should write little code) -- in the real world you would just fix the code instead of the prompt and save a lot of time.

Thank you, by the way! I always find it scary to share links to projects! :-)


sk0g | 2 years ago

No worries, going to check out some of the commits when I get a bit more free time as well. The concept is intriguing!

The usefulness of LLMs for engineering things is very hard to gauge, and your project is going to be quite interesting as you progress. No doubt they help with writing new things, but I spend maybe ~15% of my time working on something new versus maintenance and extensions. Those more common activities are very infrequently demonstrated -- either the usefulness diminishes as the required context grows, or they simply make for less exciting examples. Someone in my org has even brought up an LLM tool that tries to remedy bugs on the fly (at runtime), which sounds absolutely horrific to me...

It sounds similar to my experience with Copilot, then. In small, self-contained bits of code -- much more common in new projects or microservices, for example -- it can save a lot of cookie-cutter work. Sometimes it will get me 80% of the way there, and I have to manually tweak it. Quite often it produces complete garbage that I ignore. All that to say, if I weren't an SE, Copilot would bring me no closer to tackling anything beyond hello world.

One big benefit, though, is with simpler test cases. If I start them with a "GIVEN ... WHEN ... THEN ..." comment, the autocompletions can be terrific, maybe requiring some alterations to suit my taste. I get positive feedback in PRs and from people debugging the test cases too, because the intention behind them is clear without needing to guess the rationale for the test. Win-win!
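The comment-seeding technique above can be sketched as follows. This is an illustrative example, not code from the thread; the function under test and its behavior are hypothetical:

```python
# Seeding a test with a GIVEN/WHEN/THEN comment: the structured comment
# states the scenario, and an autocomplete tool (or a human reader)
# can fill in or verify the body against it.

def apply_discount(price: float, percent: float) -> float:
    """Hypothetical function under test: apply a percentage discount."""
    return round(price * (1 - percent / 100), 2)

def test_discount_applied_to_full_price():
    # GIVEN a product priced at 100.00
    # WHEN a 15% discount is applied
    # THEN the final price is 85.00
    assert apply_discount(100.00, 15) == 85.00

test_discount_applied_to_full_price()
```

The point is that the comment doubles as documentation: a reviewer can check the assertion against the stated scenario without reverse-engineering the test's intent.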