Writing High Quality Production Code with LLMs Is a Solved Problem

9 points | menzoic | 6 days ago | escobyte.substack.com

7 comments

menzoic | 6 days ago

I work at Airbnb where I write 99% of my production code using LLMs. Spotify's CEO recently announced something similar, but I mention my employer not because my workflow is sponsored by them (many early adopters learned similar techniques), but to establish a baseline for the massive scale, reliability constraints, and code quality standards this approach has to survive.

Many engineers abandon LLMs because they run into problems almost instantly, but these problems have solutions. If you're a skeptic, please read and let me know what you think.

The top problems are:

* Constant refactors (generated code is really bad or broken)

* Lack of context (the model doesn’t know your codebase, libraries, APIs, etc.)

* Poor instruction following (the model doesn’t implement what you asked for)

* Doom loops (the model can’t fix a bug and tries random things over and over again)

* Complexity limits (inability to modify large codebases or create complex logic)

In this article, I show how to solve each of these problems by using the LLM as a force multiplier for your own engineering decisions, rather than a random number generator for syntax.

A core part of my approach is Spec-Driven Development. I outline methods for treating the LLM like a co-worker having technical discussions about architecture and logic, and then having the model convert those decisions into a spec and working code.

carrot5Top | 6 days ago

For sure, with the latest models, treating the model like a respected professional that needs context and input is essential. Usually I get the best results when the context window is right around 70% full.

menzoic | 6 days ago

> get the best results when the context window is right around 70%

I used to be trigger-happy with /compact, or would use the handoff technique to transfer knowledge between sessions with a doc. But lately the newer generation of models seems to handle long context pretty well, down to around 20% remaining context.

But that's when I'm working on the same focused task. I would instantly reset if I started implementing an unrelated task, even with 90% of the context left, since there's just no benefit to keeping the old context.
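The keep/handoff/reset heuristic described above can be sketched as a tiny decision function. This is a toy illustration: the function name, token counts, and the 20% threshold are assumptions for the sketch, not any tool's actual API.

```python
def next_action(used_tokens: int, window_tokens: int, same_task: bool) -> str:
    """Decide whether to keep the session, hand off, or reset.

    Mirrors the heuristic from the thread: reset immediately when the
    task changes; otherwise hand off (write a summary doc and start a
    fresh session) once roughly 20% of the window remains.
    """
    if not same_task:
        # No benefit to old context once the task is unrelated.
        return "reset"
    remaining = 1 - used_tokens / window_tokens
    if remaining <= 0.20:
        # Low headroom: transfer knowledge to a doc, then restart.
        return "handoff"
    return "keep"

print(next_action(50_000, 200_000, same_task=False))  # unrelated task -> reset
print(next_action(170_000, 200_000, same_task=True))  # 15% left -> handoff
print(next_action(50_000, 200_000, same_task=True))   # plenty left -> keep
```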

chalmers | 6 days ago

Yep! That’s almost exactly the workflow I’ve landed on too. I could not agree more.

menzoic | 6 days ago

It's basically the typical SDLC boosted with LLMs. Especially the part where you can explore tradeoffs and alternative approaches rapidly.

Soupzzz | 6 days ago

I read that you're using Codex and lost interest in the rest of the post.

menzoic | 6 days ago

LOL, honestly I hated Codex when it first came out. It was backed by o3 at the time.

But literally as soon as GPT-5 came out in Codex, with the "high" reasoning option, I completely switched from Claude Code to Codex. Never imagined that would happen so fast.