Spent the last year building a full production system solo. Not a side project. Hundreds of thousands of lines, dozens of services, CI/CD, Terraform, multi-locale UI, thousands of tests. Real constraints, real consequences.
At small scale the AI was genuinely useful. At medium scale it started making subtle mistakes. At large scale it behaved exactly as it was built: a next-token predictor with no memory of the system it was modifying. It fixed bugs while breaking invariants elsewhere. It refactored modules while forgetting why they existed. It regenerated logic that was already there.
I had strict rules, enforced invariants, a TDD roadmap. Didn't matter. The substrate has no concept of continuity.
This isn't a prompting problem. The architecture is missing a layer that every other platform had before it could scale. Operating systems have it. Compilers have it. Databases have it. AI tooling doesn't.
Devin got 13.86% on SWE-bench. LangChain and AutoGPT collapsed under real workloads. Not because the models are bad. Because the substrate can't hold a system together.
Wrote up what that missing layer is and why nobody has built it yet. If you build real systems with AI, curious whether this matches what you're seeing.
iggori | 6 days ago