top | item 45532033

cuttothechase | 4 months ago

The fact that we now have to write cookbooks about cookbooks kind of masks the reality that something could be genuinely wrong with this entire paradigm.

Why are even experts unsure about what's the right way to do something, or whether it's even possible to do something at all, for anything non-trivial? Why so much hesitancy, if this is the panacea? If we are so sure, then why not use the AI itself to come up with a proven paradigm?

nkmnz|4 months ago

Radioactivity was discovered before nuclear engineering existed. We had phenomena first and only later the math, tooling, and guardrails. LLMs are in that phase. They are powerful stochastic compressors with weak theory. No stable abstractions yet. Objectives shift, data drifts, evals leak, and context windows make behavior path dependent. That is why experts hedge.

“Cookbooks about cookbooks” are what a field does while it searches for invariants. Until we get reliable primitives and specs, we trade in patterns and anti-patterns. Asking the AI to “prove the paradigm” assumes it can generate guarantees it does not possess. It can explore the design space and surface candidates. It cannot grant correctness without an external oracle.

So treat vibe-engineering like heuristic optimization. Tight loops. Narrow scopes. Strong evals. Log everything. When we find the invariants, the cookbooks shrink and the compilers arrive.
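That loop can be sketched roughly like this (everything here is invented for illustration: `heuristic_loop`, `propose`, and the toy integer "candidates" stand in for whatever patch generator and eval suite you actually have):

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("harness")

def run_evals(candidate, evals):
    """Score a candidate against a fixed eval suite; the suite is the external oracle."""
    return sum(1 for e in evals if e(candidate)) / len(evals)

def heuristic_loop(propose, evals, rounds=10, threshold=1.0):
    """Tight loop: propose, evaluate, log, keep the best. Nothing is trusted without evals."""
    best, best_score = None, -1.0
    for i in range(rounds):
        candidate = propose(best)            # narrow scope: refine from the current best
        score = run_evals(candidate, evals)  # strong evals: score against the oracle
        log.info(json.dumps({"round": i, "score": score}))  # log everything
        if score > best_score:
            best, best_score = candidate, score
        if best_score >= threshold:
            break
    return best, best_score

# Toy demo: candidates are just integers drawn in order; the evals are the oracle.
candidates = iter(range(1, 20))
demo_evals = [lambda c: c % 2 == 0, lambda c: c % 3 == 0]
best, score = heuristic_loop(lambda prev: next(candidates), demo_evals)
print(best, score)  # 6 1.0
```

The shape is the point, not the toy: the eval suite stays immutable while candidates churn, which is exactly what makes the process heuristic optimization rather than gambling.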

sarchertech|4 months ago

We’re in the alchemist phase. If I’m being charitable, the medieval stone mason phase.

One thing worth pointing out is that the pre-engineering phase of building large structures lasted a long time, and building collapses killed a lot of people while we tried to work out the theory.

Also it wasn’t really the stone masons who worked out the theory, and many of them were resistant to it.

johnh-hn|4 months ago

It reminds me of a quote from Designing Data-Intensive Applications by Martin Kleppmann. It goes something like, "For distributed systems, we're trying to create a reliable system out of a set of unreliable components." In a similar fashion, we're trying to get reliable results from an unreliable process (i.e. prompting LLMs to do what we ask).

The difficulties of working with distributed systems are well known but it took a lot of research to get there. The uncertain part is whether research will help overcome the issues of using LLMs, or whether we're really just gambling (in the literal sense) at scale.

torginus|4 months ago

LLMs are literal gambling - you get them to work right once and they are magical - then you end up chasing that high by tweaking the model and instructions the rest of the time.

vidarh|4 months ago

Or you put them to work with strong test suites and get stuff done. I am in bed. I have Claude fixing complex compiler bugs right now. It has "earned" that privilege by proving it can make good enough fixes, systematically removing actual, real bugs in reasonable ways by being given an immutable test suite and detailed instructions of the approach to follow.

There's no gambling involved. The results need to be checked, but the test suite is good enough it is hard for it to get away with something too stupid, and it's already demonstrated it knows x86 assembly much better than me.

handfuloflight|4 months ago

I actually found in my case that it's just self-inertia in not wanting to break through cognitive plateaus. The AI helped you with a breakthrough, hence the magic, but you also did something right in constructing the context in the conversation with the AI; i.e., you did thought and biomechanical[1] work. Now the dazzle of the AI's output makes you forget the work you still need to do, and the next time you prompt you get lazy, or you want much more for much less.

[1] moving your eyes and hands, hearing with your ears, etc.

sarchertech|4 months ago

LLMs are cargo-cult-generating machines. I'm not denying they can be useful for some tasks, but the amount of superstition caused by these chaotic, random black boxes is unreal.

scuff3d|4 months ago

The whole damn industry is deep in sunk cost fallacy. There is no use case and no sign of a use case that justifies the absolutely unbelievable expenditure that has been made on this technology. Everyone is desperate to find something, but they're just slapping more guardrails on hoping everything doesn't fall apart.

And just for clarity, I'm not saying they aren't useful at all. I'm saying modest productivity improvements aren't worth the absolutely insane resources that have been poured into this.

hx8|4 months ago

I share the same skepticism, but I have more patience to watch an emerging technology advance, and more forgiveness as experts communicate openly while coming to a consensus.

galaxyLogic|4 months ago

> why not use the AI itself to come up with a proven paradigm?

Because AI can only imitate the language it has seen. If its training material contains no texts about the best way to use multiple coding agents at the same time, then AI knows very little about that subject.

AI only knows what humans know, but it knows much more than any single human.

We don't know the best way to use multiple coding agents until we or somebody else runs some experiments and records the findings. But AI is not yet able to do such actual experiments itself.

panarky|4 months ago

I'm sorry, but the whole stochastic parrot thing is so thoroughly debunked at this point that we should stop repeating it as if it's some kind of rare wisdom.

AlphaGo showed that even pre-LLM models could generate brand new approaches to winning a game that human experts had never seen before, and didn't exist in any training material.

With a little thought and experimentation, it's pretty easy to show that LLMs can reason about concepts that do not exist in their training corpus.

You could invent a tiny DSL with brand-new, never-seen-before tokens, give two worked examples, then ask it to evaluate a gnarlier expression. If it solves it, it inferred and executed rules you just made up for the first time.
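A minimal sketch of such a probe (the DSL and its tokens `zim`/`blorp` are made up here precisely so they appear in no corpus; the reference evaluator exists only to grade the model's reply):

```python
# A made-up mini-DSL: "a zim b" adds, "a blorp b" multiplies,
# evaluated strictly left to right. Tokens are invented for this probe.
def eval_dsl(expr: str) -> int:
    tokens = expr.split()
    result = int(tokens[0])
    # Walk (operator, operand) pairs left to right.
    for op, val in zip(tokens[1::2], tokens[2::2]):
        if op == "zim":
            result += int(val)
        elif op == "blorp":
            result *= int(val)
        else:
            raise ValueError(f"unknown operator: {op}")
    return result

# Two worked examples go into the prompt; the held-out expression
# tests whether the model inferred and executed the rules.
examples = ["2 zim 3", "2 blorp 3"]
held_out = "2 zim 3 blorp 4"

prompt = "\n".join(
    [f"{e} = {eval_dsl(e)}" for e in examples] + [f"{held_out} = ?"]
)
print(prompt)              # the probe you would send to the model
print(eval_dsl(held_out))  # ground truth for grading: 20
```

If the model answers 20, it applied rules that were invented moments earlier, which is the kind of evidence the pure stochastic-parrot framing struggles to explain.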

Or you could drop in docs for a new, never-seen-before API and ask it to decide when and why to call which tool, run the calls, and revise after errors. If it composes a working plan and improves from feedback, that’s reasoning about procedures that weren’t in the corpus.