ghosty141 | 24 days ago

I'm personally 100% convinced (assuming prices stay reasonable) that the Codex approach is here to stay.

Having a human in the loop eliminates all the problems that LLMs have, and continuously reviewing smallish chunks of code works really well in my experience.

It saves so much time having Codex do all the plumbing so you can focus on the actual "core" part of a feature.

LLMs still (and I doubt that changes) can't think and generalize. If I tell Codex to implement 3 features he won't stop and find a general solution that unifies them unless explicitly told to. This makes it kinda pointless for the "full autonomy" approach, since effectively code quality and abstractions completely go down the drain over time. That's fine if it's just prototyping or "throwaway" scripts, but for bigger codebases where longevity matters it's a dealbreaker.

_zoltan_|24 days ago

I'm personally 100% convinced of the opposite: it's a waste of time to steer them. We know now that agentic loops can converge given the proper framing and self-reflectiveness tools.

sealeck|24 days ago

Converge towards what though... I think the level of testing/verification you need to have an LLM output a non-trivial feature (e.g. Paxos/anything with concurrency, business logic that isn't just "fetch value from spreadsheet, add to another number and save to the database") is pretty high.

zeroxfe|24 days ago

> it's a waste of time to steer them

It's not a waste of time, it's a responsibility. All things need steering, even humans -- there's only so much precision that can be extrapolated from prompts, and as the tasks get bigger, small deviations can turn into very large mistakes.

There's a balance to strike between micro-management and no steering at all.

bcarv|24 days ago

Does the AI agent know what your company is doing right now, what every coworker is working on, how they are doing it, and how your boss will change priorities next month without being told?

If it really knows better, then fire everyone and let the agent take charge. lol

halfcat|24 days ago

> given the proper framing

This sounds like never. Most businesses are still shuffling paper and couldn’t give you the requirements for a CRUD app if their lives depended on it.

You’re right, in theory, but it’s like saying you could predict the future if you could just model the universe in perfect detail. But it’s not possible, even in theory.

If you can fully describe what you need to the degree ambiguity is removed, you’ve already built the thing.

If you can’t fully describe the thing, like some general “make more profit” or “lower costs”, you’re in paper clip maximizer territory.

rapind|24 days ago

Maybe some day, but as a Claude Code user I see it make enough pretty serious screw-ups, even with a very clearly defined plan, that I review everything it produces.

You might be able to get away without the review step for a bit, but eventually (and before long) you will be bitten.

retinaros|23 days ago

good luck.

xXSLAYERXx|24 days ago

I've been using Codex for one week and I have been the most productive I have ever been. Small PRs, tight rules, and I get almost exactly what I want. Things tend to go sideways when scope creeps into my request, but I just close the PR instead of fighting with the agent. In one week: 28 PRs, 26 merged. Absolutely unreal.

vidarh|24 days ago

I will personally never consider using an agent that can't be easily pushed toward working on its own for long periods (hours) at a time. It's a total waste of time for me to babysit the LLM.

NuclearPM|24 days ago

> If I tell Codex to implement 3 features he won't stop and find a general solution that unifies them unless explicitly told to

That could easily be automated.

Skidaddle|24 days ago

But tokens are way cheaper than human labor

sejje|24 days ago

Aider was doing this a long time ago