visiondude | 4 months ago

Oh cool, can you share concrete examples of times codex outperformed Claude Code? In my experience both tools need to be carefully massaged with context to fulfill complex tasks.

typpilol|4 months ago

In my experience, Claude wants to try to finish everything as quickly as possible, whereas codex is happy to take 5x as long.

The best answer is that each has its uses. Using codex to do bulk edits is dumb because it takes forever, etc.

loveparade|4 months ago

I don't really see how examples are useful because you're not going to understand the context. My prompt may be something like "We recently added a new transcription backend api (see recent git commits), integrate it into the service worker. Before implementing, create a detailed plan, ask clarifying questions, and ask for approval before writing code"

Does that help you? I doubt it. But there you go.

hluska|4 months ago

Nobody has to give you examples. People can express opinions. If you disagree, that’s fine, but requesting entire prompt and response sets is quite demanding. Who are you to be that demanding?

dns_snek|4 months ago

> Who are you to be that demanding?

Let's call it the skeptical public? We've been listening to a group of people rave about how revolutionary these tools are, how they're able to perform senior level developer work, how good their code is, and how they're able to work autonomously through the use of sub-agents (i.e. vibe coding), without ever providing evidence that would support any of those grandiose claims.

But then I use these tools myself[1] and I speak to real developers who have used them, and our evaluations hover around lukewarm: good at straightforward, junior-level tasks, or good for prototyping, or good for initially generating tests, or good for answering certain types of questions, or good for one-off scripts. Approximately none of them would trust these LLMs to implement a more complex feature like a mid-level or senior developer would without very extensive guidance and hand-holding that takes longer than just doing it ourselves.

Given the overwhelming absence of evidence, the most charitable conclusion I can come to is that the vast majority of people making these claims have simply gone from being 0.2X developers to being 0.3X developers who happen to generate 5X more code per unit of time.

[1] e.g. my reply to https://news.ycombinator.com/item?id=45651948