ghm2199 | 2 days ago

It's why I never give it such vague prompts. But it's sad it does not ask the user more. Also interesting and important to know how one would tease good and correct information out of LLMs in 2026. It's like relearning how to Google like it was 2006 all over again, except now it's much less deterministic.

I wonder how the tail of the distribution of request types fares, e.g. an engineer asking for hypothesis generation for, say, non-trivial bugs with complete visibility into the system. One way to poke holes in one LLM's hypotheses is to use a "reverse prompt": you ask it to build you a prompt to feed to another LLM. This didn't work quite as well until mid-2025 as it does now.
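
A minimal sketch of that reverse prompt using the Anthropic Python SDK; the model id and the hypothesis text are placeholders I made up, not anything from this thread:

    # reverse prompt: ask model A to write the adversarial prompt you will
    # paste into model B. Model id and hypothesis are illustrative only.
    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    hypothesis = "The flaky test is a race between cache warmup and TTL expiry."

    msg = client.messages.create(
        model="claude-opus-4-20250514",  # placeholder model id
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": "Build me a prompt for another LLM that makes it "
                       f"attack this hypothesis as hard as it can:\n\n{hypothesis}",
        }],
    )
    print(msg.content[0].text)  # paste this into the second model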

I always take a research-and-plan output from Opus 4.6, especially if it looks iffy, feed it to Codex/ChatGPT, and ask it to poke holes. It almost always does. Then I ask Claude Code: "Hey, what do you think about the holes?" I don't add anything else to the prompt.
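
That round trip, sketched with the two official Python SDKs; the model ids and the example plan are placeholders, and in Claude Code the plan would already be in context rather than passed back explicitly:

    # plan review round trip: Claude drafts, GPT pokes holes, Claude responds
    import anthropic
    import openai

    claude = anthropic.Anthropic()
    gpt = openai.OpenAI()

    def ask_claude(prompt: str) -> str:
        msg = claude.messages.create(
            model="claude-opus-4-20250514",  # placeholder model id
            max_tokens=2048,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text

    def ask_gpt(prompt: str) -> str:
        resp = gpt.chat.completions.create(
            model="gpt-4o",  # placeholder model id
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

    plan = ask_claude("Research and plan: move our cron jobs onto a queue.")
    holes = ask_gpt(f"Poke holes in this plan:\n\n{plan}")
    # keep the final prompt terse, per the comment above; the plan is passed
    # back only because stateless API calls lack Claude Code's session context
    print(ask_claude(f"PLAN:\n{plan}\n\nHOLES:\n{holes}\n\n"
                     "Hey, what do you think about the holes?"))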

In my experience Claude Opus is less opinionated than ChatGPT or Codex. The latter two always stick to their guns, and in this binary battle they are generally more often correct about the hypothesis.

The other day I was running a Docker app container from inside a Docker devbox container, with the host's socket mounted in both. Bind mounts pointing into the devbox would not write to it, because the namespace was being resolved against the underlying host.

Claude was sure it was a bug to do with ZFS overlays; ChatGPT said not so, that it's just a misconfiguration and I should use named volumes or full host paths. It was right. This is also how I discovered that SQLite with Litestream will get you really far in many cases, rather than a full Postgres-on-AWS stack.
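
For anyone hitting the same wall, a sketch of the failure and the fix with the docker-py SDK (the paths, image, and volume name are made up). Because the daemon you reach through the host's socket runs on the host, it resolves bind-mount source paths in the host's filesystem, not the devbox's:

    # run from inside a devbox container that mounts the host's docker socket
    import docker

    client = docker.from_env()  # talks to the HOST daemon via the shared socket

    # wrong: /workspace/data exists in the devbox, but the host daemon
    # resolves it on the host and silently auto-creates an empty dir there
    client.containers.run(
        "alpine", "touch /data/ok",
        volumes={"/workspace/data": {"bind": "/data", "mode": "rw"}},
    )

    # fix 1: give the daemon the full HOST path behind the devbox mount
    client.containers.run(
        "alpine", "touch /data/ok",
        volumes={"/home/me/devbox/data": {"bind": "/data", "mode": "rw"}},
    )

    # fix 2: use a named volume, which the daemon owns, so both the devbox
    # and the app container can mount it without caring about host paths
    client.containers.run(
        "alpine", "touch /data/ok",
        volumes={"shared-data": {"bind": "/data", "mode": "rw"}},
    )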

This is how you get correct information out of LLMs in 2026.

gck1 | 1 day ago

I do this too, but the issue I have with this approach is that it's a never-ending cycle. Codex/GPT will always find holes, and Claude will always agree they are holes. If you teach it YAGNI, then it will always disagree, even on genuine holes.

If your original plan was to add a column to your DB, then after several cycles your plan will be 10,000 lines long and contain a recipe for building a universe.

ghm2199 | 9 hours ago

The "trick(s) here are to limit the scope by always reading the plan very carefully. Here is how I do it to tackle this problem:

1. You should recognize when said holes are not "needed" holes, e.g. you could make do with an in-memory task scheduler without rolling out a more complex one.

2. You can break up the plan: longer plans have more holes and are mentally unwieldy to go 20 rounds with in a chat coding UI.

3. Give it learning tests, i.e. code to run against black boxes, just like we write a unit test to understand how a system works (sketch below).
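
A tiny example of what I mean, using Python's own json module as a stand-in for whatever black box you're planning around:

    # learning test: pin down what the black box actually does before
    # the plan assumes behavior it doesn't have
    import json

    def test_int_keys_become_strings():
        # question: do integer dict keys survive a round trip? (no)
        assert json.loads(json.dumps({1: "a"})) == {"1": "a"}

    def test_nan_is_not_strict_json():
        # question: how does NaN serialize? (as bare NaN, which strict
        # JSON parsers elsewhere may reject)
        assert json.dumps(float("nan")) == "NaN"

    if __name__ == "__main__":
        test_int_keys_become_strings()
        test_nan_is_not_strict_json()
        print("black-box behavior pinned down")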

mgfist | 2 days ago

> But it's sad it does not ask the user more.

You can ask it to ask you about your task and it will ask you tons of questions.
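
Something along these lines works as a reusable preamble (the wording is mine, just illustrative):

    # a preamble to prepend to any task prompt (illustrative wording only)
    CLARIFY_FIRST = (
        "Before you touch the task below, ask me every clarifying question "
        "you have about requirements, constraints, and edge cases. "
        "Do not write a plan or any code until I have answered."
    )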

denimnerd42 | 2 days ago

Creating plans in Claude and asking ChatGPT via the API to review them in a loop was my strategy this week. I'm not a big fan of Codex as a coding harness because it seems to give up quite easily, where Claude will search the problem space and try things, but I think GPT does a much better job of poking holes and asking clarifying questions when prompted.

killingtime74 | 2 days ago

I use a skill that addresses these shortcomings: it basically forces the model to plan multiple times until the plan is very detailed. It also asks more questions.

raw_anon_1111 | 2 days ago

I use Codex CLI for my daily work since, with just my $20/month ChatGPT subscription, I never get close to the quota. But it trips over itself every now and then. At that point I just use Claude in another terminal session. We only have a laughable $750-a-month corporate allowance for Claude.