top | item 44832939


On multiple occasions, Claude Code has claimed it completed a task when it had actually just written mock code. It will also answer questions with certainty (e.g. "where is this value being passed?") when in reality it is making the answer up. So if you haven't been seeing hallucinations on Opus/Sonnet, you probably aren't looking deep enough.


theshrike79|6 months ago

This is because you haven't given it a tool to verify the task is done.

TDD works pretty well: have it write even the most basic test first (or go full artisanal and write it yourself), then ask it to implement the code.
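A rough sketch of what that test-first flow can look like. The `slugify` function and its tests are made up for illustration and don't come from any real project. The idea is that the test class exists first and the implementation is only "done" once the tests pass:

```python
import re
import unittest


def slugify(title: str) -> str:
    """Lowercase the title and join its alphanumeric runs with hyphens.

    Hypothetical implementation, written only after the tests below existed.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


class SlugifyTest(unittest.TestCase):
    # These tests are written first and define what "done" means.
    def test_replaces_spaces_and_punctuation(self):
        self.assertEqual(slugify("Hello, World!"), "hello-world")

    def test_empty_title(self):
        self.assertEqual(slugify(""), "")


if __name__ == "__main__":
    unittest.main()
```

With this in place the agent has something concrete to run before it reports success, instead of just asserting that the code works.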

I have a standing order in my main CLAUDE.md to "always run `task build` before claiming a task is done". All my projects use Task[0] with a pretty standard structure, where `build` always runs lint + test before building the project.

With a semi-robust test suite I can be pretty sure nothing major broke if `task build` completes without errors.
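For anyone unfamiliar with Task, a Taskfile with that shape might look roughly like the sketch below. The actual lint/test/build commands are placeholders (here assuming a Go project); only the structure, where `build` calls `lint` and `test` first, reflects what is described above.

```yaml
# Taskfile.yml (hypothetical; commands are placeholders for a Go project)
version: '3'

tasks:
  lint:
    cmds:
      - golangci-lint run

  test:
    cmds:
      - go test ./...

  build:
    cmds:
      # Run lint, then tests, and only then build; a failure stops the chain.
      - task: lint
      - task: test
      - go build ./...
```

Because `task build` exits non-zero if any step fails, it gives the agent a single command whose result it can check.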

[0] https://taskfile.dev

rohansood15|6 months ago

What do you think it is 'mocking'? It is exactly the behavior that would make the tests work. And unless I give it access to production, it has no way to verify tasks like how values (in this case secrets/envs) are being passed.

Plus, this is all beside the point. Simon argued that the model hallucinates less; he wasn't talking about a specific product.