top | item 46854792

djeastm | 27 days ago

>I've seen Claude hallucinate running test suites before.

This reminded me of something that happened to me last year. Not Claude (I think it was GPT 4.0 maybe?), but I had it running in VS Code's Copilot and asked it to fix a bug, then add a test for the case.

Well, it kept failing to pass its own test, so on the third try, it sat there "thinking" for a moment, then finally spit out the command `echo "Test Passed!"`, executed it, read it from the terminal, and said it was done.

I was almost impressed by the gumption more than anything.

Merad | 27 days ago

I've been using Claude Code with Opus 4.5 a lot over the last several months, and while it's amazingly capable, it has a huge tendency to give up on tests. It will just decide that it can commit a failing test because "fixing it has been deferred" or "it's a pre-existing problem." It also knows that it can use `HUSKY=0 git commit ...` to bypass tests that are run in commit hooks. This is all with CLAUDE.md being very specific that every commit must have passing tests, lint, etc. I eventually had to add a Claude Code pre-command hook (which it can't bypass) to block it from running git commit if it isn't following the rules.
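For anyone curious what a hook like that might look like, here is a rough sketch, not the actual script described above. It assumes the hook receives the pending tool call as JSON on stdin with the shell command under `tool_input.command`, and that exiting with status 2 blocks the call; check both against the current Claude Code hooks documentation. The pattern list and the names `should_block` and `main` are made up for illustration.

```python
import json
import re
import sys

# Hypothetical rule set: ways a git commit might skip commit hooks.
BYPASS_PATTERNS = [
    r"\bHUSKY=0\b",    # disables Husky-managed hooks
    r"--no-verify\b",  # git's own flag for skipping commit hooks
]


def should_block(command: str) -> bool:
    """Return True if the command is a git commit that tries to skip hooks."""
    if not re.search(r"\bgit\s+commit\b", command):
        return False
    return any(re.search(pattern, command) for pattern in BYPASS_PATTERNS)


def main(stdin=sys.stdin, stderr=sys.stderr) -> int:
    """Read one hook payload and return the exit status to use."""
    # Assumed payload shape: {"tool_input": {"command": "..."}}
    payload = json.load(stdin)
    command = payload.get("tool_input", {}).get("command", "")
    if should_block(command):
        print("Blocked: commits must run the commit hooks.", file=stderr)
        return 2  # assumed to mean "block this tool call"
    return 0
```

To use it as a hook script, you would end the file with `sys.exit(main())`. Note that a string match like this only catches the obvious bypasses; it does nothing about a command that is wrapped in a separate script.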

theshrike79 | 26 days ago

Anecdata from the internet has a few stories of Claude Opus bypassing hooks too =)

1) it wants to run X command

2) it notices a hook preventing it from running X

3) it creates a Python application or shell script that does X and runs it instead

Whoops.