item 46961326

ncruces | 19 days ago

I have.

Just this month I burned through 80% of my Copilot quota for Claude Opus 4.6 in a couple of days, getting it to help me with a silly hobby project: https://github.com/ncruces/dbldbl

It did help. The project had sat for 3 years without trigonometric and hyperbolic functions, and in a couple of days of spare time I'm adding them. Some of that came through rubber-duck chat and/or reviewing algorithm papers (give me formulas, I'll do it), some through agent mode (give me code).

But if you review the PR written in agent mode, the model still lies to my face, in trivial but hard-to-verify ways. For example, it added tests saying cosh(1) is this number, per that OEIS link, and both the number and the OEIS link are wrong. The tests obviously pass anyway, because it's all a lie.
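One cheap defense against a made-up constant like that is to derive the expected value independently instead of trusting a pasted number and link. A minimal sketch, assuming Python's standard `decimal` module as the independent reference (this is an illustration, not code from the dbldbl repo; `cosh_reference` and `ulps_off` are hypothetical helper names): it computes cosh(1) = (e + 1/e)/2 to 40 significant digits and measures how many ULPs a float64 candidate is from it.

```python
import math
from decimal import Decimal, localcontext


def cosh_reference(x: int, digits: int = 40) -> Decimal:
    """Return cosh(x) = (e**x + e**-x) / 2 rounded to `digits` significant digits."""
    with localcontext() as ctx:
        ctx.prec = digits + 10      # guard digits for the intermediate steps
        ex = Decimal(x).exp()       # Decimal.exp() is correctly rounded
        value = (ex + 1 / ex) / 2
        ctx.prec = digits
        return +value               # unary plus rounds to the current precision


def ulps_off(candidate: float, reference: Decimal) -> float:
    """Distance between a float and a high-precision reference, in units of the float's ULP."""
    error = abs(Decimal(candidate) - reference)  # Decimal(float) converts exactly
    return float(error / Decimal(math.ulp(candidate)))


if __name__ == "__main__":
    ref = cosh_reference(1)
    print("cosh(1) ~", ref)
    print("math.cosh(1.0):", ulps_off(math.cosh(1.0), ref), "ulp off")
    # A pasted constant with two digits transposed lands many ULPs away.
    print("bad constant:", ulps_off(1.5430806348152473, ref), "ulp off")
```

The 40 digits are deliberate: double-double arithmetic carries roughly 32 significant decimal digits, so a float64-accurate reference would be too coarse to check it. A test whose expected value was written by the same model that wrote the code can't catch a bad constant; a reference computed some other way can.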

I'm not trying to bash the tech. I use it at work in limited but helpful ways, and I use hobby stuff like this as a testbed precisely to figure out what these models are good at in a low-stakes setting.

But you trust the plausible-looking output of these things at your own peril.
