top | item 43293254

(no title)

cataphract | 11 months ago

No it isn't. Just gave some context to it (purpose) and copied into it some 250 lines of code I know has some bugs someone looking more or less closely would find and asked to evaluate its correctness. It did not find any of the problems and reported 5 supposed problems that don't exist.

discuss

vessenes|11 months ago

4.5 is not trained on code. And it shows. It is however to my eyes more fluid, thoughtful, has better theory of mind. It's like someone scaled up GPT-4, and I really like it.

woah|11 months ago

You don't know how smart OP is

kadushka|11 months ago

I give a similar task when I interview SWE candidates: about half cannot find any bugs (and sometimes see bugs where there are none), despite years of claimed experience in the language/domain.