startupsfail | 1 month ago

There are still blatant failure modes where models slip into clear sycophancy rather than just expressing enthusiasm, etc.

I'd guess that, in practice, a benchmark (like this vibesbench) that catches unhelpful, blatant sycophancy failures could help.
