If it were easy to write evals, I would come at it from that direction.

But since it's not, I avoid working on AGENTS.md blind by testing it against whatever prompted me to write it.

I have some prompt, and the AI messes it up in a way I think it shouldn't. Maybe it's something I've seen it do before and I'm sick of it. So I update AGENTS.md, revert the changes it made, /undo in the chat context, and re-submit the same prompt.
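
For illustration, that re-run loop might be sketched in Python roughly like this. Nothing here is tied to a real agent CLI: `run_agent`, `is_bad`, and `fake_agent` are hypothetical stand-ins, and the repeated trials are an added assumption to account for nondeterministic output.

```python
from typing import Callable

# Hypothetical signature: takes (instructions_text, prompt) and returns the agent's output.
RunAgent = Callable[[str, str], str]


def failure_rate(instructions: str, prompt: str, run_agent: RunAgent,
                 is_bad: Callable[[str], bool], trials: int = 3) -> float:
    """Re-submit the same prompt several times and return the fraction of bad outputs."""
    failures = sum(is_bad(run_agent(instructions, prompt)) for _ in range(trials))
    return failures / trials


def fake_agent(instructions: str, prompt: str) -> str:
    """Stand-in for a real agent so the sketch runs offline (assumption, not a real tool)."""
    return "const x = 1" if "prefer const" in instructions else "var x = 1"


def uses_var(output: str) -> bool:
    """The behavior we are sick of seeing, expressed as a simple check."""
    return "var " in output


prompt = "declare x"
old_agents_md = "Write JavaScript."
new_agents_md = "Write JavaScript. Always prefer const over var."

# Same prompt, before and after the AGENTS.md edit.
before = failure_rate(old_agents_md, prompt, fake_agent, uses_var)
after = failure_rate(new_agents_md, prompt, fake_agent, uses_var)
print(f"failure rate before: {before:.2f}, after: {after:.2f}")
```

With a real agent, `run_agent` would wrap whatever starts a fresh session from a reverted working tree, so each run sees only the instructions file and the prompt.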

sjmaplesec|3 days ago

Tessl can generate the evals, both to test against Anthropic best practices and to run scenarios with and without the skill to check whether it's helping.