lmeyerov | 13 days ago
Chaos Communication Congress talk on this from a couple of months ago; jump to the coding-loops part: https://media.ccc.de/v/39c3-breaking-bots-cheating-at-blue-t... The talk focuses mostly on MCPs, but we now use the same flow for Skills.
This kind of experience makes me more hesitant to adopt plugin and skill repos that lack evals, or an equivalent, showing a measurable improvement over what the LLM already knows and what the harness can already handle. Generally, a small number of things end up mattering a lot, and getting those right is pivotal; the rest is death by a thousand cuts.
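A minimal sketch of the kind of eval meant here, assuming a skill is judged by the lift in pass rate it gives over the same agent run without it. `EvalCase`, `pass_rate`, `skill_lift`, and the two stub agents are hypothetical stand-ins; in practice each agent would be an LLM-plus-harness run with and without the skill loaded, and `trials` > 1 would average out nondeterminism.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Agent = Callable[[str], str]  # hypothetical: prompt in, final output out


@dataclass
class EvalCase:
    prompt: str
    check: Callable[[str], bool]  # hypothetical grader: True if the output is acceptable


def pass_rate(agent: Agent, cases: Sequence[EvalCase], trials: int = 1) -> float:
    """Fraction of (case, trial) runs whose output passes the case's check."""
    total = len(cases) * trials
    if total == 0:
        return 0.0
    passed = sum(1 for c in cases for _ in range(trials) if c.check(agent(c.prompt)))
    return passed / total


def skill_lift(baseline: Agent, with_skill: Agent, cases: Sequence[EvalCase], trials: int = 1) -> float:
    """Pass-rate improvement attributable to the skill (negative means it hurts)."""
    return pass_rate(with_skill, cases, trials) - pass_rate(baseline, cases, trials)


# Stub agents standing in for real runs without and with the skill.
baseline: Agent = lambda p: p.lower()
with_skill: Agent = lambda p: p.upper()

cases = [
    EvalCase("abc", lambda out: out == "ABC"),
    EvalCase("XyZ", lambda out: out.isupper()),
]

lift = skill_lift(baseline, with_skill, cases)
print(f"lift: {lift:+.2f}")
```

A repo whose lift is near zero, or only shows up on cases the base model already passes, is the "death by a thousand cuts" kind: lots of surface area, little measurable gain.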