top | item 47041577


secbear | 13 days ago

The finding that self-generated skills provide negative benefit (-1.3pp) while curated skills give +16.2pp is the most interesting result here imo. Big discrepancy, but makes sense. Aligns with the thought that LLMs are better consumers of procedural knowledge than producers of it.

+4.5pp for software engineering is suspiciously low compared to +51.9pp for healthcare. I suspect this reflects that frontier models already have strong SWE priors from training data, so skills add less marginal value. If true, skills become most valuable precisely in the domains where models are weakest — which is where you'd actually want to deploy agents in production. That's encouraging.
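For scale, here is the arithmetic on the numbers quoted above. This is a quick sketch: the figures are only the percentage-point (pp) deltas as quoted in this comment, and no baseline accuracies are assumed. Because pp values are absolute differences, they can be subtracted directly; the helper names are just for illustration.

```python
# Percentage-point (pp) changes as quoted in the comment above.
# Only these reported deltas are used; no baseline accuracies are assumed.
reported_gain_pp = {
    "curated skills": 16.2,
    "self-generated skills": -1.3,
    "software engineering": 4.5,
    "healthcare": 51.9,
}


def gap_pp(a: str, b: str) -> float:
    """Difference in percentage points between two reported gains (a minus b)."""
    return round(reported_gain_pp[a] - reported_gain_pp[b], 1)


def ratio(a: str, b: str) -> float:
    """How many times larger reported gain a is than reported gain b."""
    return round(reported_gain_pp[a] / reported_gain_pp[b], 1)


if __name__ == "__main__":
    print(f"curated vs self-generated: {gap_pp('curated skills', 'self-generated skills'):+.1f} pp")
    print(
        f"healthcare vs SWE: {gap_pp('healthcare', 'software engineering'):+.1f} pp "
        f"({ratio('healthcare', 'software engineering')}x)"
    )
```

So curated skills beat self-generated ones by 17.5pp, and the healthcare gain is roughly 11.5x the SWE gain.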


cheema33|13 days ago

> +4.5pp for software engineering is suspiciously low compared to +51.9pp for healthcare.

This stood out for me as well. I do think that LLMs have a lot of training data on software engineering topics and that perhaps explains the large discrepancy. My experience has been that if I am working with a software library or tool that is very new or not commonly used, skills really shine there. Example: Adobe React Spectrum UI library. Without skills, Opus 4.6 produces utter garbage when trying to use this library. With properly curated/created skills, it shines. Massive difference.

D-Machine|13 days ago

Nothing to add other than that I appreciate you sharing these explicit details and insights here.

hardware2415|13 days ago

[deleted]

nvader|13 days ago

Hmm, not for me, but I'm curious if there are signatures I'm missing.

To me, the author reads like an articulate native English speaker who is typing on their phone.

jeron|13 days ago

not all em-dash users are AI!

jibal|13 days ago

All ad hominems are irrational but that one is worse than most.

unknown|13 days ago

[deleted]