secbear|13 days ago
+4.5pp for software engineering is suspiciously low compared to +51.9pp for healthcare. I suspect this reflects that frontier models already have strong SWE priors from training data, so skills add less marginal value. If true, skills become most valuable precisely in the domains where models are weakest — which is where you'd actually want to deploy agents in production. That's encouraging.
cheema33|13 days ago
This stood out for me as well. I do think that LLMs have a lot of training data on software engineering topics and that perhaps explains the large discrepancy. My experience has been that if I am working with a software library or tool that is very new or not commonly used, skills really shine there. Example: Adobe React Spectrum UI library. Without skills, Opus 4.6 produces utter garbage when trying to use this library. With properly curated/created skills, it shines. Massive difference.
D-Machine|13 days ago
hardware2415|13 days ago
[deleted]
nvader|13 days ago
To me, the author reads like an articulate native English speaker who is typing on their phone.
jeron|13 days ago
jibal|13 days ago