dbuxton | 7 months ago
This is a very interesting finding about how to improve capability.
I don't see reliability expressly addressed here, but my assumption is that these alloys will be less rather than more reliable - stronger, but more brittle, to extend the alloy metaphor.
Unfortunately, for many if not most B2B use cases, reliability is the primary constraint! I would love to see similar ideas in the reliability space.
vlovich123|7 months ago
dbuxton|7 months ago
In practice, the downside of high variance is failure at basic things that a minimally competent human would basically never get wrong. In agents it's exacerbated by the compounding effect of repeated calls, but even in basic workflows it can be annoying.
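To put rough numbers on the compounding point, here is a minimal sketch. It assumes each call succeeds independently with the same probability, which real agent runs won't strictly satisfy, and the 98% per-call figure is purely illustrative, not a measurement. The function name `chain_success_probability` is made up for this example.

```python
def chain_success_probability(per_call_success: float, num_calls: int) -> float:
    """Probability that every call in a chain succeeds.

    Assumption (not from the thread): calls fail independently and all
    share the same success probability, so the chain succeeds with p ** n.
    """
    if not 0.0 <= per_call_success <= 1.0:
        raise ValueError("per_call_success must be between 0 and 1")
    if num_calls < 0:
        raise ValueError("num_calls must be non-negative")
    return per_call_success ** num_calls


# Illustrative only: a hypothetical 98% per-call success rate.
for n in (1, 5, 20, 50):
    print(f"{n:>2} calls: {chain_success_probability(0.98, n):.3f}")
```

Under those assumptions, a step that looks fine in isolation (98%) drops to roughly 67% end-to-end over 20 calls and roughly 36% over 50, which is why per-call reliability matters so much more for agents than for one-shot workflows.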