dbuxton | 7 months ago

Fundamentally, we are at a point in time where models are already very capable, but not very reliable.

This is a very interesting finding about how to improve capability.

I don't see reliability expressly addressed here, but my assumption is that these alloys will be less rather than more reliable - stronger, but more brittle, to extend the alloy metaphor.

Unfortunately, for many if not most B2B use cases, reliability is the primary constraint! I would love to see similar ideas in the reliability space.

vlovich123|7 months ago

How are you defining reliability here?

dbuxton|7 months ago

Great question. For me, reliability is about variance in performance (lower variance means more reliable), and capability is average performance.
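
To make the distinction concrete, here is a rough sketch with made-up numbers. The two score lists are hypothetical (not from any benchmark); they are chosen only so that both "models" have the same average but very different spread.

```python
from statistics import mean, pstdev

# Hypothetical per-task scores out of 100 (illustrative only, not real data).
steady_scores = [80, 80, 80, 80]    # never great, never terrible
spiky_scores = [100, 100, 100, 20]  # usually perfect, occasionally fails badly

# "Capability" in the sense above: average performance.
steady_capability = mean(steady_scores)
spiky_capability = mean(spiky_scores)

# "Reliability" in the sense above: spread of performance (lower is better).
steady_spread = pstdev(steady_scores)
spiky_spread = pstdev(spiky_scores)

print(f"steady: mean={steady_capability}, stdev={steady_spread:.1f}")
print(f"spiky:  mean={spiky_capability}, stdev={spiky_spread:.1f}")
```

Both lists average 80, so on capability alone they look identical; only the spread separates them.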

In practice, high variance shows up on the downside as failures at basic things that a minimally competent human would basically never get wrong. In agents this is exacerbated by the compounding effect of repeated calls, but even in basic workflows it can be annoying.
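
A back-of-the-envelope sketch of the compounding effect, under a strong simplifying assumption: every call succeeds independently with the same probability, and the whole task fails if any single call fails. Real agents can retry or recover, so treat this as an illustration rather than a model of any particular system. The function name `chain_success` and the 99% per-call rate are made up for the example.

```python
def chain_success(per_call_success: float, num_calls: int) -> float:
    """Probability that all calls succeed, assuming independent calls
    with an identical success rate (a simplifying assumption)."""
    return per_call_success ** num_calls

# Hypothetical per-call success rate of 99%.
for num_calls in (1, 10, 50, 100):
    rate = chain_success(0.99, num_calls)
    print(f"{num_calls:>3} calls -> {rate:.1%} end-to-end")
```

Under these assumptions a per-call failure rate that looks negligible in a single-shot workflow becomes a large end-to-end failure rate once the chain is long enough.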