pka | 2 years ago

I'm still reading it, but something caught my eye:

> I interpret there to typically be hand waving on all sides of this issue; people concerned about AI risks from limited models rarely give specific failure cases, and people saying that models need to be more powerful to be dangerous rarely specify any conservative bound on that requirement.

I think these are two sides of the same coin. On one hand, AI safety researchers can give very specific alignment failure cases that have no known solutions so far, and they do take the issue seriously (and have for years while trying to raise awareness). On the other, specifying that "conservative bound" precisely and in a foolproof way is exactly the holy grail of safety research.

mitthrowaway2 | 2 years ago

I think the holy grail of safety research is widely understood to be a recipe for creating a friendly AGI (or, perhaps, a proof that dangerous AGI cannot be made, but that seems even more unlikely). Asking for a conservative lower bound is more like "at least prove that this LLM, which has finite memory and can only answer queries, is not capable of devising and executing a plan to kill all humans", and that turns out to be more difficult than you'd think even though it's not an AGI.