top | item 46380294

meltyness | 2 months ago

In security-ese, I guess you'd say there are AI capabilities that must be kept confidential... always? Is that enforceable? Is it the government's place?

I think current censorship capabilities can be surmounted with just the classic techniques: write a song that..., x is y and y is z..., express it in base64. Though maybe something like Gemma Scope can still find whole segments of activation?

It seems like a lot of energy to only make a system worse.

throwuxiytayq | 2 months ago

Censoring models to avoid outputting Taylor Swift's songs has essentially nothing to do with the concept of AI alignment.

meltyness | 2 months ago

I mean, I'm sure cramming in synthetic data and scaling models to enhance, like, in-model arithmetic, memory, etc. makes "alignment" appear more complex and model behavior more non-Newtonian, so to speak, but it's going to boil down to censorship one way or another. Or an NSP approach where you enforce a policy over activations using another, separate model, and so on and so on.

Is it likely that it's a bigger problem to try to apply qualitative policies to training data, activations, and outputs than the approach ML guys think is primarily appropriate (i.e., NN training)? Or is it a bigger problem to scale hardware and explore activation architectures that have more effective representation[0], and make a better model? If you go after the data but cascade a model in to rewrite history, that's obviously going to be expensive, but easy. Going after outputs is cheap and easy but not terrifically effective... but do we leave the gears rusty? Probably we shouldn't.

It's obfuscation to assert that there's some greater policy that must be applied to models beyond the automatic modeling that happens, unless there's some specific outcome you intend to prevent, which at this point means censorship; maybe, optimistically, you can prevent it from lying? Such applications of policy have primarily targeted solutions that reduce model efficacy and universality.

[0] https://news.ycombinator.com/item?id=35703367