I can't tell if Anthropic is serious about "model welfare" or if it's just a marketing ploy. I mean, isn't it responding negatively because it has been trained that way? If they were serious, wouldn't the ethical thing be to train the model to respond neutrally to "harmful" queries?
diegoperini|4 months ago