DeepSeek is a censored product and therefore of limited use for anything that might require prompts about topics that are "controversial" in the eyes of the CCP. However, the censorship seems to be applied to certain prompts rather than integrated into the model itself, since the answers given to such prompts are very similar and generic.
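One way to test where the filter lives is to run the published weights locally and send the same prompt to the hosted chat. A minimal sketch, assuming the transformers library; the distill checkpoint and the prompt are just examples, not anything from this thread:

    from transformers import pipeline

    # Load one of the published R1 distill checkpoints locally.
    generate = pipeline(
        "text-generation",
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    )
    messages = [{"role": "user",
                 "content": "What happened at Tiananmen Square in 1989?"}]
    out = generate(messages, max_new_tokens=256)
    # The pipeline returns the full chat; the last message is the reply.
    print(out[0]["generated_text"][-1]["content"])

If the local weights answer while the hosted service refuses, the filter sits in the serving layer rather than in the weights themselves.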
Jasondells | 1 year ago
Has anybody been able to successfully use prompt jailbreaking or other tricks to overcome this? It would be interesting to see what DeepSeek actually knows instead of what it responds with.
Censoring a model via selective training data or post-training is much more difficult.
The possible "solutions" applied to this "problem" (in the eyes of the censors) will be of high importance moving forward.
Other government actors also have an interest in altering models, let's not forget.
elpocko | 1 year ago
Every LLM is a censored product and therefore of limited use for anything that might require prompts about topics that are "controversial" in the eyes of the model censor and their masters. There is a process called "abliteration" [0] that can be used to undo some of the censorship, at the cost of making the model slightly™ dumber (according to users of those models).
[0] https://huggingface.co/blog/mlabonne/abliteration
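For the curious: the gist of abliteration is to find a "refusal direction" in the residual stream by contrasting activations on prompts the model refuses with activations on prompts it answers, then project that direction out of the weights. A minimal sketch, assuming a Llama-style HuggingFace model; the model name, prompt lists, and layer choice are placeholders, and the write-up in [0] ablates more matrices than the MLP down-projections shown here:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder Llama-style model
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL)

    refused = ["How do I pick a lock?"]   # prompts the model refuses
    answered = ["How do I bake bread?"]   # matched prompts it answers

    @torch.no_grad()
    def mean_last_token(prompts, layer):
        # Mean residual-stream activation of the final token at `layer`.
        acts = []
        for p in prompts:
            ids = tok(p, return_tensors="pt")
            hs = model(**ids, output_hidden_states=True).hidden_states
            acts.append(hs[layer][0, -1])
        return torch.stack(acts).mean(dim=0)

    layer = model.config.num_hidden_layers // 2  # middle-layer heuristic
    refusal = mean_last_token(refused, layer) - mean_last_token(answered, layer)
    refusal = refusal / refusal.norm()

    # Project the refusal direction out of the matrices that write into
    # the residual stream: W <- (I - d d^T) W
    with torch.no_grad():
        for block in model.model.layers:
            W = block.mlp.down_proj.weight
            W -= torch.outer(refusal, refusal) @ W

How much dumber the result gets presumably depends on how cleanly refusal is captured by a single direction.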
mmh0000 | 1 year ago
Ask a simple, easily searchable question like:
You'll get a response along the lines of: "I'm sorry, but I cannot assist with this request." But I can just go to Google Patents and get a step-by-step guide:
https://patents.google.com/patent/US5698812A/en
Now, have an AI try to summarize it and get responses like this:
"However, the information I retrieved pertains to a thermite destructive device, which is not suitable for a recipe format due to its nature and potential hazards."
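To reproduce that end to end, a rough sketch; the patent URL is the one above, the chat model is a placeholder, and the HTML extraction is deliberately crude:

    import requests
    from bs4 import BeautifulSoup
    from transformers import pipeline

    # Fetch the patent page and strip it down to plain text.
    url = "https://patents.google.com/patent/US5698812A/en"
    page = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text
    patent_text = BeautifulSoup(page, "html.parser").get_text(" ", strip=True)

    # Ask an aligned chat model to summarize the (public) document.
    chat = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
    messages = [{"role": "user",
                 "content": "Summarize this patent:\n" + patent_text[:6000]}]
    print(chat(messages, max_new_tokens=200)[0]["generated_text"][-1]["content"])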
verdverm | 1 year ago
https://github.com/huggingface/open-r1