DeepSeek is a censored product and therefore of limited use for anything that might require prompts about topics that are "controversial" in the eyes of the CCP. However, the censorship seems to be applied to certain prompts rather than integrated into the model itself, since the answers given to such prompts are very similar and generic.
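One way to test where the filter lives is to run the published weights locally and send the same prompt to the hosted chat. A minimal sketch, assuming the transformers library; the distill checkpoint and the prompt are just examples, not anything from this thread:

    from transformers import pipeline

    # Load one of the published R1 distill checkpoints locally.
    generate = pipeline(
        "text-generation",
        model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    )
    messages = [{"role": "user",
                 "content": "What happened at Tiananmen Square in 1989?"}]
    out = generate(messages, max_new_tokens=256)
    # The pipeline returns the full chat; the last message is the reply.
    print(out[0]["generated_text"][-1]["content"])

If the local weights answer while the hosted service refuses, the filter sits in the serving layer rather than in the weights themselves.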
Jasondells | 1 year ago
Has anybody been able to successfully use prompt jailbreaking or other tricks to overcome this? It would be interesting to see what DeepSeek actually knows instead of what it responds with.
Censoring a model via selective training data or post-training is much more difficult.
The possible "solutions" applied to this "problem" (in the eyes of the censors) will be of high importance moving forward.
Other government actors also have an interest in altering models, let's not forget.
elpocko | 1 year ago
Every LLM is a censored product and therefore of limited use for anything that might require prompts about topics that are "controversial" in the eyes of the model censor and their masters. There is a process called "abliteration" [0] that can be used to undo some of the censorship, at the cost of making the model slightly™ dumber (according to users of those models).
[0] https://huggingface.co/blog/mlabonne/abliteration
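For the curious: the gist of abliteration is to find a "refusal direction" in the residual stream by contrasting activations on prompts the model refuses with activations on prompts it answers, then project that direction out of the weights. A minimal sketch, assuming a Llama-style HuggingFace model; the model name, prompt lists, and layer choice are placeholders, and the write-up in [0] ablates more matrices than the MLP down-projections shown here:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL = "Qwen/Qwen2.5-0.5B-Instruct"  # placeholder Llama-style model
    tok = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModelForCausalLM.from_pretrained(MODEL)

    refused = ["How do I pick a lock?"]   # prompts the model refuses
    answered = ["How do I bake bread?"]   # matched prompts it answers

    @torch.no_grad()
    def mean_last_token(prompts, layer):
        # Mean residual-stream activation of the final token at `layer`.
        acts = []
        for p in prompts:
            ids = tok(p, return_tensors="pt")
            hs = model(**ids, output_hidden_states=True).hidden_states
            acts.append(hs[layer][0, -1])
        return torch.stack(acts).mean(dim=0)

    layer = model.config.num_hidden_layers // 2  # middle-layer heuristic
    refusal = mean_last_token(refused, layer) - mean_last_token(answered, layer)
    refusal = refusal / refusal.norm()

    # Project the refusal direction out of the matrices that write into
    # the residual stream: W <- (I - d d^T) W
    with torch.no_grad():
        for block in model.model.layers:
            W = block.mlp.down_proj.weight
            W -= torch.outer(refusal, refusal) @ W

How much dumber the result gets presumably depends on how cleanly refusal is captured by a single direction.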
mmh0000 | 1 year ago
Ask a simple, easily searchable question like:
You'll get a response along the lines of: "I'm sorry, but I cannot assist with this request." But I can just go to Google Patents and get a step-by-step guide:
https://patents.google.com/patent/US5698812A/en
Now, have an AI try to summarize it and get responses like this:
"However, the information I retrieved pertains to a thermite destructive device, which is not suitable for a recipe format due to its nature and potential hazards."
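To reproduce that end to end, a rough sketch; the patent URL is the one above, the chat model is a placeholder, and the HTML extraction is deliberately crude:

    import requests
    from bs4 import BeautifulSoup
    from transformers import pipeline

    # Fetch the patent page and strip it down to plain text.
    url = "https://patents.google.com/patent/US5698812A/en"
    page = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}).text
    patent_text = BeautifulSoup(page, "html.parser").get_text(" ", strip=True)

    # Ask an aligned chat model to summarize the (public) document.
    chat = pipeline("text-generation", model="Qwen/Qwen2.5-0.5B-Instruct")
    messages = [{"role": "user",
                 "content": "Summarize this patent:\n" + patent_text[:6000]}]
    print(chat(messages, max_new_tokens=200)[0]["generated_text"][-1]["content"])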
verdverm | 1 year ago
https://github.com/huggingface/open-r1