top | item 45281575

(no title)

natrys | 5 months ago

You are obviously factually correct, I reproduced the same refusal - so consider this not as an attack on your claim. But a quick google search reveals that Falun Gong is an outlawed organization/movement in China.

I did a "s/Falun Gong/Hamas/" in your prompt and got the same refusal in GPT-5, GPT-OSS-120B, Claude Sonnet 4, Gemini-2.5-Pro as well as in DeepSeek V3.1. And that's completely within my expectation, probably everyone else's too considering no one is writing that article.

Goes without saying I am not drawing any parallel between the aforementioned entities, beyond that they are illegal in the jurisdiction where the model creators operate - which as an explanation for refusal is fairly straightforward. So we might need to first talk about why that explanation is adequate for everyone else but not for a company operating in China.

discuss

order

godelski|5 months ago

Thanks. Mind providing screenshots? I believe you, I just think this helps. Your comments align with some of my other responses. I'm not trying to make hard claims here and I'm willing to believe the result is not nefarious. But it's still worth investigating. In the weakest form it's worth being aware of how laws in other countries impact ours, right?

But I don't think we should talk about explanation until we can even do some verification. At this point I'm not entirely sure. We still have the security question open and I'm asking for help because I'm not a security person. Shouldn't we start here?

natrys|5 months ago

If you mean the bit about refusal from other models, then sure here is another run with same result:

https://i.postimg.cc/6tT3m5mL/screen.png

Note I am using direct API to avoid triggering separate guardrail models typically operating in front of website front-ends.

As an aside the website you used in your original comment:

> [2] Used this link https://www.deepseekv3.net/en/chat

This is not the official DeepSeek website. Probably one of the many shady third-party sites riding on DeepSeek name for SEO, who knows what they are running. In this case it doesn't matter, because I already reproduced your prompt with a US based inference provider directly hosting DeepSeek weights, but still worth noting for methodology.

(also to a sceptic screenshots shouldn't be enough since they are easily doctored nowadays, but I don't believe these refusals should be surprising in the least to anyone with passing familiarity with these LLMs)

---

Obviously sabotage is a whole another can of worm as opposed to mere refusal, something that this article glossed over without showing their prompts. So, without much to go on, it's hard for me to take this seriously. We know garbage in context can degrade performance, even simple typos can[1]. Besides LLMs at their present state of capabilities are barely intelligent enough to soundly do any serious task, it stretches my disbelief that they would be able to actually sabotage to any reasonable degree of sophistication - that said I look forward to more serious research on this matter.

[1] https://arxiv.org/abs/2411.05345v1