A few comments mentioning distillation. If you use claude-code with the z.ai coding plan, I think it quickly becomes obvious they did train on other models. Even the "you're absolutely right" was there. But that's ok. The price/performance ratio is unmatched.
hashbig|2 months ago
polyrand|2 months ago
It's a pattern I saw more often with claude code, at least in terms of how frequently it says it (much improved now). But it's true that just this pattern alone is not enough to infer the training methods.
theptip|2 months ago
ljosifov|2 months ago
Havoc|2 months ago
I don't think that's particularly conclusive for training on other models. Seems plausible to me that the internet data corpus simply converges on this hence multiple models doing this.
...or not...hard to tell either way.