top | item 46493198

threeducks | 1 month ago

I have tried a few Qwen2.5 and Qwen3 models (<=30B), even abliterated ones, but it seems that some words have been completely wiped from their pretraining dataset. No amount of prompting can bring back what has never been there.

For comparison, I have also tried the smaller Mistral models, which have a much more complete vocabulary, but their writing sometimes lacks continuity.

I have not tried the larger models due to lack of VRAM.
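The point about prompting is worth making concrete: a language model assigns probability mass only over continuations it has some basis for, and an unsmoothed toy bigram model shows the extreme case where a word never seen in training gets exactly zero probability, so no prompt can surface it. This is only an illustrative sketch with a made-up corpus, not how the Qwen models actually tokenize or score text.

```python
from collections import Counter

# Toy bigram LM over a tiny made-up corpus. Words never seen in
# training get zero probability mass, so no prompt can elicit them.
corpus = "the cat sat on the mat the dog sat on the rug".split()

bigrams = Counter(zip(corpus, corpus[1:]))
contexts = Counter(corpus[:-1])

def next_word_prob(prev, word):
    # Unsmoothed maximum-likelihood estimate: count of (prev, word)
    # divided by count of prev as a context; 0.0 for unseen pairs.
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]

print(next_word_prob("the", "cat"))    # seen in training -> 0.25
print(next_word_prob("the", "zebra"))  # never in corpus -> 0.0
```

Real LLMs use smoothed, subword-level distributions rather than hard zeros, but the effect the parent describes is analogous: if a word (or the contexts that teach its use) is filtered out of pretraining data, its probability is vanishingly small under any prompt.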

dizhn | 1 month ago

You can give their hosted versions a go using one of the free CLIs. (Qwen Code CLI has Qwen models; opencode rotates its free selection, which was GLM recently; there's also DeepSeek, which is quite cheap.)