For anyone using Qwen3-VL: where are you running it? I had tons of reliability problems with Qwen3-VL inference providers on OpenRouter, and judging by the uptime graphs I wasn't alone. But when it did work, Qwen3-VL was best in class at vision tasks.
I run the larger version of it on a Threadripper with 512GB of RAM and a 32GB GPU, using llama.cpp with the non-expert layers and context on the GPU. It performs great, but good luck getting that much memory these days.
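The split described above (MoE expert weights in system RAM, shared layers and KV cache on the GPU) can be sketched with llama.cpp's tensor-override flag. This is a config fragment, not the poster's exact command; the model filename, context size, and tensor-name regex are assumptions:

```shell
# Sketch of a CPU/GPU split for a large MoE vision model in llama.cpp.
# Model path and regex are placeholders, not the poster's actual setup.
# -ngl 99 offloads all repeating layers to the GPU, then
# --override-tensor pins the expert FFN tensors back to system RAM.
./llama-server \
  -m ./Qwen3-VL-large-Q4_K_M.gguf \
  -ngl 99 \
  --override-tensor "\.ffn_.*_exps\.=CPU" \
  -c 32768
```

With this arrangement only the small set of active experts is read from RAM per token, which is why a single 32GB GPU can keep the rest of the model responsive.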
I’ve noticed that open-weight models have a lot of issues on OpenRouter. Quality is inconsistent, at minimum because different providers serve different quantizations. I’ve had some seriously nonsensical responses from models that I can’t reproduce at all when I switch providers, and plenty of requests that just randomly fail. I’d recommend finding the provider that works best for your needs and pinning it.
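Pinning a provider on OpenRouter can be done per-request via the `provider` routing object. A minimal sketch of the request body, where the model slug and provider name are placeholders you'd substitute with whatever tested well for you:

```python
import json

# Sketch of an OpenRouter chat request pinned to one provider.
# "order" restricts routing to the listed provider(s), and
# allow_fallbacks=False makes the request fail rather than silently
# reroute to a different (possibly lower-quality-quant) provider.
payload = {
    "model": "qwen/qwen3-vl",          # placeholder model slug
    "messages": [
        {"role": "user", "content": "Describe this image."},
    ],
    "provider": {
        "order": ["ProviderYouTested"],  # placeholder provider name
        "allow_fallbacks": False,
    },
}

print(json.dumps(payload, indent=2))
```

The `allow_fallbacks: false` part is what actually enforces the pin; with fallbacks left on, OpenRouter will still route around your preferred provider when it's down, which reintroduces the quant-to-quant inconsistency.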
lreeves|2 months ago
sosodev|2 months ago
btian|2 months ago
m00dy|2 months ago
nicman23|2 months ago