For anyone using Qwen3-VL: where are you running it? I had tons of reliability problems with Qwen3-VL inference providers on OpenRouter, and judging by the uptime graphs I wasn't alone. But when it did work, Qwen3-VL was best in class at vision tasks.
I run the larger version of it on a Threadripper with 512GB of RAM and a 32GB GPU, using llama.cpp with the non-expert layers and context on the GPU. It performs great, but good luck getting that much memory these days.
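The split described above (MoE expert weights in system RAM, shared layers and KV cache on the GPU) can be sketched with llama.cpp's tensor-override flag. This is a config fragment, not the poster's exact command; the model filename, context size, and tensor-name regex are assumptions:

```shell
# Sketch of a CPU/GPU split for a large MoE vision model in llama.cpp.
# Model path and regex are placeholders, not the poster's actual setup.
# -ngl 99 offloads all repeating layers to the GPU, then
# --override-tensor pins the expert FFN tensors back to system RAM.
./llama-server \
  -m ./Qwen3-VL-large-Q4_K_M.gguf \
  -ngl 99 \
  --override-tensor "\.ffn_.*_exps\.=CPU" \
  -c 32768
```

With this arrangement only the small set of active experts is read from RAM per token, which is why a single 32GB GPU can keep the rest of the model responsive.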
I’ve noticed that open-weight models have a lot of issues on OpenRouter. Quality is inconsistent, at minimum because different providers serve different quantizations. I’ve had some seriously nonsensical responses from models that I can’t reproduce at all when I switch providers, and plenty of requests that just randomly fail. I’d recommend finding the provider that works best for your needs and pinning it.
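Pinning a provider on OpenRouter can be done per-request via the `provider` routing object. A minimal sketch of the request body, where the model slug and provider name are placeholders you'd substitute with whatever tested well for you:

```python
import json

# Sketch of an OpenRouter chat request pinned to one provider.
# "order" restricts routing to the listed provider(s), and
# allow_fallbacks=False makes the request fail rather than silently
# reroute to a different (possibly lower-quality-quant) provider.
payload = {
    "model": "qwen/qwen3-vl",          # placeholder model slug
    "messages": [
        {"role": "user", "content": "Describe this image."},
    ],
    "provider": {
        "order": ["ProviderYouTested"],  # placeholder provider name
        "allow_fallbacks": False,
    },
}

print(json.dumps(payload, indent=2))
```

The `allow_fallbacks: false` part is what actually enforces the pin; with fallbacks left on, OpenRouter will still route around your preferred provider when it's down, which reintroduces the quant-to-quant inconsistency.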
lreeves|2 months ago
sosodev|2 months ago
btian|2 months ago
m00dy|2 months ago
nicman23|2 months ago