stuckinhell | 4 months ago

I'm utterly shocked at the article saying GPU inference (PyTorch/Transformers) isn't working: numerical instability produces bad outputs, it's not viable for real-time serving, wait for driver/CUDA updates!

My job just got me and our entire team a DGX Spark. I'm impressed at how easy it is to run Ollama models I couldn't run on my laptop. gpt-oss:120b is shockingly better than what I expected it to be from running the 20b model on my laptop.

The DGX has changed my mind about the future being small specialized models.


RyeCatcher | 4 months ago

Totally agree. I’ve been training nanochat models all morning. Hit some speed bumps; I’ll share more later in another article. But it’s absolutely amazing. I fine-tuned a Gemma3 model in a day yesterday.

jasonjmcghee | 4 months ago

> I'm utterly shocked at the article saying GPU inference (PyTorch/Transformers) isn't working

Are you shocked because that isn't your experience?

From the article it sounds like Ollama runs CPU inference, not GPU inference. Is that the case for you?
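One way to answer that on the box itself (a sketch, not from the article): `ollama ps` reports a PROCESSOR column showing where a loaded model's weights live, and `nvidia-smi` can cross-check GPU memory usage. The sample output below is hypothetical, filled in from memory of the command's column layout; on a real machine you'd capture it with `sample="$(ollama ps)"`.

```shell
# Hypothetical `ollama ps` output; replace with: sample="$(ollama ps)".
# The PROCESSOR column shows "100% GPU", "100% CPU", or a split between them.
sample='NAME            ID              SIZE     PROCESSOR    UNTIL
gpt-oss:120b    aaaaaaaaaaaa    65 GB    100% GPU     4 minutes from now'

# If any loaded model reports a CPU share, inference has (at least partly)
# fallen back to the CPU.
if printf '%s\n' "$sample" | grep -q '% CPU'; then
  echo "CPU fallback detected"
else
  echo "GPU inference"
fi
```

Running `nvidia-smi` while a prompt is generating is a second sanity check: GPU memory and utilization should both climb if inference is actually on the GPU.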