RyeCatcher | 4 months ago
Key corrections:
Ollama GPU usage - I was wrong. It IS using the GPU (verified 96% utilization). My "CPU-optimized backend" claim was incorrect.
FP16 vs BF16 - enum caught a critical gap: I trained with BF16 and tested inference with FP16 (broken), but never tested BF16 inference. "GPU inference fundamentally broken" was an overclaim. It should be: "FP16 has issues; BF16 untested (likely works)."
llama.cpp - veber-alex's official benchmark link proves it works. My issues were likely version-specific, not representative.
ARM64+CUDA maturity - bradfa was right about Jetson history. ARM64+CUDA is mature. The new combination is Blackwell+ARM64, not ARM64+CUDA itself.
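The FP16 failure mode in the corrections above has a simple numeric explanation: BF16 shares float32's 8-bit exponent, while IEEE FP16 has only 5 exponent bits and overflows above 65504. Values that a BF16-trained model produces comfortably can become inf under FP16, which is consistent with "FP16 has issues, BF16 likely works." A minimal NumPy sketch (NumPy has no native bfloat16, so the BF16 side is emulated here by truncating a float32 to its top 16 bits, which preserves the exponent range):

```python
import numpy as np

# FP16 (IEEE half) has 5 exponent bits: max finite value is 65504.
# BF16 shares float32's 8 exponent bits: max finite value is ~3.4e38.
x = np.array([70000.0], dtype=np.float32)  # fine in BF16, out of FP16 range

as_fp16 = x.astype(np.float16)  # overflows to inf

# Emulate BF16 by keeping only the top 16 bits of the float32 pattern
# (same sign/exponent as float32, 7-bit mantissa, round-toward-zero).
as_bf16 = (x.view(np.uint32) & np.uint32(0xFFFF0000)).view(np.float32)

print(as_fp16[0])  # inf     -- FP16 silently corrupts the value
print(as_bf16[0])  # 69632.0 -- BF16 keeps it finite, just less precise
```

This is an illustration of the dtype ranges only, not of the author's model or inference stack; real BF16 hardware also rounds to nearest rather than truncating.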
The HN community caught my incomplete testing, overclaimed conclusions, and factual errors.
The lesson: ship early, iterate publicly, and accept criticism gracefully.
Thanks especially to enum, veber-alex, bradfa, furyofantares, stuckinhell, jasonjmcghee, eadwu, and renaudr. The article is significantly better now.
CamperBob2 | 4 months ago
But if that's not the case, then yeah, it's a crappy practice and I'd hate to see it spread any further than it already has.
justinclift | 4 months ago
Is that version correct?
Asking because (in Ollama terms) it's positively ancient; 0.12.6 is the most recent release (currently).
I'm guessing it _might_ make a difference, as the Ollama crowd seem to change things, add new features, and optimise quite often.
For example, that 0.12.6 version is where initial experimental support for Vulkan (i.e. Intel Xe GPUs) was added, and in my testing it worked. Not that Vulkan support would do anything in your case. ;)