ondra | 9 months ago

Is this any different from using --cache-type-k and --cache-type-v?

Aurornis|9 months ago

No, it appears to be an LLM-generated attempt to gain GitHub stars.

See my other comment for a sampling of the other oddities in the repo.

landl0rd|9 months ago

I'm guessing it's a bit different, since MLX/MPS doesn't have native 4-bit support (or even 8-bit, if I remember correctly). It didn't even launch with bf16 support. So I think the lowest you could go with the old type_k/v solution on Apple GPUs was 16-bit (f16/bf16), but I'm not a llama.cpp internals expert, so I may be wrong.
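
For a rough sense of what the cache type changes, here is a back-of-the-envelope sketch of KV cache size at different element widths. The model dimensions are made up for illustration, the helper name `kv_cache_bytes` is hypothetical, and the bits-per-element figures for q8_0 and q4_0 assume llama.cpp's block layout of 32 values plus one f16 scale per block. It says nothing about what a given backend actually supports.

```python
# Rough KV cache size estimate. All model dimensions below are hypothetical.
BITS_PER_ELEMENT = {
    "f16": 16.0,
    "q8_0": 8.5,  # assumed: 34 bytes per 32-value block (32 int8 + f16 scale)
    "q4_0": 4.5,  # assumed: 18 bytes per 32-value block (16 bytes of nibbles + f16 scale)
}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, type_k="f16", type_v="f16"):
    """Estimate bytes used by the K and V caches at full context length."""
    # Number of elements stored for K (and, separately, the same number for V).
    elems = n_layers * n_kv_heads * head_dim * n_ctx
    bits = elems * (BITS_PER_ELEMENT[type_k] + BITS_PER_ELEMENT[type_v])
    return bits / 8


if __name__ == "__main__":
    # Hypothetical model: 32 layers, 8 KV heads, head dim 128, 8192-token context.
    for cache_type in ("f16", "q8_0", "q4_0"):
        size = kv_cache_bytes(32, 8, 128, 8192, cache_type, cache_type)
        print(f"{cache_type}: {size / 2**20:.0f} MiB")
```

Under those assumptions, going from f16 to q8_0 roughly halves the cache and q4_0 takes it to a bit over a quarter, which is why the question of whether sub-16-bit cache types run on Apple GPUs matters.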

azinman2|9 months ago

That’s what I want to know!