genpfault | 2 months ago
Getting ~150 tok/s on an empty context with a 24 GB 7900XTX via llama.cpp's Vulkan backend.
Tepix | 2 months ago
Again, you're using third-party quantisations, not the weights supplied by Nvidia (which don't fit in 24 GB).