item 46974214 (no title)

vkaufmann | 19 days ago
GPT-OSS-120B runs like hell on my DGX Spark

embedding-shape | 19 days ago
The MXFP4 variant I suppose? My setup (RTX Pro 6000) does around ~140 tok/s with llama.cpp, around 160 tok/s with vLLM.

vkaufmann | 19 days ago
yep MXFP4 really fast :D