newman314 | 4 months ago
Agreed. I also wonder why they chose to test against a Mac Studio with only 64GB instead of 128GB.

yvbbrjdr | 4 months ago
Hi, author here. I crowd-sourced the devices for benchmarking from my friends. It just happened that one of my friends has this device.

ggerganov | 4 months ago
FYI you should have used llama.cpp to do the benchmarks. It performs almost 20x faster than ollama for the gpt-oss-120b model. Here are some sample results on my Spark:

  ggml_cuda_init: found 1 CUDA devices:
    Device 0: NVIDIA GB10, compute capability 12.1, VMM: yes

  | model                  |      size |   params | backend | ngl | n_ubatch | fa |   test |             t/s |
  | ---------------------- | --------: | -------: | ------- | --: | -------: | -: | -----: | --------------: |
  | gpt-oss 20B MXFP4 MoE  | 11.27 GiB |  20.91 B | CUDA    |  99 |     2048 |  1 | pp4096 |  3564.31 ± 9.91 |
  | gpt-oss 20B MXFP4 MoE  | 11.27 GiB |  20.91 B | CUDA    |  99 |     2048 |  1 |   tg32 |    53.93 ± 1.71 |
  | gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | CUDA    |  99 |     2048 |  1 | pp4096 | 1792.32 ± 34.74 |
  | gpt-oss 120B MXFP4 MoE | 59.02 GiB | 116.83 B | CUDA    |  99 |     2048 |  1 |   tg32 |    38.54 ± 3.10 |
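For anyone wanting to reproduce numbers in this format: the table above looks like llama.cpp's llama-bench output, and a sketch of an invocation along these lines should produce the pp4096/tg32 rows (the GGUF path is a placeholder, and flag spellings may vary slightly by llama.cpp version):

  # placeholder model path; -p/-n set the pp4096 and tg32 tests,
  # -ngl 99 offloads all layers, -ub 2048 sets ubatch, -fa 1 enables flash attention
  llama-bench -m gpt-oss-120b-mxfp4.gguf -ngl 99 -ub 2048 -fa 1 -p 4096 -n 32

In the output, pp4096 is prompt-processing throughput over a 4096-token prompt and tg32 is token-generation speed over 32 tokens, both reported in tokens per second.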