Would love to see a Qwen 3.5 release in the 80-110B range, which would be perfect for 128GB devices. While Qwen3-Next is 80B, it unfortunately doesn't have a vision encoder.
Considered getting a 512GB Mac Studio, but I don't like Apple devices due to the closed software stack. I would never have gotten this Mac Studio if Strix Halo had existed in mid-2024.
For now I will just wait for AMD or Intel to release an x86 platform with 256GB of unified memory, which would let me run larger models and stick to Linux as the inference platform.
DGX Spark and any A10 devices, Strix Halo with the max memory config, several Mac mini/Mac Studio configs, the HP ZBook Ultra G1a, most servers.
If you're targeting end-user devices, a more reasonable target is 20GB of VRAM, since there are quite a lot of GPU/RAM/APU combinations in that range (orders of magnitude more than at 128GB).
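A quick back-of-the-envelope check of those memory targets (a minimal Python sketch; the bit-widths, the 15% overhead for KV cache and runtime buffers, and the model sizes are illustrative assumptions, not measurements):

    # Rough memory estimate for an LLM's weights at a given quantization level,
    # plus an assumed overhead fraction for KV cache and runtime buffers.
    def weights_gb(params_billion, bits_per_weight):
        return params_billion * bits_per_weight / 8  # 1e9 params * bytes/param ~= GB

    def fits(params_billion, bits_per_weight, budget_gb, overhead=0.15):
        return weights_gb(params_billion, bits_per_weight) * (1 + overhead) <= budget_gb

    for params in (20, 80, 110):
        for bits in (16, 8, 4):
            verdict = "fits" if fits(params, bits, 128) else "too big"
            print(f"{params}B @ {bits}-bit: {weights_gb(params, bits):.0f} GB weights -> {verdict} in 128 GB")

By that rough math, an 80-110B model quantized to 4-8 bits sits comfortably under 128GB, while a 20GB budget caps out around 20-30B parameters at 4-bit.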
PlatoIsADisease|14 days ago
At 80B, you could do 2 A6000s.
What device is 128GB?
kristianp|13 days ago
[1] https://www.techpowerup.com/gpu-specs/rtx-a6000.c3686
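For the 2x A6000 suggestion, the same arithmetic works per card (a sketch; the 48GB per card comes from [1], while the even split of the weights across the two GPUs is an assumption):

    # Two RTX A6000s [1] give 2 x 48 GB = 96 GB of VRAM. Assuming an 80B
    # model's weights are split evenly across both cards:
    params_billion = 80
    for bits in (16, 8, 4):
        total_gb = params_billion * bits / 8  # 1e9 params * bytes/param ~= GB
        print(f"{bits}-bit: {total_gb:.0f} GB total, ~{total_gb / 2:.0f} GB per 48 GB card")

At 8-bit that is roughly 40GB of weights per card, leaving a little headroom on each 48GB A6000 for KV cache; 16-bit does not fit.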