
tarruda | 14 days ago

Would love to see a Qwen 3.5 release in the 80-110B range, which would be perfect for 128GB devices. While Qwen3-Next is 80B, it unfortunately doesn't have a vision encoder.
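The "80-110B fits in 128GB" claim comes down to simple arithmetic on weight size plus runtime overhead. A rough sketch (the 20% overhead factor for KV cache and runtime is an assumption, not a benchmark):

```python
# Estimate the memory footprint of an N-billion-parameter model at a
# given quantization level. 1B params at 8 bits/weight ~= 1 GB of weights.

def model_memory_gb(params_b: float, bits_per_weight: float,
                    overhead: float = 0.2) -> float:
    """Estimated footprint in GB: weights plus a fudge factor for
    KV cache and runtime overhead (assumed 20%)."""
    weights_gb = params_b * bits_per_weight / 8
    return weights_gb * (1 + overhead)

for params in (80, 110):
    for bits in (4, 8):
        print(f"{params}B @ {bits}-bit: ~{model_memory_gb(params, bits):.0f} GB")
```

By this estimate, an 80B model at 8-bit lands around 96GB and a 110B model at 4-bit around 66GB, both inside a 128GB budget, while 110B at 8-bit (~132GB) would already overflow it.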


Tepix | 13 days ago

Have you thought about getting a second 128GB device? Open weights models are rapidly increasing in size, unfortunately.

tarruda | 13 days ago

Considered getting a 512GB Mac Studio, but I don't like Apple devices due to the closed software stack. I would never have bought this Mac Studio if Strix Halo had existed in mid-2024.

For now I will just wait for AMD or Intel to release an x86 platform with 256GB of unified memory, which would allow me to run larger models and stick to Linux as the inference platform.

PlatoIsADisease | 14 days ago

Why 128GB?

At 80B, you could run it on two A6000s.

What device has 128GB?

the_pwner224 | 14 days ago

AMD Strix Halo / Ryzen AI Max+ (in the Asus Flow Z13 13 inch "gaming" tablet as well as the Framework Desktop) has 128 GB of shared APU memory.

tgtweak | 13 days ago

DGX Spark and any A10 devices, Strix Halo with the max memory config, several Mac mini/Mac Studio configs, the HP ZBook Ultra G1a, most servers.

If you're targeting end-user devices, then a more reasonable target is 20GB VRAM, since there are quite a lot of GPU/RAM/APU combinations in that range (orders of magnitude more than at 128GB).

lm28469 | 13 days ago

That's the maximum you can get for $3k-$4k with the Ryzen AI Max+ 395 and Apple's M-series Mac Studios. They're far cheaper than dedicated GPUs.

tarruda | 13 days ago

Mac Studios or Strix Halo. GPT-OSS 120B, Qwen3-Next, and Step 3.5-Flash all work great on an M1 Ultra.

sowbug | 13 days ago

All the GB10-based devices -- DGX Spark, Dell Pro Max, etc.

vladovskiy | 14 days ago

Guess it's the Mac M series.

bytesandbits | 12 days ago

Maybe a DeepSeek V4 distill. Give it a few days.