
StevenNunez | 24 days ago

I do! I have an M3 Ultra with 512GB. A couple of opencode sessions running work well. Currently running GML 4.7 but was on Kimi K2.5. Both great. Excited for more efficiencies to make their way to LLMs in general.

circularfoyers | 24 days ago

The prompt processing times I've heard about have put me off going that high with memory on the M series (hoping that changes with the M5 series, though). What are the average and longest times you've had to wait when using opencode? Have any improvements to mlx helped in that regard?

jtbaker | 23 days ago

The M5 Ultra series is supposed to have some big gains in prompt processing - something like 3-4x from what I've read. I'm tempted to swap out the M4 mini that I'm using for this kind of stuff right now!

pcf | 23 days ago

Wow, Kimi K2.5 runs on a single M3 Ultra with 512 GB RAM?

Can you share more info about quants or whatever is relevant? That's super interesting, since it's such a capable model.

satvikpendem | 24 days ago

How's the inference speed? What was the price? I'm guessing you can fit the entire model without quantization?
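
A rough sketch of why quantization is likely the deciding factor here. The parameter count below is an assumption for illustration (Kimi K2's public spec lists roughly 1T total parameters in an MoE layout; check the model card for K2.5), and the estimate covers weights only:

```python
# Back-of-envelope memory estimate for fitting a large model in unified memory.
# Ignores KV cache, activations, and runtime overhead, so treat results as lower bounds.

def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB for a given quantization level."""
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

params_b = 1000  # ~1T total parameters (assumed for illustration)
for bits in (16, 8, 4):
    gb = weight_memory_gb(params_b, bits)
    verdict = "fits" if gb < 512 else "does not fit"
    print(f"{bits}-bit: ~{gb:.0f} GB -> {verdict} in 512 GB")
```

By this arithmetic, a ~1T-parameter model only squeezes into 512 GB at around 4-bit quantization (and even then with little headroom for the KV cache), so an unquantized fit seems unlikely for a model of that size.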

UmYeahNo | 24 days ago

Excellent. Thanks for the info!