anemll | 10 months ago

What hardware are you on? Most models are memory-bandwidth limited. The ANE was limited to 64 GB/s prior to the M3 Max or M4 Pro. If you are on an M1, the GPU will be significantly faster for 3-8B models because of its memory bandwidth rather than any difference in ANE compute capability.

SparkyMcUnicorn|10 months ago

M4 Max with 128GB of memory.

anemll|9 months ago

M4 Max should reach about 120 GB/s on the ANE and 500+ GB/s on the GPU, so the GPU will be 3-4 times faster at token generation for anything over 1-3B. The ANE is likely just as fast for prefill, since prefill is compute-bound and the ANE has higher FLOPs.
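
A rough back-of-the-envelope sketch of why bandwidth dominates decode speed. It assumes every weight is read once per generated token, that weights are 4-bit quantized (0.5 bytes per parameter), and that KV-cache traffic and compute are ignored. None of that is measured, and `decode_tokens_per_sec` is just a hypothetical helper name, so treat the result as an upper bound, not a benchmark.

```python
def decode_tokens_per_sec(bandwidth_gb_s, params_billion, bytes_per_param=0.5):
    """Upper bound on decode speed if each weight is read once per token.

    bytes_per_param=0.5 is an assumption (4-bit weights), not a figure
    from the thread.
    """
    model_gb = params_billion * bytes_per_param  # weight bytes streamed per token
    return bandwidth_gb_s / model_gb


# Bandwidth figures (GB/s) as quoted in the comments above.
paths = [
    ("ANE, pre-M3 Max / M4 Pro", 64),
    ("ANE, M4 Max", 120),
    ("GPU, M4 Max", 500),
]

for name, bandwidth in paths:
    for size_b in (3, 8):
        rate = decode_tokens_per_sec(bandwidth, size_b)
        print(f"{name:26s} {size_b}B model: ~{rate:.0f} tok/s upper bound")
```

Under this model the GPU/ANE ratio on M4 Max is just 500 / 120, roughly 4x, regardless of model size. Real numbers will be lower on both paths, and very small models can end up limited by something other than bandwidth.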