I can run a certain 120B on my M3 Max with 128GB of memory. However, I found that while it "fits", Q5 was extremely slow. The story was different with Q4, though, which ran just fine at around 3.5-4 t/s.
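The Q4-vs-Q5 difference above comes down to simple memory math. A rough sketch, assuming llama.cpp-style average bits-per-weight (roughly 4.8 bpw for Q4_K_M and 5.7 bpw for Q5_K_M — real files add metadata and KV-cache overhead on top):

```python
def model_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate in-memory size of the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Back-of-envelope for a ~120B-parameter model:
for name, bpw in [("Q4_K_M", 4.8), ("Q5_K_M", 5.7)]:
    print(f"{name}: ~{model_size_gib(120, bpw):.0f} GiB")
```

On a 128GB Mac the Q5 weights alone land near 80 GiB, leaving much less headroom under macOS's default GPU wired-memory limit than Q4 does, which is one plausible explanation for the slowdown even though both nominally "fit".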
So that would be runnable on an MBP with an M2 Max, but the context window must be quite small; I don't really find anything under about 4096 tokens that useful.
That's a tricky number. Does it run on an 80GB GPU? Does it auto-shave some parameters to fit in 79.99GB, like any artificially "intelligent" piece of code would do, or does it give up like an unintelligent piece of code?
resource_waste|1 year ago
Cool to play with for a few tests, but I can't imagine using it for anything.
irusensei|1 year ago
Now this model is ~134B, right? It could be bog-slow, but on the other hand it's an MoE, so there's a chance it could have satisfactory results.