top | item 39841646

ml_hardware | 1 year ago

Looks like someone has got DBRX running on an M2 Ultra already: https://x.com/awnihannun/status/1773024954667184196?s=20

resource_waste | 1 year ago

I find calling 500 tokens "running" a stretch.

Cool to play with for a few tests, but I can't imagine using it for anything.

irusensei | 1 year ago

I can run a certain 120B model on my M3 Max with 128 GB of memory. However, I found that while Q5 "fits", it was extremely slow. The story was different with Q4, though, which ran just fine at around ~3.5-4 t/s.

Now this model is ~134B, right? It could be bog slow, but on the other hand it's a MoE, so there's a chance it could give satisfactory results.

marci | 1 year ago

From the article, it should have the speed of a ~36B model.
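That "speed of a ~36B" point can be made concrete with a back-of-envelope bound: single-stream decoding is usually memory-bandwidth-bound, and a MoE only reads its active expert weights per token. The bandwidth and bits-per-weight figures below are illustrative assumptions, not numbers from the thread:

```python
# Rough upper bound on decode speed for a bandwidth-bound MoE:
# each generated token must read (at least) the active expert weights
# once, so tokens/s <= memory bandwidth / active-weight bytes.
BANDWIDTH_GBPS = 800    # assumed M2 Ultra unified-memory bandwidth
ACTIVE_PARAMS = 36e9    # DBRX activates ~36B parameters per token
BITS_PER_WEIGHT = 4.5   # rough average for a Q4-style quantisation

active_gb = ACTIVE_PARAMS * BITS_PER_WEIGHT / 8 / 1e9  # ~20 GB read per token
ceiling_tps = BANDWIDTH_GBPS / active_gb
print(f"theoretical ceiling: ~{ceiling_tps:.0f} tokens/s")
```

Real throughput lands well below this ceiling (KV cache reads, compute, and scheduling overhead all cost time), but it shows why activating 36B instead of all ~134B parameters matters so much.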

Mandelmus | 1 year ago

And it appears to fit in ~80 GB of RAM via quantisation.
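The ~80 GB figure is easy to sanity-check: weight memory is roughly parameter count times bits per weight. The bits-per-weight values below are rough averages for llama.cpp-style quant levels (some tensors are stored at higher precision), so treat the results as estimates rather than exact file sizes:

```python
# Approximate weight memory for a ~134B-parameter model at common
# quantisation levels. Figures are estimates, not measured file sizes.
PARAMS = 134e9  # total parameter count discussed in the thread

def footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate weight memory in decimal GB for a given quant level."""
    return params * bits_per_weight / 8 / 1e9

for name, bpw in [("Q8", 8.5), ("Q5", 5.5), ("Q4", 4.5)]:
    print(f"{name}: ~{footprint_gb(PARAMS, bpw):.0f} GB")
```

At ~4.5 bits/weight that's ~75 GB of weights alone, consistent with the ~80 GB claim once you add the KV cache and runtime overhead; a Q5-style quant (~92 GB) would already be tight on a 96 GB machine.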

smcleod | 1 year ago

So that would be runnable on an MBP with an M2 Max, but the context window must be quite small; I don't really find anything under about 4096 tokens that useful.

dheera | 1 year ago

That's a tricky number. Does it run on an 80 GB GPU? Does it auto-shave some parameters to fit in 79.99 GB, like any artificially "intelligent" piece of code would, or does it give up like an unintelligent piece of code?

madiator | 1 year ago

That's great, but it didn't really write the program the human asked for. :)

SparkyMcUnicorn | 1 year ago

That's because it's the base model, not the instruct tuned one.