a_conservative | 9 months ago

My M4 Max MacBook can run local inference on a medium-ish Gemini model (32B, IIRC). Power consumption spikes by about 120 watts over idle (with multiple Electron apps, Docker, etc. also running). It generates about 70 tokens/sec and usually responds within 10 to 20 seconds.

So, picking some numbers for the calculation: 4 answers per minute at 120 watts works out to about 0.5 watt-hours per answer. At that rate, roughly 200 responses would be enough to drain the (normally quite long-lasting) battery.
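
A quick sanity check of that arithmetic (a minimal sketch; the ~100 Wh battery capacity is my assumption, roughly what a 16-inch MacBook Pro ships with):

    # Numbers from the measurement above; battery_wh is an assumption
    # (a 16-inch MacBook Pro battery is roughly 100 Wh).
    extra_power_w = 120.0      # draw above idle while generating
    answers_per_minute = 4.0   # one answer every ~15 s
    wh_per_answer = extra_power_w / (answers_per_minute * 60)  # -> 0.5 Wh
    battery_wh = 100.0         # assumed usable battery capacity
    answers_per_charge = battery_wh / wh_per_answer            # -> 200
    print(f"{wh_per_answer:.2f} Wh/answer, ~{answers_per_charge:.0f} answers per charge")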

How does that compare to the more common NVIDIA GPUs? I don't know.
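
For anyone who wants to run the comparison, the same formula generalizes; the 350 W figure below is a placeholder, not a measurement of any particular card:

    def wh_per_answer(power_w: float, seconds_per_answer: float) -> float:
        """Energy per answer in watt-hours: power (W) x time (h)."""
        return power_w * seconds_per_answer / 3600

    # Hypothetical inputs -- substitute real measurements for a fair comparison.
    print(wh_per_answer(120, 15))  # the MacBook numbers above -> 0.5 Wh
    print(wh_per_answer(350, 15))  # placeholder discrete-GPU draw -> ~1.46 Wh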
