item 43464207


wgd | 11 months ago

You can run a 4-bit quantized version at a small (though nonzero) cost in output quality, so you would only need about 16GB for that.
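The arithmetic behind the 16GB figure is just parameters times bits per weight. A minimal sketch, assuming a ~32B-parameter model (the model size is my assumption, not stated in the comment) and counting weights only, ignoring KV cache and activation overhead:

```python
def weight_memory_gb(n_params_billion: float, bits_per_weight: int) -> float:
    # bytes = params * bits / 8; "GB" here means 10^9 bytes
    return n_params_billion * 1e9 * bits_per_weight / 8 / 1e9

# fp16 weights: 2 bytes per parameter -> 64 GB for a 32B model
fp16_gb = weight_memory_gb(32, 16)  # 64.0

# 4-bit quantized: half a byte per parameter -> 16 GB
q4_gb = weight_memory_gb(32, 4)     # 16.0
```

Real quantized files run slightly larger than this, since some tensors (embeddings, scales) are usually kept at higher precision.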

Also, it's entirely possible to run a model that doesn't fit in available GPU memory; it will just be slower, since the layers that don't fit run from system RAM.
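With llama.cpp, for example, this split is controlled by the `-ngl` / `--n-gpu-layers` flag; a sketch (the model path and layer count are illustrative, not from the comment):

```shell
# Offload only 20 layers to the GPU; the remaining layers run on the CPU
# from system RAM. Fewer GPU layers = less VRAM needed, but slower tokens/s.
./llama-cli -m ./model-q4_k_m.gguf -ngl 20 -p "Hello"
```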
