
abra0 | 2 years ago

Well, if you're not using a rented machine for a period of time, you should release it.

Agreed on reliability and data transfer, that's a good point.

Out of curiosity, what do you use a 2x3090 rig for? Bulk, non-time-sensitive inference on down-quantized models?


imiric | 2 years ago

> Well if you are not using a rented machine during a period of time, you should release it.

If you're using them for inference, your usage pattern is unpredictable. I could spend hours between having to use it, or minutes. If you shut it down and release it, the host might be gone the next time you want to use it.

> what do you use a 2x3090 rig for? Bulk, non-time-sensitive inference on down-quantized models?

Yeah. I can run 7B models unquantized, ~13-33B at q8, and ~70B at q4, at fairly acceptable speeds (>10 tok/s).
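Those size/quantization combinations line up with the ~48 GB of VRAM a 2x3090 rig offers. A back-of-the-envelope sketch of the arithmetic, where the bytes-per-parameter figures and the 20% overhead factor are rough assumptions (not measurements from the commenter's setup):

```python
# Rough VRAM estimate for an N-billion-parameter model at a given
# quantization, to sanity-check what fits in 2x3090 (~48 GB total).

BYTES_PER_PARAM = {
    "fp16": 2.0,   # unquantized half precision
    "q8": 1.0,     # ~8-bit quantization
    "q4": 0.5,     # ~4-bit quantization
}

def est_vram_gb(params_b: float, quant: str, overhead: float = 1.2) -> float:
    """Weights footprint plus ~20% headroom for KV cache and runtime overhead."""
    return params_b * BYTES_PER_PARAM[quant] * overhead

for params_b, quant in [(7, "fp16"), (33, "q8"), (70, "q4")]:
    print(f"{params_b}B @ {quant}: ~{est_vram_gb(params_b, quant):.0f} GB")
```

Each case comes out under 48 GB (roughly 17, 40, and 42 GB respectively), which is why those particular size/quant pairings are the practical ceiling for this rig.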

whimsicalism | 2 years ago

if you are just using it for inference, i think an appropriate comparison would be something like a together.ai endpoint - which lets you scale up pretty much immediately and is likely more economical as well.