top | item 40790649

(no title)

dmbaggett | 1 year ago

For inference you could use a maxed-out Mac Ultra; the RAM is shared between the CPU and GPU.

discuss

order

alecco|1 year ago

For single user (batch_size = 1), sure. But that is quite expensive in $/tok.