top | item 40790649 (no title) dmbaggett | 1 year ago For inference you could use a maxed-out Mac Ultra; the RAM is shared between the CPU and GPU. discuss order hn newest alecco|1 year ago For single user (batch_size = 1), sure. But that is quite expensive in $/tok.
alecco|1 year ago