fc417fc802 | 23 hours ago
If I'm right about that then if you're willing to go in for somewhere in the vicinity of $30k (24x the Max 385 model) you should be able to achieve ChatGPT performance.
ibeckermayer|11 hours ago
fc417fc802|10 hours ago
Is it? Why do you say that? I understand inference to be almost entirely bottlenecked on memory bandwidth.
A layer has on the order of n^2 weights, but the activation vector passed between layers holds only n values. Transmitting a few thousand (or even tens of thousands of) floating-point values does not require a notable amount of bandwidth by modern standards.
Training is an entirely different beast, of course, and depending on the workload, latency can also impact performance. But for running inference on a single query from a single user, I don't see how inter-node bandwidth is going to matter.
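The n^2-vs-n argument above can be sketched with back-of-envelope arithmetic. This is a minimal illustration, not a model of any specific LLM: the hidden dimension, fp16 precision, and the common ~12*d^2 approximation for a standard decoder block's weight count are all assumptions chosen for the example.

```python
# Compare per-layer weight reads (local memory bandwidth) against the
# activation vector that crosses a node boundary between layers
# (inter-node bandwidth).

d = 8192                    # hidden dimension (assumed for illustration)
bytes_per_value = 2         # fp16

# Rough transformer decoder block: ~12 * d^2 weights
# (attention projections plus a 4x-wide MLP) -- a common approximation.
weights_per_layer = 12 * d * d
weight_bytes = weights_per_layer * bytes_per_value  # read once per token

# The only thing that needs to travel between nodes per layer is the
# hidden state: d values per token.
activation_bytes = d * bytes_per_value

print(f"weight reads per layer:     {weight_bytes / 1e6:.1f} MB")
print(f"activations between layers: {activation_bytes / 1e3:.1f} KB")
print(f"ratio: {weight_bytes // activation_bytes}x")
```

With these numbers the weight reads outweigh the inter-layer traffic by a factor of 12*d (~98,000x), which is why single-user, single-query decoding is dominated by local memory bandwidth rather than the interconnect.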