It is shown running on 2 or 4 Raspberry Pis; the point is that you can add more (ordinary, non-GPU) hardware for faster inference. It's a distributed system; the sky is the limit.
It doesn't even scale linearly to 4 nodes, and it's slower than a five-year-old gaming computer. There is definitely a hard limit on the performance to be had from this approach.
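One rough way to see why scaling flattens: each node does a shrinking share of per-token work, but a fixed fraction of time goes to synchronization regardless of node count. A minimal Amdahl-style sketch (the overhead fraction is an illustrative assumption, not a number measured from Distributed Llama):

```python
# Amdahl-style estimate: per-token compute splits across n nodes,
# but a fixed synchronization cost is paid on every step.
def speedup(n_nodes, comm_fraction=0.3):
    # comm_fraction: share of per-token time spent syncing over the
    # network (an illustrative assumption, not a measured value)
    return 1.0 / (comm_fraction + (1.0 - comm_fraction) / n_nodes)

for n in (1, 2, 4, 8):
    print(n, round(speedup(n), 2))
```

With a 30% sync share, 4 nodes give only about a 2.1x speedup, and adding nodes asymptotically approaches 1/comm_fraction, i.e. a hard ceiling.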
Ah, thanks! But what can a distributed system like this do? Is this a fun, for-the-sake-of-doing-it project, or does it have practical applications? Just curious about applicability, that's all.
In a distributed system, the overall performance and scalability are often constrained by the slowest component.
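That bottleneck effect is easy to model: if every node must finish its share before the next token can advance, throughput is capped by the slowest node. A toy sketch with hypothetical per-node rates:

```python
# Tokens/sec each node could sustain on its own (hypothetical numbers).
node_rates = [12.0, 11.5, 12.2, 4.0]  # one node is much slower

# In lock-step execution, every token waits for the slowest node,
# so the cluster runs at the minimum rate, not the average.
effective_rate = min(node_rates)
print(effective_rate)  # -> 4.0
```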
This Distributed Llama runs over Ethernet.
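A back-of-the-envelope for why the link matters: moving even a few MiB of activations per token over gigabit Ethernet costs tens of milliseconds before any compute happens. The transfer size below is an illustrative assumption, not Distributed Llama's actual traffic:

```python
# Estimate per-token network time (all numbers are illustrative assumptions).
bytes_per_token_sync = 4 * 1024 * 1024   # assume 4 MiB exchanged per token
link_bps = 1_000_000_000                 # gigabit Ethernet, ideal throughput

seconds = bytes_per_token_sync * 8 / link_bps
print(f"{seconds * 1000:.1f} ms per token just moving data")  # ~33.6 ms
```

At ~33 ms of transfer per token, the network alone caps generation around 30 tokens/sec, before compute or protocol overhead, which is why faster NICs (or less chatty partitioning) matter more than extra nodes.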
The Raspberry Pis aren't really the point. Since Raspberry Pi OS is basically Debian, you could do the same thing on four much more powerful but still very cheap ($250-300 apiece) x86-64 systems running Debian, with 32, 64, or 128 GB RAM each if you needed. That also opens up the possibility of relatively cheap PCI Express 3.0 based 10 Gbps NICs and a switch between them, which isn't possible with a Raspberry Pi.
cwoolfe|1 year ago
semi-extrinsic|1 year ago
8thcross|1 year ago
gatienboquet|1 year ago
walrus01|1 year ago