ricktdotorg | 1 year ago

Sounds (at least at a high level) similar to EXO[1].

[1] https://github.com/exo-explore/exo

morphle | 1 year ago

Here's a video of testing Exo running huge LLMs on a cluster of M4 Macs[1] more cheaply than on a cluster of NVIDIA RTX 4090s.

[1] https://www.youtube.com/watch?v=GBR6pHZ68Ho

menaerus | 1 year ago

They show a test run of a 1B llama-3.2 model. Doesn't that fit on a single Mac? Distributing the workload in that case must be slower than running it on a single machine.

That said, this is interesting, and I'm confused about why they aren't showcasing a test run of a larger model that actually necessitates distributing the workload across the cluster.
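
A rough sanity check of the memory math behind that objection, as a Python sketch. The fp16 weight precision and the 16 GB base M4 configuration are my assumptions for illustration, not figures from the video:

    # Back-of-envelope estimate: why a 1B-parameter model fits on one Mac.
    # Assumes fp16/bf16 weights (2 bytes per parameter) and ignores the
    # KV cache and activations, which are small for short prompts.

    params = 1e9           # llama-3.2 1B parameter count
    bytes_per_param = 2    # fp16/bf16 weights (assumed quantization)

    weights_gb = params * bytes_per_param / 1e9
    print(f"~{weights_gb:.0f} GB of weights")  # ~2 GB

    # Even a base 16 GB M4 Mac holds ~2 GB comfortably in unified memory,
    # so sharding a 1B model across machines relieves no memory pressure
    # and only adds network hops between layers.

By this estimate, distribution only starts paying off once the weights exceed a single machine's memory, e.g. a 70B model at fp16 (~140 GB), which no single consumer Mac or RTX 4090 can hold.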