djhworld | 5 months ago
Which got me thinking about how these frontier AI models work when you (as a user) run a query. Does your query just go to one big box with lots of GPUs attached and run in a similar way, just much faster? Do these AI companies write about how their infra works?
geerlingguy | 5 months ago
And yes, they basically have 1 Tbps+ interconnects and throw tens or hundreds of GPUs at queries. Nvidia was wise to invest so much in its networking side: with huge bandwidth between machines and shared memory, they can run massive models across tons of cards with minimal latency.
It's still not as good as tons of GPUs attached to tons of memory on _one_ machine, but it's far better than the 10, 25, or 40 Gbps networking that most small homelabs would run.
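To make the "why bandwidth matters" point concrete, here's a toy sketch (my own illustration, not how any particular provider does it) of tensor parallelism: one layer's weight matrix is sharded column-wise across N "devices", each computes a partial output, and the slices are gathered back together. In a real cluster that gather step crosses the interconnect on every layer of every token, which is why 1 Tbps+ links beat homelab networking so badly.

```python
import numpy as np

# Toy model of tensor parallelism on one machine, simulating N GPUs.
# Assumed/illustrative sizes: 1024-dim activations, 4096-dim layer output.
rng = np.random.default_rng(0)
n_devices = 4
x = rng.standard_normal((1, 1024))     # activations for one token
W = rng.standard_normal((1024, 4096))  # the full layer's weights

# Shard the weights: each "device" holds 4096 / 4 = 1024 output columns.
shards = np.split(W, n_devices, axis=1)

# Each device computes its slice of the output locally (no communication)...
partials = [x @ w for w in shards]

# ...then the slices are all-gathered; this is the step that crosses the
# network and makes interconnect bandwidth the bottleneck at scale.
y_parallel = np.concatenate(partials, axis=1)

# Sanity check: identical to computing the layer on a single device.
y_single = x @ W
print(np.allclose(y_parallel, y_single))  # True
```

The math is the same either way; the sharded version just trades one big matmul for smaller ones plus a communication step, which is a win only when the links are fast enough.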