We built installable software for Windows & Linux that makes any remote Nvidia GPU accessible to, and shareable across, any number of remote clients running local applications, all over standard networking.
> Basically, we aren't targeting support for graphical applications running on Linux because there is very little demand for this - but we cover everything else. You CAN run graphical applications from a Windows client against a Linux server.
Ah, disappointing; I was hoping to try this with a Steam Deck as an alternative to using Moonlight and streaming the entire game from the Windows machine.
I'm confused, where is the actual source code? This repo only contains some Dockerfiles that, as far as I can tell, pull precompiled opaque binaries, plus some convenience scripts to set up the required runtime environment.
I assume it cost quite some $$$ to produce this, because you kinda have to cut Nvidia's binary drivers in half, which is going to require quite a lot of reverse engineering.
Serverless GPU has been all the rage this past month - I'd love to see a review of this from someone who knows how to benchmark a GPU workload.
In particular:
- Autoscaling Stable Diffusion Inference
- Traditional creative workflows (real-time GPU viewport in Octane, for example)
- Gaming from one GPU in your house to everywhere else
I get the training example for small models, but I can't imagine it scales that well with model size.
The big value seems to be... share 1 GPU to many computers, so spend less on a cluster? Capacity fungibility is real value but hard to measure!
In any case, stuff like this is a good bet. GPU software will continue to increase in prevalence, and utilization will remain low. Solving for the compute market liquidity is important despite NVIDIA's best efforts.
We have all these running fantastically, please check out our Discord where we have clips and demonstrations of these sorts of workloads. https://discord.gg/2SWbpXx9
For anything involving inference you're much better off with one of the many inference model servers, such as TensorFlow Serving, Triton Inference Server, etc.
It surprises me that this works well enough to be useful. I would have thought that network latency, being orders of magnitude higher than memory latency, would be a huge problem. Latency Numbers Everyone Should Know: https://static.googleusercontent.com/media/sre.google/en//st...
I'd be surprised if this works for anything latency sensitive over anything more than a LAN.
Even just the speed-of-light transit time between NY and LA (4×10^6 m ÷ 3×10^8 m/s ≈ 1/75 s) is roughly as long as one 60 fps frame (1/60 s). Add the OS serializing the frame from the GPU onto the network card, plus the network switching of those packets, and you're starting to really feel that latency.
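The back-of-envelope arithmetic above can be checked in a few lines. The numbers are the commenter's rough assumptions (4,000 km NY to LA, speed of light in vacuum; real fiber is roughly a third slower, which only makes things worse):

```python
# Rough latency budget for rendering frames on a remote GPU (illustrative).
distance_m = 4e6    # NY to LA, ~4,000 km
c = 3e8             # speed of light in vacuum, m/s (fiber is ~2/3 of this)

one_way_delay = distance_m / c   # ~13.3 ms, one direction only
frame_budget = 1 / 60            # ~16.7 ms per frame at 60 fps

print(f"one-way light delay: {one_way_delay * 1000:.1f} ms")
print(f"60 fps frame budget: {frame_budget * 1000:.1f} ms")
# Propagation alone eats ~80% of the frame budget, before any
# serialization, switching, or encode/decode overhead is counted.
```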
About 10 years ago I found that set operations in Ruby were slower than set operations in Redis. So I shipped all my data over the network, let Redis sort it into a sorted set, crunched my data in Redis, and then retrieved it again over the network in its reduced form… I think it makes sense that for vector operations a remote GPU could be pretty cool. Now if we can get this working from MacBooks to Linux GPUs, I'd be pretty stoked.
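The tradeoff in the Redis anecdote above (shipping data over the wire pays off when the remote engine is enough faster per item) can be modeled crudely. All numbers here are illustrative assumptions, not benchmarks:

```python
def remote_wins(n_items, item_bytes, bandwidth_bps, rtt_s,
                local_per_item_s, remote_per_item_s):
    """Crude model: is remote compute plus transfer faster than local compute?"""
    # Round trip to set up plus time to push the data across the link.
    transfer = 2 * rtt_s + (n_items * item_bytes * 8) / bandwidth_bps
    remote_total = transfer + n_items * remote_per_item_s
    local_total = n_items * local_per_item_s
    return remote_total < local_total

# 1M small items over gigabit LAN, 1 ms RTT, remote engine 10x faster per item:
# transfer ~0.13 s + remote compute ~0.2 s beats ~2 s locally.
print(remote_wins(1_000_000, 16, 1e9, 0.001, 2e-6, 2e-7))  # → True

# For tiny batches the fixed round trip dominates and local wins.
print(remote_wins(100, 16, 1e9, 0.001, 2e-6, 2e-7))  # → False
```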
Didn't we have those things already? VirtualGL and co. say hi.
Also, for most real GPU applications you need to get the data in and out. I don't think splitting compute across (insert any non-InfiniBand link here) solves this.
I see lots of comments in various ML repositories about trouble running on multiple GPUs. This seems like a great way to run across multiple low-VRAM GPUs instead of buying a huge, expensive single card. It feels reminiscent of how Google built their clusters on commodity hardware, where they would just throw away a failed device rather than trying to fix it. This is really cool.
That's really awesome. I'm not sure what I'd use it for but just being able to makes me want to find an excuse! What's impressive is this seems to have more capabilities than most "local" software vGPU solutions for e.g. VMs.
Do you have any numbers on the viability of using this for ML/AI workloads? It seems like once a model is loaded into GPU VRAM, theoretically the transactional new inputs/outputs would be trivial.
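The intuition in the question above (weights stay resident in VRAM; only inputs and outputs cross the network per request) can be sanity-checked with rough sizes. All numbers are illustrative assumptions for a Stable-Diffusion-scale image model, not measurements of this product:

```python
GB = 1024 ** 3
MB = 1024 ** 2

weights_bytes = 4 * GB       # fp16 model, uploaded to GPU VRAM once
request_in = 2 * 1024        # a text prompt / latent seed, ~2 KB
request_out = 3 * 512 * 512  # one 512x512 RGB image, ~0.75 MB

per_request = request_in + request_out
print(f"one-time weight upload: {weights_bytes / GB:.1f} GiB")
print(f"per-request traffic:    {per_request / MB:.2f} MiB")
# Steady-state per-request I/O is thousands of times smaller than the
# one-time weight upload, so network bandwidth is mostly a startup cost.
```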
Can this be used to accelerate video decode in a Linux machine/virtual machine? It sounds like it is not for graphics on Linux, but it's unclear to me where decode falls.
Definitely of interest to us, even with the latency limits, both for AI dev and investigations and for occasional full runs.
I do have to wonder about the non-OSS licensing, as that's part of why we didn't spend much time on Bitfusion...