item 31996529

GVM: A GPU Virtual Machine for IOMMU-Capable Computers

102 points | ArcVRArthur | 3 years ago | docs.linux-gvm.org

53 comments

[+] evol262|3 years ago|reply
This summary page is absolutely terrible.

It appears to be an open-source implementation of nvidia-cli for managing mdevs, which is admittedly nice, but it's not clear to end users what this means.

Pre-Ampere GPUs (Ampere+ is MIG) were able to use the mediated device subsystem to partition cards into time slices. It's similar to SR-IOV, except that you can specify the size of the partition with more granularity than "give me a new virtual device".
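For the curious, the mediated device subsystem is plain sysfs underneath: each parent device advertises its supported mdev types, and writing a UUID to a type's `create` node instantiates a slice. The device address and type name below are illustrative (a GVT-g iGPU example); what your card exposes depends on the GPU and driver:

```shell
# List the mdev types a parent device advertises (device address is an example)
ls /sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/
cat /sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/description

# Create a mediated device by writing a fresh UUID to the type's create node
UUID=$(uuidgen)
echo "$UUID" > /sys/class/mdev_bus/0000:00:02.0/mdev_supported_types/i915-GVTg_V5_4/create

# The new vGPU appears under /sys/bus/mdev/devices/ and can be handed to QEMU with:
#   -device vfio-pci,sysfsdev=/sys/bus/mdev/devices/$UUID

# Tear it down by writing 1 to its remove node
echo 1 > /sys/bus/mdev/devices/$UUID/remove
```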

Intel has GVT-g, which is reasonably widely supported on Gen12 and above (edit: too late/early -- Gen11 and earlier, not Gen12 and later -- thanks to my123 for the correction). nVidia had vGPU/mdev, and newer generations (Ampere and later) use MIG. It's unclear whether this supports MIG at all. AMD uses MxGPU, and they've never really cared about/pursued anything related to this, probably because their datacenter penetration is about 1%.

MxGPU is only supported on some FirePro cards. mdev was largely on GRID cards (mostly Tesla, some Quadros). MIG is on AXXX cards.

It's unclear why anyone should use this over mdevctl, which already supports GVT-g, and it's also unclear whether this is tied to the (very much "don't use in production") open source nvidia drivers.
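For comparison, the mdevctl workflow being duplicated here is already short. Again, the parent address and type name are GVT-g examples and will differ per card:

```shell
# Enumerate parent devices and the mdev types they support
mdevctl types

# Define a persistent vGPU on the iGPU, then start and list it
UUID=$(uuidgen)
mdevctl define --uuid "$UUID" --parent 0000:00:02.0 --type i915-GVTg_V5_4
mdevctl start --uuid "$UUID"
mdevctl list
```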

For end-users, GVT-g, getting a cheap older GRID card, or using Looking Glass for your GPU are all more reasonable options.

This effort is great, but the readme is appallingly short on information even for someone who knows the problem domain.

[+] my123|3 years ago|reply
> Pre-Ampere GPUs (Ampere+ is MIG)

More complex than that. MIG covers compute use cases only. For workloads where graphics are needed (or even the graphics APIs), you have to use preemptive scheduling, even on Ampere.
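As a concrete illustration of the compute-only split: MIG instances are carved out with nvidia-smi, and each instance surfaces as a separate compute device with no graphics engine. These commands assume an A100-class card and MIG-capable driver:

```shell
# Enable MIG mode on GPU 0 (takes effect after a GPU reset)
nvidia-smi -i 0 -mig 1

# List the GPU-instance profiles this card offers
nvidia-smi mig -lgip

# Create a GPU instance (profile ID is an example) plus its default compute instance
nvidia-smi mig -cgi 9 -C

# Show the resulting MIG devices
nvidia-smi -L
```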

> which is reasonably widely supported on Gen12 and above

No. It's gone on Gen11 and later. :/ And no replacement yet.

> MxGPU is only supported on some FirePro cards

Forget about AMD cards for GPU virtualisation. The modern GIM drivers aren't public, so you're stuck with very old, out-of-support GPUs.

> and it's also unclear whether this is tied to the (very much "don't use in production") open source nvidia drivers.

The OSS NV KM stack doesn't support GPU virt at all yet.

[+] stuaxo|3 years ago|reply
Thanks, I came here looking for an explanation. I had no idea what this was after reading the Readme, even though I'd already heard of some of the related tech.
[+] kaladin-jasnah|3 years ago|reply
Another option is getting any Kepler (GK107/GK104) series card (~$10) and running Xen and an unlocker.

GVT-g or a new GPU is probably a better solution, though. Are you sure GVT-g is Gen12 though? I always thought Intel switched to SRIOV by then.

[+] nraynaud|3 years ago|reply
ouch, I had understood it to be a software GPU.
[+] shmerl|3 years ago|reply
What limits it to Nvidia? And is it using SR-IOV?
[+] ArcVRArthur|3 years ago|reply
We do support other GPUs but currently that's through our LibVF.IO pathway (currently supports Nvidia, Intel, and some AMD GPUs):

https://arccompute.com/blog/libvfio-commodity-gpu-multiplexi...

A full list of supported devices is available here:

https://openmdev.io/index.php/GPU_Support

There are some limitations involved in LibVF.IO such as pre-defined mdev types. GVM is entirely free/libre open source software and it supports arbitrary mdev types. :)

We'll do our best to add more supported vendors to GVM/mdev-gpu in the near future.

[+] kaladin-jasnah|3 years ago|reply
NVIDIA's vGPU technology has been cracked to work on most Maxwell and later devices. AMD and Intel haven't been cracked or researched in the same way.
[+] rob_c|3 years ago|reply
no AMD, why only for Intel, and it doesn't cover 90% of the nvidia products most people will have (unless they own and run a datacenter)... which leads me to ask...

why even bother?

[+] ncmncm|3 years ago|reply
This seems to be at the wrong level of abstraction.

I want a virtual Vulkan, one per CPU VM. I think I read somebody was working on that, or something like it. That way it works on any GPU, not just NVidia or NVidia/Intel.

[+] evol262|3 years ago|reply
Looking Glass is kind of the closest you'll get for a trivial "I want shared GPU virtualization on my workstation", but GPU partitioning doesn't really work that way. Outside of the baseline support in the hardware itself, which is nowhere near generic enough for "works on any GPU" (it took supervisory frameworks to even get CUDA/OpenCL/etc to a point where you can stop worrying about writing transforms from scratch and just let PyTorch abstract it a little), this model of GPU partitioning doesn't perform well.

How do you allocate vGPU memory between a ML/AI VM, a VDI VM, and a gaming/CAD VM? All have dramatically different requirements. You also can't think of shader/GPU cores in any way similar to CPU cores. They're essentially just vector/linear algebra accelerators with little to no branch prediction, speculative execution, or anything else you'd expect.

Otherwise, you can sort of follow along here: https://openmdev.io/index.php/GPU_Support

There's an effort, but it's far from where you want, and there's no indication it will get there unless you can get all the vendors to agree on a standard at some point in the future.