GVM: A GPU Virtual Machine for IOMMU-Capable Computers

[+] evol262|3 years ago|reply

This summary page is absolutely terrible.

It appears to be an open-source implementation of nvidia-cli for managing mdevs, which is arbitrarily nice, but it's not clear to end-users what this means.

Pre-Ampere GPUs (Ampere+ is MIG) were able to use the mediated device subsystem to partition cards into time slices. It's similar to SR-IOV, except that you can specify the size of the partition with more granularity than "give me a new virtual device".

Intel has GVT-g, which is reasonably widely supported on Gen12 and above (edit: too late/early -- Gen11 and earlier, not Gen12 and later -- thanks to my123 for the correction). nVidia had vGPU/mdev, and newer generations (Ampere and later) use MIG. It's unclear whether this supports MIG at all. AMD uses MxGPU, and they've never really cared about/pursued anything related to this, probably because their datacenter penetration is about 1%.

MxGPU is only supported on some FirePro cards. mdev was largely on GRID cards (mostly Tesla, some Quadros). MIG is on AXXX cards.

It's unclear why anyone should use this over mdevctl, which already supports GVT-g, and it's also unclear whether this is tied to the (very much "don't use in production") open source nvidia drivers.

For end-users, GVT-g, getting a cheap older GRID card, or using Looking Glass for your GPU are all more reasonable options.

This effort is great, but the readme is appallingly short on information even for someone who knows the problem domain.

[+] my123|3 years ago|reply

> Pre-Ampere GPUs (Ampere+ is MIG)

More complex than that. MIG covers compute use cases only. For workloads where graphics are needed (or even the graphics APIs), you have to use preemptive scheduling, even on Ampere.

> which is reasonably widely supported on Gen12 and above

No. It's gone on Gen11 and later. :/ And no replacement yet.

> MxGPU is only supported on some FirePro cards

Forget about AMD cards for GPU virtualisation. The modern GIM drivers aren't public. So you're stuck with very old out of support GPUs.

> and it's also unclear whether this is tied to the (very much "don't use in production") open source nvidia drivers.

The OSS NV KM stack doesn't support GPU virt at all yet.

[+] unknown|3 years ago|reply

[deleted]

[+] stuaxo|3 years ago|reply

Thanks, I came here looking for an explanation, I had no idea what it was after reading the Readme, even though I'd already heard of some of the related tech.

[+] broknbottle|3 years ago|reply

Intel has already confirmed that GVT-g is essentially dead and not supported on their Iris/Xe or anything newer graphics.. We can also confirm this via their own drivers source..

https://github.com/intel/gvt-linux/blob/gvt-staging/drivers/...

[+] kaladin-jasnah|3 years ago|reply

Another option is getting any Kepler (GK107/GK104) series card (~$10) and running Xen and an unlocker.

GVT-g or a new GPU is probably a better solution, though. Are you sure GVT-g is Gen12 though? I always thought Intel switched to SRIOV by then.

[+] nraynaud|3 years ago|reply

ouch, I understood that it was a soft GPU.

[+] gigatexal|3 years ago|reply

Can I use this to create a kvm vm with part of my GPU to run 3D accelerated things?

[+] ArcVRArthur|3 years ago|reply

Ya, you can definitely do that.

Here's our current install documentation:

https://arccompute.com/blog/libvfio-commodity-gpu-multiplexi...

This will be updated against our GVM/mdev-gpu sources shortly but our current documentation still will get you to what you're asking about. :)

[+] ConstantVigil|3 years ago|reply

Might I kindly ask where the plan for AMD support comes in, if at all?

[+] ArcVRArthur|3 years ago|reply

Sadly AMD GPUs currently have some issues. This post has some details about that:

https://news.ycombinator.com/item?id=28944426

[+] metadat|3 years ago|reply

Can this run a qemu Windows x86 VM?

[+] ArcVRArthur|3 years ago|reply

Yep!

https://arccompute.com/blog/libvfio-commodity-gpu-multiplexi...

[+] shmerl|3 years ago|reply

What limits it to Nvidia? And is it using SR-IOV?

[+] ArcVRArthur|3 years ago|reply

We do support other GPUs but currently that's through our LibVF.IO pathway (currently supports Nvidia, Intel, and some AMD GPUs):

https://arccompute.com/blog/libvfio-commodity-gpu-multiplexi...

A full list of supported devices is available here:

https://openmdev.io/index.php/GPU_Support

There are some limitations involved in LibVF.IO such as pre-defined mdev types. GVM is entirely free/libre open source software and it supports arbitrary mdev types. :)

We'll do our best to add more supported vendors to GVM/mdev-gpu in the near future.

[+] kaladin-jasnah|3 years ago|reply

NVIDIA's vGPU technology has been cracked to work on most Maxwell and later devices. AMD and Intel have not been cracked and researched in the same ways.

[+] rob_c|3 years ago|reply

no AMD, why for Intel and doesn't cover 90% of nvidia products most people will have (unles they own and run a datacenter)... which leads me to ask...

why even bother?

[+] ArcVRArthur|3 years ago|reply

This works on most Nvidia consumer cards (not just datacenter cards): https://openmdev.io/index.php/GPU_Support

I think we have decent coverage of device support: https://store.steampowered.com/hwsurvey/Steam-Hardware-Softw...

We'd like to improve support for AMD devices but there are some issues there that have yet to be resolved.

[+] ncmncm|3 years ago|reply

This seems to be at the wrong level of abstraction.

I want a virtual Vulkan, one per CPU VM. I think I read somebody was working on that, or something like. That way it works on any GPU, not just NVidia or NVidia/Intel.

[+] evol262|3 years ago|reply

Looking Glass is kind of the closest you'll get for a trivial "I want shared GPU virtualization on my workstation", but GPU partitioning doesn't really work that way. Outside of the baseline support in the hardware itself, which is nowhere near generic enough for "works on any GPU" (it took supervisory frameworks to even get CUDA/OpenCL/etc to a point where you can stop worrying about writing transforms from scratch and just let PyTorch abstract it a little), this model of GPU partitioning doesn't perform well.

How do you allocate vGPU memory between a ML/AI VM, a VDI VM, and a gaming/CAD VM? All have dramatically different requirements. You also can't think of shader/GPU cores in any way similar to CPU cores. They're essentially just vector/linear algebra accelerators with little to no branch prediction, speculative execution, or anything else you'd expect.

Otherwise, you can sort of follow along here: https://openmdev.io/index.php/GPU_Support

There's an effort, but it's far from where you want, and there's no indication it will get there unless you can get all the vendors to agree on a standard at some point in the future.

53 comments