VMScape and why Xen dodged it

123 points | plam503711 | 5 months ago | virtualize.sh

38 comments

transpute|5 months ago

On HP business PCs, Xen's microkernel architecture was extended for copy-on-write nested virtualization microVMs (VM per browser tab or HTTP connection) and UEFI-in-VM, https://www.platformsecuritysummit.com/2018/speaker/pratt/ | https://news.ycombinator.com/item?id=42282053#42286147

Imminent unification of Android and ChromeOS will likely use a similar h/w nested-virt architecture based on L0 pKVM + L1 KVM hypervisors on Arm devices.

Honda is using Xen, "How to accelerate Software Defined Vehicle" (2025), https://static.sched.com/hosted_files/xensummit2025/93/HowTo...

eigenform|5 months ago

Since everyone is upset about the lack of technical details in the article, I'll try:

The takeaway from that paper (imo, afaict) is that guest userspace can influence indirect predictor entries in KVM host userspace. I don't really know anything about Xen, but presumably it is unaffected because there is no Xen host userspace, just a tiny hypervisor running privileged code in the host context. With KVM, Linux userspace is still functional in the host context.

Presumably, the analogy to host kernel/userspace in KVM is dom0, but in Xen this is a guest VM. If cross-guest cases are mitigated in Xen (like in the case of KVM, see Table 2 in the paper), you'd expect that this attack just doesn't apply to Xen. Apart from there being no interesting host userspace, IBPB/STIBP might be enough to insulate other guests from influencing dom0. If you're already taking the hit of resetting the predictors when entering dom0, presumably you are not worried about this particular bug.

edit: Additional reading, see https://github.com/xen-project/xen/blob/master/xen/arch/x86/...

bayesnet|5 months ago

While it’s interesting that Dom0 avoids Spectre-style branch prediction attacks, it’s not clear from TFA exactly why that is so. How does the architecture of the hypervisor avoid an attack that seems to be at the hardware level? From my limited understanding of Spectre and Meltdown, swapping from a monolithic kernel to a microkernel wouldn’t mitigate such an attack. The mitigations discussed in the VMScape paper [0] are hardware mitigations in my reading. And I don’t see Xen mentioned anywhere in the paper, for that matter.

I guess it’s sort of off topic, but I was enjoying reading this until I got to the “That’s not just elegant — it’s a big deal for security” line that smelled like LLM-generated content.

Maybe that reaction is hypocritical. I like LLMs; I use them every day for coding and writing. I just can’t shake the feeling that I’ve somehow been swindled if the author didn’t care enough to edit out the “obvious” LLM tells.

[0]: https://comsec-files.ethz.ch/papers/vmscape_sp26.pdf

csmantle|5 months ago

I think the author actually meant "yes, VMScape can leak information on Xen, but only from a miniature Dom0 process." They seemed to consider a leak from such a small pool not to be a security issue.

Agreed on the point about hw-level mitigation. The leakage still exists. Containing it in a watertight box is quick and effective, and it does avoid extra overhead. But it doesn't patch the hole.

mikewarot|5 months ago

I think it might be translation from French instead of LLM usage.

While microkernels are great for overall security, it's also not obvious to me how that helped in this case.

jcjgraf|5 months ago

Please see my other comment where I share more details about VMScape and why Xen is not affected. In short, it is because branch predictor state is flushed when transitioning to Dom0. Indeed, it has nothing to do with the type of kernel... And yes, LLMs were at work. The "quote" in the article is not an actual quote...

somat|5 months ago

Maybe this is the problem with LLMs: using them feels great, but having them used on you is highly unpleasant.

remix2000|5 months ago

It's not necessarily a sign of AI slop — could be just proper typography! :3

snvzz|5 months ago

The Xen "microkernel" is unfortunately bloated. seL4 is much smaller and runs its VMM as an isolated unprivileged task.

VM exceptions are all handled by the VMM. A VM escape would still be confined to the VMM, which has no higher capabilities than the VM itself. Capabilities are enforced by the formally verified seL4 kernel.

BobbyTables2|5 months ago

I don’t quite see what they’re getting at.

Is it just because it’s another VM switch to get to dom0? Seems a bit unlikely…

Xen has a hypervisor for dealing with the low level details of virtualization and uses dom0 for management and some HW emulation.

QEMU/KVM uses the host kernel for the low level details of virtualization and the QEMU userspace portion to do the actual HW emulation.

They’re actually remarkably similar aside from the detail that the Xen hypervisor only juggles VMs but the KVM design involves it juggling other normal processes…

The people praising Firecracker are just turning a blind eye to the 10000+ lines of (really hairy) C code in the kernel doing x86 instruction emulation and the actual hypervisor part.

jcjgraf|5 months ago

Yes, Xen is indeed protected because the counterpart of Linux's userspace hypervisor (QEMU, Firecracker, etc.) runs in Dom0, and transitions to Dom0 trigger a branch predictor flush. See my other comment for more information. As you say, Firecracker is just as affected by VMScape as QEMU is...

AtlasBarfed|5 months ago

So this requires the two VMs to be sharing execution on a core? Or perhaps a shared cache? Or would it work across VMs "pinned" to different CPUs?

It's weird to me that cloud hosts aren't absolutely swimming in cores now, but with Intel struggling and AMD somewhat resting on its laurels, which it stupidly did in the Hector Ruiz days, nothing is pushing the envelope. In 2010, fifteen years ago, we had 12 core CPUs.

In 2010 we had a billion or so transistors. In 2020, we had 50 billion. In 2010 we were at 28nm, now we're at 3nm.

We should have 100x the CPUs on a die now, or more: a thousand x86 cores, god knows how many Arm cores, and god knows how much you could do with high/low core mixes.

Anyway, what I'm getting at is that all of these vulnerabilities across process or VM execution could be moot if each process were isolated to a core or set of cores, and each VM got its own dedicated branch predictors on its own cores. Then go ahead and do whatever speculation tricks you want. Obviously you don't want hyper-threading.
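A minimal sketch of that per-core isolation idea (Python, Linux-only; pin_to_core is my own illustrative helper, and pinning alone is not an actual VMScape mitigation, since a guest and its userspace hypervisor share one process on one core anyway):

```python
import os

def pin_to_core(cpu: int) -> set:
    """Pin the current process to a single core so it never shares
    per-core branch predictor state with processes scheduled elsewhere.
    SMT siblings still share predictors, so hyper-threading must be off.
    """
    if hasattr(os, "sched_setaffinity"):      # Linux only
        os.sched_setaffinity(0, {cpu})        # restrict to this CPU
        return os.sched_getaffinity(0)        # confirm the new mask
    return set()                              # unsupported platform: no-op

print(pin_to_core(0))
```

On non-Linux platforms the helper does nothing; and per the thread above, this buys you predictor isolation only between entities on different physical cores.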

jcjgraf|5 months ago

Indeed, the victim (e.g. a userspace hypervisor like QEMU or Firecracker) and the attacker (e.g. a malicious guest) need to run on the same core. But with VMScape this is always a given, because a guest runs in the same process as its userspace hypervisor. Before VMScape, developers only isolated different VMs, different processes, and supervisor domains from malicious users. VMScape exploits a novel threat model.

pjmlp|5 months ago

Modern Windows is already using two VMs as well, or even more if WSL is being used.

When enabled, Hyper-V is a type 1 hypervisor, which is required for many security measures in modern Windows; the first Windows instance is a privileged guest, just like with Xen.

Additionally, anyone using WSL 2 is running another set of VMs alongside Windows, depending on how many flavours of Linux and containers are configured.

indigodaddy|5 months ago

If anyone is looking, there are still some Xen VPS providers around, one of the oldest being Tornado VPS (formerly prgmr.com).

https://tornadovps.com/about

The founders literally wrote the book on Xen:

https://nostarch.com/releases/xen.html

RealStickman_|5 months ago

This made me curious to find out why KVM is so much more popular than Xen. I wasn't able to find anything concrete beyond "KVM is the standard and supported by our tooling", which is obviously the case nowadays, but it still leaves me wondering what KVM did so much better than Xen when it first released, or whether this was just a coincidence.

floam|5 months ago

I enjoyed seeing what I could do with a tiny tiny (64 MB RAM) NetBSD VPS on prgmr.com back in the day.

yjftsjthsd-h|5 months ago

I guess I don't quite follow. The attack can let an attacker in a normal VM see memory in either the host or a Xen dom0 VM. Why is it less impactful to get memory from the management VM instead of the host?

jcjgraf|5 months ago

VMScape does not allow an attacker to read memory of Dom0 or the host. Dom0 is safe because branch predictor state is flushed when transitioning to Dom0, and the host is secured as it runs as supervisor, while VMScape only targets userspace. See my comment further up for more information.

aborsy|5 months ago

Which is precisely why Qubes OS uses Xen.

bionsystem|5 months ago

Nowadays you can run your VMs inside LXC, and SmartOS also runs them inside zones by default. I wonder if the same exploits could be used across the container layer of both technologies, or if it would protect against leaks.

hugo1789|5 months ago

Maybe because Xen is a type 1 hypervisor in its original meaning and all the other ones are type 2? (Yes, ESX(i) doesn't use Linux, but it also brings its own OS that it runs on top of.)

jcjgraf|5 months ago

Author of the VMScape paper here.

It's great to see an article highlighting the impact of VMScape on Xen, especially since our paper [1] does not discuss Xen in detail (we only briefly mention it in the blog post [2]).

That said, the article unfortunately lacks technical precision. Some statements are vague, and "our quote" ("According to the ETH team") is misleading, as those are not our words. To be clear: VMScape is not a cross-VM attack. So please treat such summaries with caution.

Here are some clarifications:

The core issue lies in the hardware. On all AMD Zen CPUs, the branch prediction unit cannot natively distinguish between host user, guest-1 user, and guest-2 user domains (newer Intel CPUs can, to some extent). Supervisor domains (host or guest kernel) are protected by the CPU effectively disabling speculative execution in those domains. But because user domains share branch predictor state, execution in one can control speculation in another - the fundamental root of Spectre-BTI. To enforce isolation, predictors must be flushed (IBPB) whenever transitioning between such domains.

On Linux KVM, an IBPB is issued on guest-1 to guest-2 switches and on process switches. However, because a guest runs in the same process as its userspace hypervisor (e.g. QEMU, Firecracker, etc.), there is no isolation mechanism in place for this transition. VMScape exploits exactly this gap. The mitigation is to add an IBPB on guest to host userspace transitions.
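A toy model of that gap (Python, purely illustrative; BranchPredictor, vmexit, and ibpb are invented names, not kernel or hardware APIs):

```python
# Toy model of per-core branch predictor state shared across
# privilege domains. Purely illustrative, not real kernel code.

class BranchPredictor:
    def __init__(self):
        self.entries = {}            # branch PC -> predicted target

    def train(self, pc, target):     # attacker (guest user) trains an entry
        self.entries[pc] = target

    def predict(self, pc):           # victim (host user) speculates from it
        return self.entries.get(pc)

    def ibpb(self):                  # Indirect Branch Prediction Barrier
        self.entries.clear()

def vmexit(bp, flush):
    """Guest -> host-userspace transition; pre-fix KVM: flush=False."""
    if flush:
        bp.ibpb()

bp = BranchPredictor()
bp.train(0x400123, 0xdeadbeef)       # guest poisons an indirect branch

vmexit(bp, flush=False)              # unpatched: no barrier on the transition
assert bp.predict(0x400123) == 0xdeadbeef   # host userspace mispredicts

vmexit(bp, flush=True)               # mitigation: IBPB on the transition
assert bp.predict(0x400123) is None  # guest-trained state is gone
```

The point of the model: without a barrier on the guest-to-host-userspace transition, guest-trained entries survive into the victim's speculation; the IBPB on that transition is exactly what the fix adds.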

Xen, while also running on the same flawed hardware, is not vulnerable to VMScape. But the reason is not (just) asynchronism; asynchronism only makes exploitation harder. Instead, the key reason is that the equivalent of Linux's userspace hypervisor runs inside Dom0 on Xen, which is itself "treated like a guest". Because Xen already issues IBPBs between guest transitions, Dom0 is protected from DomU.

Assigning responsibility for vulnerabilities at the hardware–software boundary is inherently challenging and often depends on implicit assumptions about the threat model. VMScape introduces a novel threat model that had not been considered before. Consequently, the responsible entities concluded that the lack of host/guest branch predictor state isolation does not qualify as a hardware issue, since adequate mitigations, such as IBPB, are readily available, but insufficiently used by software.

[1] https://comsec-files.ethz.ch/papers/vmscape_sp26.pdf

[2] https://comsec.ethz.ch/research/microarch/vmscape-exposing-a...