And the key point:
"Previous benchmarks we have run (where we rebuilt the entire archive for x86-64-v3) show that most packages show a slight (around 1%) performance improvement and some packages, mostly those that are somewhat numerical in nature, improve more than that."
> show that most packages show a slight (around 1%) performance improvement
This takes me back to arguing with Gentoo users 20 years ago who insisted that compiling everything from source for their machine made everything faster.
The consensus at the time was basically "theoretically, it's possible, but in practice, gcc isn't really doing much with the extra instructions anyway".
Then there's stuff like glibc which has custom assembly versions of things like memcpy/etc, and selects from them at startup. I'm not really sure if that was common 20 years ago but it is now.
It's cool that after 20 years we can finally start using the newer instructions in binary packages, but it definitely seems to not matter all that much, still.
How many additions have there even been outside of AVX-x? Even AVX2 dates back to 2011. If we ignore AVX-x, the last ones I can recall are the few instructions added in the bit-manipulation sets BMI/ABM, but those are Haswell/Piledriver/Jaguar era (2012-2013). While some specific cases could benefit, it doesn't seem like a goldmine of performance improvements.
Further, maybe it has not been a focus for compiler vendors to generate good code for these higher microarchitecture levels if few are using them. So Ubuntu's move could improve that.
Which unfortunately extends all the way to Intel's newest client CPUs, since they're still struggling to ship their own AVX-512 support, which is required for v4. Meanwhile AMD has been on v4 for two generations already.
What are the changes to dpkg and apt? Are they being shared with Debian? Could this be used to address the pesky armel vs. armel+hardfloat vs. armhf issue, or for that matter, the issue of i486 vs. i586 vs. i686 vs. the many varieties of MMX and SSE extensions for 32-bit?
Even if technically possible, it's unlikely this will be used to support any of the variants you mentioned in Debian. Both i386 and armel are effectively dead: i386 is reduced to a partial architecture only for backwards compatibility reasons, and armel has been removed entirely from development of the next release.
This would allow mixing armel and softvfp ABIs, but not hard-float ABIs, at least across compilation unit boundaries (that said, GCC never seems to optimize ABI bottlenecks within a compilation unit anyway).
Over the past year, Intel has pulled back from Linux development.
Intel has reduced its number of employees, and has lost lots of software developers.
So we lost Clear Linux, their Linux distribution that often showcased performance improvements due to careful optimization and utilization of microarchitectural enhancements.
I believe you can still use the Intel compiler, icc, and maybe see some improvements in performance-sensitive code.
Getting a 1% across-the-board general-purpose improvement might sound small, but it is quite significant. Happy to see Canonical invest more heavily in performance and correctness.
Would love to see which packages benefited the most in terms of percentage gain and install base. You could probably back out a kWh / tons-of-CO2-saved metric from it.
> you will not be able to transfer your hard-drive/SSD to an older machine that does not support x86-64-v3. Usually, we try to ensure that moving drives between systems like this would work. For 26.04 LTS, we’ll be working on making this experience cleaner, and hopefully provide a method of recovering a system that is in this state.
Does anyone know what the plans are to accomplish this?
If I were them, I would make sure the v3 instructions are not used until late in the boot process, and provide some apt command that makes sure all installed programs are in the right subarchitecture for the running system, reinstalling as necessary.
But that does not sound like a simple solution for non-technical users.
Then again, non-technical users moving an installation to another, older computer? That sounds unusual.
I am probably going to be the one implementing this and I don't know what I am going to do yet! At the very least we need the failure mode to be better (currently you get an OOPS when the init from the initrd dies due to an illegal instruction exception)
Right, though compared to what one generally thinks of as an “AVX2-compatible” CPU, it curiously omits AES-NI and CLMUL (both relevant to e.g. AES-GCM). Yes, they are not technically part of AVX2, but they are present in all(?) the qualifying Intel and AMD CPUs (like many other technically-not-AVX2 stuff that did get included, like BMI or FMA3).
I'm really "new" to x64 (I only migrated from 32-bit in 2020...) and the difference I noticed between x86-64-v1 and x86-64-v3 was only with video (with ffmpeg), audio (mp3/ogg/mp4...) and encryption; the rest remains practically the same.
Naively, I believe it might be more appropriate to offer x86-64-vN options only for specific software and leave the rest as x86-64-v1.
AVX seemed to give the biggest boost to things.
Regarding those who are making fun of Gentoo users: it really did make a bigger difference in the past, but as compilers have been refined, the difference has diminished. Today, for me, still using Gentoo/CRUX for some specific tasks, what matters is the flexibility to enable or disable what I want in the software, not so much the extra speed anymore.
As an example, I currently use -Os (x86-64-v1) for everything, and only for things related to video/sound/cryptography (mathematics in general, I believe?) do I use -O2 (x86-64-v3) with other flags to squeeze out a little more.
Interestingly, in many cases -Os with -mtune=nocona generates faster binaries, even though I'm only using hardware from Haswell onward (go figure).
This is quite good news, but it's worth remembering that it's a rare piece of software in the modern scientific/numerical world that can be compiled against the versions in distro package managers, since those versions can lag upstream by months after a release.
If you’re doing that sort of work, you also shouldn’t use pre-compiled PyPI packages for the same reason - you leave a ton of performance on the table by not targeting the micro-architecture you’re running on.
My RSS reader trains a model every week or so and takes 15 minutes total with plain numpy, scikit-learn and all that. Intel MKL can do the same job in about half the time as the default BLAS. So you are looking at a noticeable performance boost, but a zero-bullshit install with uv is worth a lot. If I was interested in improving the model, then yeah, I might need to train 200 of them interactively and I'd really feel the difference. Thing is, the model is pretty good as it is, and to make something better I'd have to think long and hard about what 'better' means.
Most of the scientific numerical code I ever used had been in use for decades and would compile on a unix variant released in 1992, much less the distribution version of dependencies that were a year or two behind upstream.
Yup, if you're using OpenCV for instance compiling instead of using pre-built binaries can result in 10x or more speed-ups once you take into account avx/threading/math/blas-libraries etc...
I wonder who downvoted this. The juice you are going to get from building your core applications and libraries to suit your workload is going to be far larger than the small improvements available from microarchitectural targeting. For example, on Ubuntu I have some ETL pipelines that need libxml2. Linking it statically into the application cuts the ETL runtime by 30%. Essentially none of the practices of Debian/Ubuntu Linux are what you'd choose for efficiency. Their practices are designed around some pretty old and arguably obsolete ideas about ease of maintenance.
Thanks for sharing this. I'd love to learn more about micro-architectures and instruction sets - would you have any recommendations for books or sources that would be a good starting place?
This sure feels like overkill that leaks massive complexity into a lot more areas than it's needed in. For the applications that truly need sub-architecture variants, surely different packages or just some sort of meta-package indirection would be better for everyone involved.
So, if I got it right, this is mostly a way to have branches within a specific release for various levels of CPUs and their support of SIMD and other modern opcodes.
And if I have it right, the main advantage should come with the package manager and open source software, where the compiled binaries would be branched to benefit from and optimize for newer CPU features.
Still, this would be most noticeable for apps that benefit from those features, such as audio DSP, or, as mentioned, SSL and crypto.
I would expect compression, encryption, and codecs to have the least noticeable benefit because these already do runtime dispatch to routines suited to the CPU where they are running, regardless of the architecture level targeted at compile time.
Seems like this is not using glibc's hwcaps (where shared libraries are located in microarch-specific subdirs).
To me hwcaps feels like very unfortunate feature creep in glibc now. I don't see why it was ever added, given that it's hard to compile only shared libraries for a specific microarch, and it does not benefit executables. Distros seem to avoid it. All it does is cause unnecessary stat calls when running an executable.
No, it's not using hwcaps. That would only allow optimization of code in shared libraries, would be irritating to implement in a way that didn't require touching each package that includes shared libraries, and would (depending on details) waste a bunch of space on every user's system. I think hwcaps would only make sense for a small number of shared libraries, if at all, not as a system-wide thing.
They do mention it in the linked announcement, although it's not really highlighted, just a quick mention:
> As a result, we’re very excited to share that in Ubuntu 25.10, some packages are available, on an opt-in basis, in their optimized form for the more modern x86-64-v3 architecture level
> Previous benchmarks we have run (where we rebuilt the entire archive for x86-64-v3) show that most packages show a slight (around 1%) performance improvement and some packages, mostly those that are somewhat numerical in nature, improve more than that.
ARM/RISC-V extensions may be another reason. If a widespread variant configuration exists, why not build for it? See:
- RISC-V's official extensions[1]
- ARM's JS-specific float-to-fixed[2]
CachyOS already grabs this one percent of performance gains? Since it chases every performance gain, that's unsurprising. But now I wonder how my laptop from 2012 managed to run CachyOS; they seem to switch based on the hardware, not at image download and boot time.
This is awesome, but... if your process requires deterministic results (speaking mostly about floats/doubles here), then you need to get this straight.
Maybe; more likely we'll trade off the added build/test/storage cost of maintaining each variant, so you might not see amd64v4, but possibly amd64v5, depending on how impactful they turn out to be.
The same will apply to different arm64 or riscv64 variants.
apt (3.1.7) unstable; urgency=medium
.
[ Julian Andres Klode ]
* test-history: Adjust for as-installed testing
.
[ Simon Johnsson ]
* Add history undo, redo, and rollback features
I bet you there is some use case of some app or library where this is like a 2x improvement.
x86-64-v3 is AVX2-capable CPUs.
(There is some older text in the Debian Wiki https://wiki.debian.org/ArchitectureVariants but it's not clear if it's directly related to this effort)
No, because those are different ABIs (and a Debian architecture is really an ABI).
> the issue of i486 vs. i586 vs. i686 vs. the many varieties of MMX and SSE extensions for 32-bit?
It could be used for this, but it's about 15 years too late to care, surely?
> (There is some older text in the Debian Wiki https://wiki.debian.org/ArchitectureVariants but it's not clear if it's directly related to this effort)
Yeah, that is a previous version of the same design. I need to get back to talking to Debian folks about this.
https://clearlinux.org/
"It was actively developed from 2/6/2015-7/18/2025."
> Description: official repositories compiled with LTO, -march=x86-64-vN and -O3.
Packages: https://status.alhp.dev/
I couldn't run something from NPM on an older NAS machine (HP Microserver Gen 7) recently because of this.
1. https://riscv.atlassian.net/wiki/spaces/HOME/pages/16154732/... 2. https://developer.arm.com/documentation/dui0801/h/A64-Floati...
"Changes/Optimized Binaries for the AMD64 Architecture v2" (2025) https://fedoraproject.org/wiki/Changes/Optimized_Binaries_fo... :
> Note that other distributions use higher microarchitecture levels. For example RHEL 9 uses x86-64-v2 as the baseline, RHEL 10 uses x86-64-v3, and other distros provide optimized variants (OpenSUSE, Arch Linux, Ubuntu).
Very odd choice of words. "Better utilize/leverage" is perhaps the right thing to say here.
All the fuss about Ubuntu 25.10 and later being RVA23 only was about nothing?