Python 3.15's JIT is now back on track

483 points| guidoiaquinti | 14 days ago |fidget-spinner.github.io | reply

311 comments

[+] mattclarkdotnet|14 days ago|reply
Python really needs to take the TypeScript approach of "all valid Python 3 is valid Python 4". And then add value types so we can have int64 etc. And allow object refs to be frozen after instantiation to avoid the indirection tax.

Sensible type-annotated python code could be so much faster if it didn't have to assume everything could change at any time. Most things don't change, and if they do they change on startup (e.g. ORM bindings).
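
To make the "everything could change at any time" point concrete, here is a minimal sketch (the `Point` class and `total` function are made up for illustration). A method can be rebound long after startup, so an optimizer has to guard against that before inlining anything:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm1(self):
        return abs(self.x) + abs(self.y)


def total(points):
    # An optimizer would like to inline Point.norm1 here, but it first has to
    # check that nobody has replaced the method since the code was compiled.
    return sum(p.norm1() for p in points)


points = [Point(1, -2), Point(3, 4)]
before = total(points)

# Perfectly legal at any moment, not just at startup:
Point.norm1 = lambda self: 0
after = total(points)

print(before, after)
```

The same call site gives two different answers, which is the kind of change a "frozen after instantiation" guarantee would rule out.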

[+] adrian17|14 days ago|reply
I've been occasionally glancing at the PR/issue tracker to keep up to date with things happening with the JIT, but I've never seen where the high level discussions were happening; the issues and PRs always jumped right to the gritty details. Is there a high-level introduction/example anywhere of how trace projection vs recording work and differ? Googling for the terms often returns the CPython issue tracker as the first result, and the repo's jit.md is relatively barebones and rarely updated :(

Similarly, I don't entirely understand refcount elimination; I've seen the codegen difference, but since the codegen happens at build time, does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs? With so many opcodes and their specialized variants, how many stencils are there now?

[+] kenjin4096|14 days ago|reply
> I've never seen where the high level discussions were happening

Thanks for your interest. This is something we could improve on. We were supposed to document the JIT better in 3.15, but right now we're crunching for the 3.15 release. I'll try to get to updating the docs soon if there's enough interest. PEP 744 does not document the new frontend.

I wrote a somewhat high-level overview here in a previous blog post https://fidget-spinner.github.io/posts/faster-jit-plan.html#...

> does this mean each opcode is possibly split into two (or more?) stencils, with and without removed increfs/decrefs?

This is a great question, the answer is not exactly! The key is to expose the refcount ops in the intermediate representation (IR) as one single op. For example, BINARY_OP becomes BINARY_OP, POP_TOP (DECREF), POP_TOP (DECREF). That way, instead of optimizing for n operations, we just need to expose refcounting of n operations and optimize only 1 op (POP_TOP). Thus, we just need to refactor the IR to expose refcounting (which was the work I divided up among the community).
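
A toy sketch of that idea, under loud assumptions: the tuple-based IR, the function names, and the `POP_TOP_NO_DECREF` op name are all hypothetical and only model the shape of the transformation, not CPython's actual IR or optimizer.

```python
# Toy IR: each instruction is (opname, operands). Hypothetical format.

def expose_refcounting(trace):
    """Rewrite BINARY_OP into BINARY_OP plus one explicit POP_TOP per input."""
    out = []
    for op, operands in trace:
        out.append((op, operands))
        if op == "BINARY_OP":
            # The DECREFs that used to be hidden inside BINARY_OP.
            out.extend(("POP_TOP", (value,)) for value in operands)
    return out


def eliminate_decrefs(ir, no_decref_needed):
    """The one optimization: only POP_TOP is ever inspected."""
    out = []
    for op, operands in ir:
        if op == "POP_TOP" and operands[0] in no_decref_needed:
            # Hypothetical op name: pop the value without touching its refcount.
            out.append(("POP_TOP_NO_DECREF", operands))
        else:
            out.append((op, operands))
    return out


trace = [("BINARY_OP", ("a", "b")), ("BINARY_OP", ("c", "d"))]
exposed = expose_refcounting(trace)
optimized = eliminate_decrefs(exposed, no_decref_needed={"a", "c", "d"})

for instruction in optimized:
    print(instruction)
```

The optimizer pass never needs to know about `BINARY_OP` or any other opcode; once every op exposes its refcounting as `POP_TOP`, one rule covers all of them.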

If you have any more questions, I'm happy to answer them either in public or email.

[+] rtpg|14 days ago|reply
discussions might be happening on the Python forums, which are pretty active.

https://discuss.python.org/t/pep-744-jit-compilation/50756/8... here's one thing

I do think you can also just outright ask questions about it on the forums and you'll get some answers.

At the end of the day there's only so many people working on this though.

[+] saikia81|14 days ago|reply
Have you read the dev mailing list? The Python developers discuss a lot there.
[+] sheepscreek|14 days ago|reply
UPDATE: I misunderstood the question :-/ You can ignore this.

I love playing with compilers for fun, so maybe I can shed some light. I’ll explain it in a simplified way for everyone’s benefit (going to ignore the stack):

When an object is passed between functions in Python, it doesn’t get copied. Instead, a reference to the object’s memory address is sent. This reference acts as a pointer to the object’s data. Think of it like a sticky note with the object’s memory address written on it. Now, imagine throwing away one sticky note every time a function that used a reference returns.

When an object has zero references, it can be freed and its memory reused. Ensuring the number of references, or the “reference count”, is always accurate is therefore a big deal. Getting it wrong is often the source of memory leaks, but I wouldn’t attribute a speedup to it (only if it replaces GC, then yes).
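
The sticky-note picture can be observed directly on CPython with `sys.getrefcount`. This is a small sketch that only looks at differences, since the absolute numbers are an implementation detail and vary between versions:

```python
import sys


def refcount_deltas():
    obj = object()
    base = sys.getrefcount(obj)

    alias = obj                       # write a second sticky note
    with_alias = sys.getrefcount(obj)

    del alias                         # throw that sticky note away
    after_del = sys.getrefcount(obj)

    # Report changes relative to the starting count.
    return with_alias - base, after_del - base


deltas = refcount_deltas()
print(deltas)
```

On CPython the count goes up by one while the alias exists and returns to the starting value once it is deleted.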

[+] owaislone|14 days ago|reply
Oh man, Python 2 > 3 was such a massive shift. It took almost half a decade if not more, and yet it was mainly changing superficial syntax stuff. They should have allowed ABIs to break and gotten these internal things done. They probably should have come up with a new, tighter API for integrating with other lower level languages, so that going forward Python internals could be changed more freely without breaking everything.
[+] scorpioxy|14 days ago|reply
The text encoding stuff wasn't a small change considering what it could break, at least. And remember we're sometimes talking about software that would cost a lot of money to migrate or upgrade. I still maintain some 2.x python code-bases that will be very expensive to migrate and the customer is not willing to invest that money.

Although your general sentiment is something I agree with (if it's going to be painful, do it and get it over with), I don't believe anybody knew or could've guessed what the reaction of the ecosystem would be.

Your last point about being able to change internals more freely is also great in theory but very difficult (if not impossible) to achieve in practice.

I don't know. Having maintained some small projects that were free and open source, I saw the hostility and entitlement that can come from that position. And those projects were a speck of dust next to something like Python. So I think the core team is doing the best they can. It was always going to be damned if you do, damned if you don't.

[+] smcl|14 days ago|reply
I cannot believe people are still acting like Python 2->3 was a huge fuck-up and an enormous missed opportunity. When in reality Python is by most measures the most popular language and became so AFTER that switch.

Since the switch we have seen enormous companies being built from scratch. There is no reason for anyone to be complaining about it being too hard to upgrade in 2026

[+] nurettin|14 days ago|reply
The biggest (and worst planned) change was module names. Your imports didn't work, forcing hacks like

    import sys

    if sys.version_info.major == 2:
        import old  # placeholder for the Python 2 module name
    else:
        import new  # placeholder for the Python 3 module name
Or worse, people used try/except in their imports.
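
For anyone who hasn't seen the try/except variant, here is a sketch using one real rename (`ConfigParser` in Python 2 became `configparser` in Python 3); which module a given project actually wrapped this way is of course an assumption:

```python
try:
    import configparser                  # Python 3 name
except ImportError:
    import ConfigParser as configparser  # Python 2 name

# Either way, the rest of the file uses the Python 3 spelling.
parser = configparser.ConfigParser()
print(parser.sections())
```

The catch is that the `except ImportError` also swallows unrelated import failures inside the module, which is part of why it was considered worse than an explicit version check.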
[+] gjvc|14 days ago|reply
yes. it was not a massive shift. it was barely worth the effort.
[+] rslashuser|14 days ago|reply
I'm curious if the JIT developers could mention any Python features that prevent promising JIT features. An earlier Ken Jin blog [1] mentions how __del__ complicates reference counting optimization.

There is a story that Python is harder to optimize than, say, Typescript, with Python flexibility and the C API getting mentioned. Maybe, if the list of troublesome Python features was out there, programmers could know to avoid those features with the promise of activating the JIT when it can prove the feature is not in use. This could provide a way out of the current Python hard-to-JIT trap. It's just a gist of an idea, but certainly an interesting first step would be to hear from the JIT people which Python features they find troublesome.

[1] https://fidget-spinner.github.io/posts/faster-jit-plan.html

[+] rtpg|14 days ago|reply
It's interesting you mention __del__ because JavaScript not only doesn't have destructors, but for security reasons (that are above my pay grade) the spec _explicitly prohibits_ implementations from allowing visibility into garbage collection state, meaning that code cannot observe deallocations.

I think __del__ is tricky though. In theory __del__ is not meant to be reliable. In practice CPython reliably calls it cuz it reference counts. So people know about it and use it (though I've only really seen it used for best effort cleanup checks)
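
A small sketch of that "reliable in practice" behaviour (the `Resource` class is invented for illustration). The ordering shown is what CPython's reference counting produces; the language itself does not promise it, and a tracing-GC implementation such as PyPy may run the finalizer later or not at all before exit:

```python
events = []


class Resource:
    def __init__(self, name):
        self.name = name
        events.append(f"open {self.name}")

    def __del__(self):
        # Best-effort cleanup only: timing depends on the implementation.
        events.append(f"close {self.name}")


r = Resource("tmp")
del r                      # last reference gone: CPython finalizes right here
events.append("after del")

print(events)
```

On CPython the "close" entry lands before "after del", which is exactly the kind of timing people end up depending on.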

In a world where more people were using PyPy we could have pressure from that perspective to avoid leaning into it. And that would also generate more pressure to implement code that is performant in "any" system.

[+] adgjlsfhk1|14 days ago|reply
The biggest thing is BigInt by default. It makes every integer operation require an overflow check.
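
A rough model of what that check looks like, written in Python itself. The function name and the choice of a 64-bit fast path are assumptions for illustration, not a description of CPython's integer representation:

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def add_with_overflow_check(a, b):
    """Model of a compiled int add: machine-word fast path, bigint fallback."""
    result = a + b  # Python ints are arbitrary precision, so this is exact
    if INT64_MIN <= result <= INT64_MAX:
        return result, "fast"    # would have fit in a 64-bit register
    return result, "bigint"      # overflowed: needs the slow, general path


small = add_with_overflow_check(1, 2)
big = add_with_overflow_check(INT64_MAX, 1)
print(small, big)
```

A language with a fixed-width `int64` could skip the range test entirely; with big integers by default, every add, subtract and multiply carries it.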
[+] kstrauser|14 days ago|reply
Huh, I could imagine that as a set of Ruff rules:

> Using str.frobnicate prevents TurboJit on line 63

[+] vanderZwan|14 days ago|reply
> However, I misunderstood and came up with an even more extreme version: instead of tracing versions of normal instructions, I had only one instruction responsible for tracing, and all instructions in the second table point to that. Yes I know this part is confusing, I’ll hopefully try to explain better one day. This turned out to be a really really good choice. I found that the initial dual table approach was so much slower due to a doubling of the size of the interpreter, causing huge compiled code bloat, and naturally a slowdown.

> By using only a single instruction and two tables, we only increase the interpreter by a size of 1 instruction, and also keep the base interpreter ultra fast. I affectionally call this mechanism dual dispatch.

I really do hope they'll write that better explanation one day because this sounds pretty intriguing all on its own.
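
Going only by the quoted description, here is how I picture it, as a toy stack machine. Everything in it (opcodes, handler names, the `vm` dict) is invented, and the real mechanism in CPython may differ in important ways:

```python
def op_push(vm, arg):
    vm["stack"].append(arg)


def op_add(vm, arg):
    b = vm["stack"].pop()
    a = vm["stack"].pop()
    vm["stack"].append(a + b)


# First table: the normal, fast handlers.
NORMAL = {"PUSH": op_push, "ADD": op_add}


def op_record(vm, arg):
    # The single tracing instruction: log the opcode it stands in for,
    # then run the real handler from the normal table.
    opcode = vm["current"]
    vm["trace"].append((opcode, arg))
    NORMAL[opcode](vm, arg)


# Second table: every opcode points at the same recording handler.
TRACING = {opcode: op_record for opcode in NORMAL}


def run(program, tracing=False):
    vm = {"stack": [], "trace": [], "current": None}
    table = TRACING if tracing else NORMAL
    for opcode, arg in program:
        vm["current"] = opcode
        table[opcode](vm, arg)
    return vm["stack"], vm["trace"]


program = [("PUSH", 2), ("PUSH", 3), ("ADD", None)]
plain = run(program)
traced = run(program, tracing=True)
print(plain, traced)
```

Switching tables turns tracing on or off without duplicating any handler, which matches the "only increase the interpreter by a size of 1 instruction" claim in the quote.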

[+] oystersareyum|14 days ago|reply
> We don’t have proper free-threading support yet, but we’re aiming for that in 3.15/3.16. The JIT is now back on track.

I recently read an interview about implementing free-threading and getting modifications through the ecosystem to really enable it: https://alexalejandre.com/programming/interview-with-ngoldba...

The guy said he hopes the free-threaded build'll be the only one in "3.16 or 3.17", I wonder if that should apply to the JIT too or how the JIT and interpreter interact.

[+] zarzavat|14 days ago|reply
I continue to believe that free-threading hurts performance more than it helps and Python should abandon it.

Having to have thread safe code all over the place just for the 1% of users who need to have multi-threading in Python and can't use subinterpreters for some reason is nuts.

[+] ekjhgkejhgk|14 days ago|reply
Doesn't PyPy already have a jit compiler? Why aren't we using that?
[+] olivia-banks|14 days ago|reply
As far as I know, PyPy doesn't support all CPython extensions, so pure Python code will probably (very likely) run fine but for other things most bets are off. I believe PyPy also only supports up to 3.11?
[+] hrmtst93837|14 days ago|reply
PyPy isn't CPython.

A lot of Python code still leans on CPython internals, C extensions, debuggers, or odd platform behavior, so PyPy works until some dependency or tool turns that gap into a support problem.

The JIT helps on hot loops, but for mixed workloads the warmup cost and compatibility tax are enough to keep most teams on the interpreter their deps target first.

[+] contravariant|14 days ago|reply
Why shouldn't the reference implementation get JIT? Just because some other implementations already have it is no reason not to. That'd be like skipping list comprehensions because they already exist in CPython.
[+] 3laspa|14 days ago|reply
Because the same people who made a big deal about supporting PyPy and PEP 399 when it was fashionable to do so are now told by their corporations that PyPy does not matter. CPython only moves with what is currently fashionable, employer mandated and profitable.
[+] cpburns2009|14 days ago|reply
PyPy is limited to maintenance mode due to a lack of funding/contributors. In the past, I think a few contributors or funding is what helped push "minor" PyPy versions. It's too bad PyPy couldn't take the federal funding the PSF threw away.
[+] pjmlp|14 days ago|reply
Great to see this going, Python also deserves a JIT, and given that only few bother with PyPy or GraalPy, shipping it in CPython is the only way to have less "rewrite into XYZ".

Kudos to those involved in making it happen.

[+] ghm2199|14 days ago|reply
Thanks for all the amazing work! I have a noob question. Wouldn't this get the funding back? Or would that not be a preferable way to continue (as opposed to just volunteer driven)?

Like this is a big deal to get a project to a state where volunteers are spun up and actively breaking up tasks and getting work done, no? It's a Python JIT, something I (like most application developers) know next to nothing about, which tells one how difficult this must have been.

[+] ecshafer|14 days ago|reply
What is wrong with the Python code base that makes this so much harder to implement than seemingly all other code bases? Ruby, PHP, JS. They all seemed to add JITs in significantly less time. A Python JIT has been asked for for like 2 decades at this point.
[+] 0cf8612b2e1e|14 days ago|reply
The Python C api leaks its guts. Too much of the internal representation was made available for extensions and now basically any change would be guaranteed to break backwards compatibility with something.
[+] hardwaregeek|14 days ago|reply
For what it’s worth Ruby’s JIT took several different implementations, definitely struggled with Rails compatibility and literally used some people’s PhD research. It wasn’t a trivial affair
[+] stmw|14 days ago|reply
Some languages are much harder to compile well to machine code. Some big factors (for any languages) are things like: lack of static types and high "type uncertainty", other dynamic language features, established inefficient extension interfaces that have to be maintained, unusual threading models...
[+] fleetfox|14 days ago|reply
I can't really talk about Ruby. But PHP is much more static, the surface of things you have to care about at runtime is like an order of magnitude smaller, and there already was OPcache as a starting point. And speaking of JS, the JIT in V8 is one of the most sophisticated and complicated ever built. There haven't been nearly enough man-hours and funding for CPython to make it a fair comparison.
[+] fluidcruft|14 days ago|reply
(what are blueberry, ripley, jones and prometheus?)
[+] thunky|14 days ago|reply
I always wanted this for Python but now that machines write code instead of humans I feel like languages like Python will not be needed as much anymore. They're made for humans, not machines. If a machine is going to do the dirty work I want it to produce something lean, fast, and strictly verified.
[+] bigstrat2003|14 days ago|reply
> now that machines write code instead of humans

That is not remotely the case for anyone who produces quality work.

[+] zahlman|14 days ago|reply
We got daguerreotypes, and then photographic film, and then digital cameras, along with image editing software, and now AI image generation systems; yet there are still people who go out and apply oil paints to a canvas with natural hair brushes. I'm not willing to lose that.
[+] ddorian43|14 days ago|reply
AI, write me that sqlalchemy clone in <lang>
[+] JodieBenitez|14 days ago|reply
Pretty much my thoughts the other day... now that Codex does the writing, maybe I can finally switch to Go for the web backend stuff without being annoyed by some of its archaisms and gain significant execution performance, while still having a relatively easy to read language.
[+] a3w|14 days ago|reply
Over 100% speedup sounds like "the code compiled before you asked the compiler to start working".

`from future import time_travel`

[+] killingtime74|14 days ago|reply
Sorry but the graphs are completely unreadable. There are four code names for each of the lines. Which is jit and which is cpython?
[+] qy-mj|14 days ago|reply

[deleted]

[+] seanw444|13 days ago|reply
Jumping 12 major versions to one that doesn't exist yet must yield quite the performance boost.