top | item 29005573

Notes from the Meeting on Python GIL Removal Between Python Core and Sam Gross

248 points| rbanffy | 4 years ago |lukasz.langa.pl | reply

98 comments

[+] ctoth|4 years ago|reply
I can't help but read all the interjections from the core developers as a strong indication of why all these sorts of things tend to fail on the vine. From Unladen Swallow onward there have been these groups off doing interesting and awesome experiments to try and make Python faster, and they never actually make it into something that Python end-users can use (yes, I know US had shortcomings). This is the work of one (admittedly super smart) guy over two years. Just like Nuitka, something else that was supposed to be impossible but just keeps making steady progress. Maybe I'm just seeing design by committee?

The lone geniuses can only go so far. Maybe the problem is that Python can't quite decide what it wants to be because it's too many things for too many people already.

Is the concept of Python the language, as opposed to Python the ecosystem, valuable enough so that a Python that broke backwards compatibility with all the C extensions would be useful as its own multicore-capable runtime? PyPy seemed to think so for a while and now has gone hard in the other direction, reimplementing (faking?) a bunch of the CPython extension API so maybe this approach would never work. I don't know, but seeing things like:

> there is a large number of “dark matter” Python (and C extension) code out there that isn’t open-source. We need to be careful not to break it since it might not be feasible for its users to make required changes, or to report problems back upstream to us. In particular, some C extensions protect their own internal state with the GIL. This is a big worry, and might be a big hindrance to adoption of a GIL-free Python.

really make me wonder if the community as a whole would conclude the same on these critical sorts of decisions which shape the future of the language if they were put forward and not just made by a couple people in a closed meeting.

Would you prefer to support some weird arbitrary nameless closed source extensions, or have a multicore Python? This obviously depends on who you are and what you're doing, which leads us back to Python being too much for too many, but even here we can get a feeling for how many people do what with the language.

[+] shepardrtc|4 years ago|reply
> Would you prefer to support some weird arbitrary nameless closed source extensions, or have a multicore Python?

There's nothing wrong with staying on an older LTS version of Python. Let the people with the nameless closed-source stuff stick with that. The beauty of open source is that they can fork the older, GIL-ful version of Python and maintain it, if they like.

Multicore would be a tremendous boon to the language.

[+] pganssle|4 years ago|reply
> I can't help but read all the interjections from the core developers as a strong indication of why all these sorts of things tend to fail on the vine.

I think this might be a misunderstanding of the nature of the event that these notes are generated from, unless I'm misunderstanding your objection. The point of this Q&A as I saw it was to explore the feasibility of the idea and fully flesh out the costs and benefits so that we can make informed decisions about how to proceed.

The "random interjections" are notes of caution about what trade-offs need to be made. For example, it is very easy to overlook "dark matter" code because we don't have access to it, but it's almost certainly the majority of Python code out there. It is also not a complete deal-breaker to say that some change could break unknown proprietary extensions — otherwise we'd never be able to change anything; the key is that the changes have to be worth it. A lot of that depends on details — if it's easy to update C extensions for nogil mode (even if they were designed without parallelism in mind), then making breaking changes to remove the GIL might not be so bad. If nogil mode requires that most C extensions totally overhaul their reference counting and C API usage and the changes require restructuring code rather than something that can be done with automated search and replace, that's a much bigger cost and will probably come with a long term fork of the ecosystem (which is a huge pain to deal with) and it might not be worth it.

Avoiding this sort of criticism will not make the underlying problems go away, and I think everyone involved understood that this meeting was intended to bring to light any objections that might guide the work towards ultimate resolution.

[+] simonh|4 years ago|reply
I don't think it's very coherent to criticise the team because "Python can't quite decide what it wants to be", but also criticise them for not adopting all these crazy cool changes that would fundamentally change Python.

Taking a highly conservative approach to breaking changes is absolutely not the same thing as being indecisive. The Python team has learned from experience how disruptive breaking changes can be.

[+] rbanffy|4 years ago|reply
> really make me wonder if the community as a whole would conclude the same on these critical sorts of decisions which shape the future of the language if they were put forward and not just made by a couple people in a closed meeting.

I for one couldn't care less if some proprietary binaries fail on Python 3.11 or so. That's why we keep multiple versions around (at my last company, I could only use up to 3.6 because that was the version in the Sacred CentOS AMI).

And, of course, a very critical piece of code was depending on a bug in regex that was fixed in 3.8 or so, and decided to break during a demo (where I was using 3.9 instead of 3.6).

[+] formerly_proven|4 years ago|reply
> there is a large number of “dark matter” Python (and C extension) code out there that isn’t open-source. We need to be careful not to break it since it might not be feasible for its users to make required changes, or to report problems back upstream to us. In particular, some C extensions protect their own internal state with the GIL. This is a big worry, and might be a big hindrance to adoption of a GIL-free Python.

Such code probably doesn't work across minor versions anyway, since most extensions aren't built against the limited API.
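To illustrate the point about minor versions: on CPython, an extension built against the full C API is tagged with the interpreter version in its filename, while on POSIX systems one built against the limited API carries a version-independent `.abi3` tag. The sketch below assumes those standard naming conventions; `extension_abi` is a hypothetical helper, not part of any library.

```python
import importlib.machinery
import re

# Version-specific extensions embed the interpreter version in the filename,
# e.g. "_foo.cpython-310-x86_64-linux-gnu.so" (POSIX) or "_foo.cp310-win_amd64.pyd"
# (Windows). Limited-API builds on POSIX use ".abi3.so" instead.
_VERSION_TAG = re.compile(r"\.(?:cpython-|cp)(\d)(\d+)")


def extension_abi(filename: str) -> str:
    """Classify an extension-module filename by which interpreters can load it."""
    if ".abi3." in filename:
        return "stable ABI (abi3)"
    match = _VERSION_TAG.search(filename)
    if match:
        major, minor = match.groups()
        return f"CPython {major}.{minor} only"
    return "untagged"


# The suffixes the running interpreter accepts when importing extension modules.
print(importlib.machinery.EXTENSION_SUFFIXES)
```

A wheel full of `cpython-310` binaries has to be rebuilt for 3.11, which is why closed-source extensions rarely survive a minor-version upgrade untouched.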

[+] bigdict|4 years ago|reply
If this proof of concept is accepted, Facebook will put more resources into moving the project forward.
[+] frazbin|4 years ago|reply
I think there's a pretty good chance this stuff gets incorporated:

"On a personal level, we are impressed by Sam’s work so far and invited him to join the CPython project. I’m happy to report he is interested, and to help him ramp up to become a core developer, I will be mentoring him. Guido and Neil Schemenauer will help me review code for the interpreter bits I’m unfamiliar with."

[+] blackandsqueaky|4 years ago|reply
12 references to people in one statement, 5 referring to the post author, 1 reference to social fraternity membership, 1 statement of authority.

I'm not sure if there is a common name for this particular source of discomfort, but that quote definitely contains a lot of it. I'm a historical contributor to the Python source repository, but something about the social structure of the project has changed significantly in recent years that would dissuade me from submitting changes in future. The focus in the statement above no longer feels like it is on the actual productive output of the project itself, and in previous years it wasn't like that, nor needed to be like that.

Reminds me of something like the minutes of a professional schmoozer's business lunch, rather than a technical meeting, or something like that. If you have ever seen a stray engineer at an event like this (or had the misfortune of being that engineer), this feeling probably captures the problem well. Whatever it is, I'd love to see less of it.

[+] mvanveen|4 years ago|reply
Just came in here briefly to opine that there is a very real risk of fork if the Python core community does not at least offer a viable alternative expediently.

The economic pressures surrounding the benefits of Gross’s changes will likely influence this more than any tears shed over subtle backwards incompatibility.

I believe it was Dropbox that famously released their own private internal Python build a while back and included some concurrency patches.

Many teams might go the route of building on Sam Gross’ work, and if we see subtle changes in the underlying runtime concurrency semantics, or something else backwards-incompatible, that’s it: either that adoption will roll downhill into a new standard, or Python core will have to answer with a suitable GIL-less alternative.

I for one do not want to think about “ANSI Python” runtimes or give the MSFTs etc of the world an opening to divide the user base.

[+] yjftsjthsd-h|4 years ago|reply
I mean, PyPy is over a decade old now, and MicroPython is a mere 7 years old. What's another fork? If anything, I strongly prefer languages that have more than one implementation.
[+] rbanffy|4 years ago|reply
> I believe it was Dropbox that famously released their own private internal Python build a while back and included some concurrency patches.

Google also had their Unladen Swallow version, but it seems they lost interest at some point.

[+] qwerty456127|4 years ago|reply
Aren't there enough Python forks/implementations already? Is it reasonable to expect a new one to become more popular than the ones we already have?
[+] wyldfire|4 years ago|reply
> there is a very real risk of fork

It might not matter much if Canonical or IBM decided to port a critical mass of open source extensions/packages. Then they could ship the new CPython in place of the old one and mention the differences in the release notes. With one or both throwing their weight behind it, it would gain significant momentum above and beyond the original project.

[+] fulafel|4 years ago|reply
These experimental forks don't aim to change language semantics, so they are quite safe from a fragmentation point of view even if they accidentally get some adoption. But so far they have been explicit about being research projects and uninterested in anything else.
[+] mixedmath|4 years ago|reply
These are exceptionally clear notes. They're easy to read and feel comprehensive. I also note that the author is the current CPython developer in residence (a recently created position).
[+] didip|4 years ago|reply
I think Python core developers should not worry about closed-source enterprise Python code.

If users are gaining performance, they will bend over backward porting their code to this new version.

At minimum, I predict, all of FAANGMULA would jump on the bandwagon and create a pretty big ripple effect.

[+] josefx|4 years ago|reply
I am surprised that closed source is suddenly an issue when it comes to the GIL, when half the world breaking in the Python 3 transition was not only intended but actively pushed by various members of the community. If Linux managed to get rid of the BKL (Big Kernel Lock), then Python should be able to get rid of the GIL.
[+] stjohnswarts|4 years ago|reply
They said they would make the non-GIL version opt-in (a command-line flag?), so they wouldn't be breaking the old stuff anyway. It's a solved problem. If anyone moving to a new version of Python can't take the time to understand such a small change, then that's on them.
[+] aserafini|4 years ago|reply
100% agree, it is also impossible to quantify so can never really be a useful factor in decision making.

And Python is not a business with customers. It’s an open source volunteer project.

[+] rolisz|4 years ago|reply
I have heard of FAANG. I guess M stands for Microsoft. Who's ULA?
[+] kzrdude|4 years ago|reply
The site is currently past its hosting limit for the day.

https://archive.md/Zb8p2

(Archived through Google cache, so two layers of cache.)

[+] ambivalence|4 years ago|reply
Thanks for the link. I just moved the website to Netlify from Fastmail. Hopefully the DNS on your side will update soon.
[+] jancsika|4 years ago|reply
Suppose a Python module written in C registers a method that ends up making a call to C functions larry(), moe(), and then curly() to mutate a global variable "global_mutable_temp" before finally returning a value generated from global_mutable_temp.

1. Supposing this method doesn't currently crash under GIL Python, would it be true that this method will also run without crashing on the non-GIL Python interpreter?

2. Would it be true that the non-GIL Python interpreter will introduce a race into this method (resulting in a runtime error) that didn't exist under the GIL interpreter?

[+] bionhoward|4 years ago|reply
No, yes, and that code should be rewritten, GIL or not
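The scenario above can be sketched in pure Python, reusing the question's hypothetical names `larry`, `moe`, `curly`, and `global_mutable_temp`. One caveat: in a C extension, the GIL is held for the whole call unless the code releases it or calls back into Python, which is what made the sequence effectively atomic. Python-level code never had that guarantee, because the interpreter can switch threads between bytecodes. The fix shown is a plain `threading.Lock` held across the whole read-modify-return sequence; it is a minimal sketch, not the only way to restructure such code.

```python
import threading

# Hypothetical module-level state, mutated by three helpers in sequence.
global_mutable_temp = 0
_temp_lock = threading.Lock()


def larry():
    global global_mutable_temp
    global_mutable_temp = 1


def moe():
    global global_mutable_temp
    global_mutable_temp += 1


def curly():
    global global_mutable_temp
    global_mutable_temp *= 10


def method_unsafe():
    # Another thread may run between these calls, so the returned value
    # can reflect a different thread's mutations.
    larry()
    moe()
    curly()
    return global_mutable_temp


def method_safe():
    # Holding one lock across the whole sequence makes it atomic with
    # respect to other callers of method_safe, with or without a GIL.
    with _temp_lock:
        larry()
        moe()
        curly()
        return global_mutable_temp


def run(fn, n_threads=8, calls=1000):
    """Call fn concurrently from several threads and collect every result."""
    results = []
    results_lock = threading.Lock()

    def worker():
        local = [fn() for _ in range(calls)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With the lock, every call returns 20 (1, then 2, then 20); without it, no particular value is guaranteed once threads interleave.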
[+] bigdict|4 years ago|reply
Why is it never mentioned that Sam the lone genius Gross who is "interested in PyTorch training workflows" is a fulltime Facebook employee?

Not that it's bad but it should be mentioned that it's a corporate initiative.

[+] minhazm|4 years ago|reply
Why is that relevant? Many of the people contributing significantly to CPython also have full time jobs elsewhere. Sometimes their full time job overlaps with their contributions and sometimes it does not. For example Guido works for Microsoft and is working on CPython performance there, does that mean all of his work needs an asterisk saying it's actually a Microsoft corporate initiative?
[+] Lammy|4 years ago|reply
PyTorch seems to obfuscate its Facebook ownership in general. At least I could find no mention of it on their "About" page or in the documentation, where the only mention of "Facebook" is a link in the footer to the PyTorch project's own social media page: https://pytorch.org/features/

At least the "Brand Guidelines" PDF makes it clear that “PyTorch, the PyTorch logo and any related marks are trademarks of Facebook, Inc.”: https://pytorch.org/assets/brand-guidelines/PyTorch-Brand-Gu...

Also a small hint in `CONTRIBUTING.md`: https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING....

[+] fulafel|4 years ago|reply
There's funding and then there's research. I'd give the person a share of the headline credit for the initiative as well.
[+] kombine|4 years ago|reply
I think switching the version to 4 is the most viable path. Make the last GIL-ed Python 3 an LTS release that interested parties can hold on to, e.g. Ubuntu can keep Python 3 as the default for a long time. One can always use conda to run multiple versions simultaneously.
[+] IshKebab|4 years ago|reply
You mean Python 4? That would be absolutely insane. Have you already forgotten the 2/3 transition? They aren't making that mistake again.
[+] mihaic|4 years ago|reply
Besides the internal politics, I think the greatest blocker to improving Python performance is not giving whoever is running the code any control. If I don't use any naughty C plugins, or if I can assure you that I've annotated all my types and can allow a large class of optimizations, why can't I run my code with some VM flags?

It's one language, but why can't I tune my VM to my needs? I can't imagine Java not letting users tune their GC.
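For context, CPython does expose a couple of runtime knobs today, though nothing comparable to JVM optimisation flags: the thread switch interval (roughly how long a thread may run before being asked to hand over the GIL) and the cyclic garbage collector's thresholds. A minimal sketch; `tune_runtime` is a hypothetical helper and the values are illustrative, not recommendations.

```python
import gc
import sys


def tune_runtime(switch_interval_s: float, gc_thresholds: tuple) -> dict:
    """Apply runtime settings and return the previous values for later restoring."""
    previous = {
        "switch_interval": sys.getswitchinterval(),
        "gc_threshold": gc.get_threshold(),
    }
    # Ideal length of the time slice given to each running Python thread.
    sys.setswitchinterval(switch_interval_s)
    # Allocation thresholds that trigger cyclic garbage collection.
    gc.set_threshold(*gc_thresholds)
    return previous
```

Neither knob enables type-driven optimizations of the kind described above; they only tune scheduling and collection frequency.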

[+] XorNot|4 years ago|reply
I wonder if a midpoint for this sort of work would be if a major distro or several declared they would move to GIL-free Python?

System Python at least is generally only recommended for system libs, and that's a relatively supportable set. Developers use virtualenvs and their own specific interpreters, but it would certainly move the needle on which language people were scripting and thinking in by default.

[+] kzrdude|4 years ago|reply
This here: "The GIL will still be optionally available as an interpreter startup-time option" seems like a midpoint. Maybe it will even be GIL-by-default for some versions.
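This thread predates it, but CPython 3.13 later shipped an experimental free-threaded build (PEP 703, derived from Sam Gross's nogil work) in which the GIL can be re-enabled. A small sketch of how code might check which mode it is running in, assuming the `Py_GIL_DISABLED` build variable and `sys._is_gil_enabled()` available from 3.13 on; the fallback treats older interpreters as always having a GIL. `gil_status` is a hypothetical helper.

```python
import sys
import sysconfig


def gil_status() -> dict:
    """Report whether this is a free-threaded build and whether the GIL is active."""
    # Py_GIL_DISABLED is 1 on free-threaded builds (3.13+) and 0 or None elsewhere.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() only exists on 3.13+; older interpreters always use the GIL.
    is_gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)
    return {
        "free_threaded_build": free_threaded_build,
        "gil_enabled": bool(is_gil_enabled()),
    }
```

On a regular build this always reports the GIL as enabled; only a free-threaded build can report it as disabled.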
[+] gadrev|4 years ago|reply
> What’s the level of perceived risk that the nogil project will end up not being viable for inclusion in CPython? (...)

> It all depends on how well the community adapts C extensions so they don’t cause downright crashes of the interpreter. Then, the remaining long tail is community adopting free threads in their applications in a way that is both correct and scales well. Those two are the biggest challenges but we have to be optimistic.

Even if it's 10% of the mess the py2->py3 path was, it still worries me. I hope I'm wrong and it's much less than that (at least for the fatal cases, with similar or unimproved performance for the rest).

[+] jokoon|4 years ago|reply
I remember that the Python SQLite module is problematic because of the GIL. Would this solve that problem? What other sorts of problems does it solve?

But on the other hand, isn't it simple to just dedicate a script per core?
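The "one process per core" approach is what `concurrent.futures.ProcessPoolExecutor` packages up: each worker is a separate interpreter with its own GIL. A minimal sketch of splitting a CPU-bound job across cores; `split_range`, `partial_sum_of_squares`, and `parallel_sum_of_squares` are hypothetical names chosen for illustration.

```python
from __future__ import annotations

import os
from concurrent.futures import ProcessPoolExecutor


def split_range(n: int, parts: int) -> list[tuple[int, int]]:
    """Split range(n) into `parts` contiguous (start, stop) chunks of near-equal size."""
    step, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + step + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def partial_sum_of_squares(bounds: tuple[int, int]) -> int:
    """CPU-bound work on one chunk; runs inside a worker process."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n: int, workers: int | None = None) -> int:
    """Fan chunks out to one worker process per core and combine the results."""
    workers = workers or os.cpu_count() or 1
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, split_range(n, workers)))
```

Call `parallel_sum_of_squares(10**7)` from under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that spawn rather than fork workers. The trade-off versus threads is that arguments and results are pickled between processes and no memory is shared by default, which is exactly the overhead a GIL-free interpreter would avoid.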

[+] antman|4 years ago|reply
TLDR of my understanding:

What the Python committee considered infeasible was almost done by a lone hero. Since the previous decision to turn the print statement into a function (which no one asked for) broke everybody's code for no reason and took ten years to be adopted, they will not push for the one change everyone wants for the foreseeable future, although it does not seem to be that impossible after all.

I am glad they will invite Sam, the lone hero we needed, and I hope he will be given some ownership of the task rather than getting swamped with committee-isms through an "embrace, not extend, and extinguish" approach. He is on a path to success; the committee is on no path at all.

Just set a timeline, call it a failure if it doesn't succeed, and stop avoiding the issue through "discussions". We get it: it's not planned for the X.XX+1 version, every version.

[+] landmark3|4 years ago|reply
and here I am writing a CRUD app with flask :(

Impressive work (2 years working full time knowing that it might never be merged is incredible)

[+] csmpltn|4 years ago|reply
Python is used by a lot of "semi-technical" people (data scientists, researchers, hobbyists, etc). Removing the GIL isn't going to make their life any easier.
[+] chrisseaton|4 years ago|reply
Are these people writing a lot of low-level multi-threaded code?