Cinder: Instagram's performance-oriented fork of CPython

645 points| yagizdegirmenci | 4 years ago |github.com

261 comments

[+] sergiomattei|4 years ago|reply
Yes, yes, yes! This is what I've been waiting for so long.

Python has emphasized readability for the reference implementation vs. the practical benefits of better performance for everyone, and the community is really hurting for it.

Please, CPython, upstream this or something like it. There comes a point where this whole zeal about readability becomes idealistic and hurts real-world pragmatic goals. It's hurting the language and we can do better.

Python can be ahead in the performance race. We just need to get real.

[+] brian_herman|4 years ago|reply
I got it to work with the following, but it errored out on OpenSSL, so I couldn't install pip. It builds fine with Docker on Fedora 32. Thank you. Platform: Fedora 32, Docker on Windows 10, version 2004.

Steps to reproduce: install Docker from docker.com, then run:

   docker run -t -i fedora:32 bash
   git clone https://github.com/facebookincubator/cinder.git
   cd cinder
   yum install zlib-devel openssl-devel
   ./configure
   make
   make altinstall

This error occurs when you run ./configure --enable-optimizations; this version of Python compiles it either way.

   >> Objects/accu.o
   Parser/listnode.c: In function ‘list1node’:
   Parser/listnode.c:66:1: error: ‘/cinder/Parser/listnode.gcda’ profile count data file not found [-Werror=missing-profile]
   66 | }
      | ^
If you get this error it is because you used --enable-optimizations.

github.com/brianherman https://www.linkedin.com/in/brian-herman-092919208/

Edit: Removing github flavored markdown Edit: Answering own question Edit: More formatting

[+] wheybags|4 years ago|reply
Pet peeve: people who put -Werror in the default build flags. Leave it off by default, turn it on in your CI!
[+] TheRealPomax|4 years ago|reply
> We've made Cinder publicly available in order to facilitate conversation about potentially upstreaming some of this work to CPython and to reduce duplication of effort among people working on CPython performance.

Nice. Looking forward to seeing that happen over the next few months.

[+] smg|4 years ago|reply
Was Cinder influenced by hhvm (Facebook's vm for php/hack)? A project that maintains a list of different JIT implementations for programming languages and compares them would be a great way to see what are the different approaches to implementing JITs and which language features make it hard to implement performant JITs.

As an aside it is great that the Cinder team is specifically calling out that Cinder is not intended to be used outside of FB. Many people have been burned by lack of community around hhvm.

[+] _carljm|4 years ago|reply
Definitely influenced. There are people on our team who also worked on hhvm.
[+] chrisseaton|4 years ago|reply
> A project that maintains a list of different JIT implementations for programming languages and compares them would be a great way to see what are the different approaches to implementing JITs and which language features make it hard to implement performant JITs.

SOM, for example, has many implementations with different approaches to compilation: http://som-st.github.io

[+] kungito|4 years ago|reply
I'm afraid there is very little documentation/text on modern production JITters. When I tried finding any text for my MSc I had little success. Does anyone have a suggestion about e.g. .NET 3-tier jitting or similar?
[+] ram_rar|4 years ago|reply
There is more value in leveraging an existing programming language than in rewriting in something else. My team rewrote large chunks of code from Python to Go, and it wasn't a pleasant experience. We were able to justify it, since it _inadvertently_ reduced our infra cost.

But if we could make Python a lot faster at the compiler level, it wouldn't break the dev experience or lengthen dev onboarding time. This helps the team's productivity as a whole. I hope the CPython community is able to leverage a few things from Cinder. This would benefit the entire Python community.

[+] pm90|4 years ago|reply
If you don’t mind sharing, what was it about the experience that made it unpleasant?
[+] antpls|4 years ago|reply
Why didn't you profile the Python code and rewrite only the hot loops?
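For anyone who hasn't done this before, here is a minimal sketch of what "profile first" can look like with the standard library's cProfile. The functions `hot_loop` and `handle_request` are made-up stand-ins, not anything from the parent's codebase; the point is only that the report ranks functions by time so you can see which ones are worth porting.

```python
import cProfile
import io
import pstats


def hot_loop(n):
    # Deliberately slow pure-Python loop; stands in for a real hot spot.
    total = 0
    for i in range(n):
        total += i * i
    return total


def handle_request():
    # Hypothetical request handler: cheap setup plus one expensive call.
    labels = [str(i) for i in range(100)]
    return len(labels) + hot_loop(200_000)


def profile_report(func, limit=5):
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(handle_request))
```

Whatever dominates the cumulative-time column is the candidate for a rewrite in a faster language; everything else can stay in Python.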
[+] adsharma|4 years ago|reply
Did you rewrite it by hand or use a transpiler?
[+] csmpltn|4 years ago|reply
> "If a call to an async function is immediately awaited, we immediately execute the called function up to its first await. If the called function reaches a return without needing to await, we will be able to return that value directly without ever even creating a coroutine object or deferring to the event loop. This is a significant (~5%) CPU optimization in our async-heavy workload."

Step 1: "Who needs Threads? Just async/await all the things!"

Step 2: Build complex logic to untangle your nonsensical use of async/await and force things to run synchronously in the backend without the user knowing. See? A 5% CPU optimization right there!

[+] pcwalton|4 years ago|reply
> Step 2: Build complex logic to untangle your nonsensical use of async/await and force things to run synchronously in the backend without the user knowing. See? A 5% CPU optimization right there!

Addressing your snark, I assume this optimization is targeted at places where you have a single "await" expression in the source that might call different functions at runtime, only some of which need to await.
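To make that concrete, here is a rough sketch in stock CPython, not Cinder's implementation. It steps a coroutine once by hand: a callee that returns before its first suspension finishes right there, while one that really awaits does not. The names `cached_lookup`, `remote_lookup`, and `finishes_without_suspending` are invented for illustration.

```python
import asyncio


async def cached_lookup(key):
    # Returns without ever suspending: no await is reached.
    return f"cached:{key}"


async def remote_lookup(key):
    # Suspends at least once, so a real caller would need the event loop.
    await asyncio.sleep(0)
    return f"remote:{key}"


def finishes_without_suspending(coro):
    """Step a coroutine once; return (True, result) if it returned
    before its first suspension, else (False, None)."""
    try:
        coro.send(None)
    except StopIteration as stop:
        # The coroutine ran to completion on the first step.
        return True, stop.value
    # It suspended instead; discard it, since this is only a demonstration.
    coro.close()
    return False, None


if __name__ == "__main__":
    print(finishes_without_suspending(cached_lookup("a")))
    print(finishes_without_suspending(remote_lookup("a")))
```

Note that plain CPython still allocates the coroutine object in both cases; per the README quote, Cinder's optimization skips even that allocation when the call is immediately awaited.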

[+] toxik|4 years ago|reply
Asynchronous execution is not a replacement for parallel execution. That is not the value proposition of asynchronous execution models.
[+] nly|4 years ago|reply
This exact optimization has been implemented in C++ coroutines via the 'initial_suspend' coroutine traits hook.

...and those are stackless coroutines under AOT compiled conditions, so it only saves a malloc.

[+] airstrike|4 years ago|reply
Never realized Carl Meyer was at Instagram, but this explains a lot now! His work on Django has always been superb
[+] _carljm|4 years ago|reply
Thank you, that’s very kind of you both. To be clear, this is the work of a team, and many others have contributed a lot more than I have to it.
[+] karlding|4 years ago|reply
Looking through the repo, one change of note that doesn't seem to be documented in the README [0] is that this has an optimization to reduce Copy on Writes in forked workers (under the Py_IMMORTAL_INSTANCES preprocessor guard) by not dirtying pages via refcount changes.

Here's the BPO issue [1] that tracks the upstreaming attempt.

[0] https://github.com/facebookincubator/cinder/blob/9de726349eb...

[1] https://bugs.python.org/issue40255
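For context on why refcounts matter here, a small sketch in stock CPython (it does not use Py_IMMORTAL_INSTANCES or anything Cinder-specific). Merely taking another reference to an object changes the count stored inside the object itself, which is a memory write even though the object is logically read-only. The helper name `refcount_delta` is made up.

```python
import sys


def refcount_delta(obj):
    """Return how much obj's reference count changes when one more
    reference to it is taken."""
    before = sys.getrefcount(obj)
    holder = [obj]  # storing a reference bumps the count kept in the object
    after = sys.getrefcount(obj)
    del holder
    return after - before


if __name__ == "__main__":
    # Hypothetical "read-only" data loaded before forking workers.
    config = {"workers": 4}
    print(refcount_delta(config))
```

In a forked worker, that write lands on a page shared with the parent, so the kernel has to copy the page. Skipping the refcount update for such objects is what avoids the copy.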

[+] _carljm|4 years ago|reply
It’s also likely that this change, which is a big win for our prefork production environment, drags down our benchmark results. It makes every incref and decref more expensive.
[+] kingmaker|4 years ago|reply
Good call out! I’ll make sure it ends up in the readme.
[+] wiz21c|4 years ago|reply
From the README:

> Cinder is not polished or documented for anyone else's use. We don't have the capacity to support Cinder as an independent open-source project

I'm quite surprised by that. I understand that FB makes tons of money so I can't believe they can't have the capacity... Maybe they don't have the will (which I perfectly understand, it's just the wording)

Anyway, it's great they share.

[+] msoad|4 years ago|reply
I love how Facebook and Instagram never went the route of "full rewrite" for their apps as they scaled.

In my experience, "Language X is slow and we could save Y switching to Z" is always a false promise. You can pick parts of the system that are costing a lot and port them to other languages/frameworks to capture the bulk of the savings while keeping your developers happy working on familiar code. Or if you're big enough, like Facebook, you can go see why X is slow and whether it is possible to improve it at the compiler level. Never disrupt the developers' flow (Twitter did this with their Scala craze back in the day).

[+] spullara|4 years ago|reply
If Twitter's port to the JVM didn't result in 10x fewer servers with 10x lower latency vs an identical Ruby implementation we wouldn't have done it. Sadly not much progress has been made on the RubyVM though Twitter did try with improvements to GC and other changes. We even tried JRuby as part of the evaluation.
[+] nemothekid|4 years ago|reply
I'm not sure many companies have the luxury of doing what FB did. I can't imagine a world where, given the choice between rewriting your app and rewriting the entire language, rewriting the language would be the cheaper solution.

Twitter did this with their Scala craze, but Twitter struggled to scale its Ruby app. Twitter didn't have nearly as deep pockets to write a performant alternative Ruby VM.

[+] re|4 years ago|reply
> keeping your developers happy working on familiar code

That turns pretty quickly into "developers being unhappy that they have to maintain legacy code written for the old language/platform," though, especially with any sort of employee turnover.

[+] mgraczyk|4 years ago|reply
Idk, I worked at IG for a year and wrote a lot of backend code in the Python webserver stack. I believe they would benefit massively by aggressively migrating services to the FB Hack stack.

A ton of work goes into making sure the Python code isn't slow, and there are complicated C++ services built as workarounds for the slowness of python.

[+] neya|4 years ago|reply
I disagree; this logic doesn't always work unless you have investor money lying around. For a bootstrapped business, it could mean living or dying. As an example, a client of mine was paying close to $5000 per month in server costs simply because of the scale of traffic they had on their site. By rewriting WordPress in Elixir, I was able to bring their costs down to $1000-odd per month. That's actually less than what any of their competitors are paying for servers as well.

This $4000 in savings actually allowed them to hire an additional full time developer to maintain their site. So, your logic only works for certain use cases, not all.

[+] joe_the_user|4 years ago|reply
One factor for Twitter was that Ruby was simply orders of magnitude worse for Twitter's usage than I think PHP or Python was for Facebook or Instagram. Or at least one never heard horror stories about those, whereas Twitter definitely had horror stories. In one of the few times I've been "close to the source", I heard Twitter's head of operations complaining that electricity had become their primary expense from starting X many Ruby instances per second. It was one instance per connection or something close, because Ruby was unstable and because Ruby never gave memory back to the system.
[+] cageface|4 years ago|reply
"Language X is slow and we could save Y switching to Z" is always a false promise.

The thing that keeps software interesting for me is that there are almost no absolute rules. So I'll agree that complete rewrites are usually a mistake and performance problems are often not in the language.

But I can't agree that rewrites are always a mistake. esbuild is a recent good example of how much difference switching to a faster language can make.

[+] halfmatthalfcat|4 years ago|reply
Did Twitter actually contribute to Scala core though? They definitely created and contributed to various Scala libraries but they didn't fork/augment the Scala compiler like Facebook did with HipHop.
[+] drunkenmagician|4 years ago|reply
Not sure I can agree with that. Sinking resources into re-engineering a language infrastructure, mostly because the language/infra is not actually a good choice for the target problem, is an approach only very large engineering teams can tackle. As has already been stated (further down the thread), breaking down your code/monolith into manageable components and retargeting for performance in the key spots is almost always the right approach. Unless, of course, you have the engineering resources and hubris of someone like FB.
[+] guenthert|4 years ago|reply
Rewriting in full for the sake of a performance gain is indeed a questionable proposal. Rewriting hot spots is the more sensible approach.

A better reason for rewriting, however, is to migrate, as the project matures, from a language that focuses on expressiveness (and hence is favored for rapid development) to a more maintainable one that focuses on readability.

[+] kevingadd|4 years ago|reply
I think while performance is not a proper justification, if you get other stuff in the bargain it can be really worth it. Rust, for example - the justification to switch is usually not just speed but safety and the safety enabling more optimizations, parallelism etc. If all you care about is speed you stay in C/C++.

Sometimes the choice is also made when you already have to pay a transitional cost - at a previous employer we had to rewrite a bunch of code to move from an old version of PHP to a newer one, and a strong argument was made that at that point we should just adopt a better language since we already had to pay that cost. I think eventually a lot of stuff transitioned to Haskell as part of that process, even if other stuff stayed in PHP.

[+] IceWreck|4 years ago|reply
Benchmarks: https://github.com/facebookincubator/cinder/blob/cinder/3.8/...

There are some perf hits, as well as all the improvements.

[+] _carljm|4 years ago|reply
It’s worth noting that we’ve never targeted perf on any of these benchmarks, so these are just the results that fell out from optimizing for our production workload, which is a totally different beast.

Most of the hits are due to the default “JIT everything” behavior which is really bad if there are a lot of rarely called functions, but would be easy to fix. This is discussed in the README.
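To illustrate the kind of fix meant here: one common approach in JITs generally is to compile a function only after it has been called some number of times. The sketch below is a toy in plain Python under that assumption; it is not Cinder's API, and `HOT_THRESHOLD`, `fake_compile`, and `compile_when_hot` are hypothetical names.

```python
import functools

HOT_THRESHOLD = 3  # hypothetical cutoff; a real JIT would tune this


def fake_compile(func):
    # Placeholder for real compilation; here it just returns the function.
    return func


def compile_when_hot(func):
    """Count calls and hand func to fake_compile only once it has been
    called HOT_THRESHOLD times; rarely called functions never pay."""
    state = {"calls": 0, "compiled": None}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if state["compiled"] is not None:
            return state["compiled"](*args, **kwargs)
        state["calls"] += 1
        if state["calls"] >= HOT_THRESHOLD:
            state["compiled"] = fake_compile(func)
        return func(*args, **kwargs)

    # Small helper so callers can inspect the state in this sketch.
    wrapper.is_compiled = lambda: state["compiled"] is not None
    return wrapper


@compile_when_hot
def add(a, b):
    return a + b
```

With this shape, a function called once at startup is only ever interpreted, and the compile cost is spent solely on functions that keep getting called.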

[+] rkimb|4 years ago|reply
How does this compare to other interpreters?
[+] joelthelion|4 years ago|reply
This is cool, but at their scale I wonder if they wouldn't be better off rewriting most of their codebase, or at least a performance-critical subset, in something like Rust or Go.

It may seem like a lot of work, but making Python faster isn't exactly trivial either.

Also, I love Python, but using it for huge projects like this sounds dubious at best. I think the benefits of static typing really show when projects grow large.

[+] deadmutex|4 years ago|reply
I wonder if they are in touch with kmod or tried pyston: https://blog.pyston.org/.
[+] _carljm|4 years ago|reply
Haven't tried Pyston, its revival as an active project happened well after Cinder was in production.
[+] kingmaker|4 years ago|reply
We’ve chatted with kmod a few times and let him know we were open sourcing Cinder. Hopefully the projects can learn from each other. As _carljm mentioned, Pyston was restarted way after Cinder was in production and we had already implemented a significant amount of the core JIT functionality.
[+] pjfin123|4 years ago|reply
What's the difference between this and PyPy? They're both JIT/performance oriented Python interpreters?

Good to see more work on this front! It's seemed crazy that Python can be this popular but still not be nearly as fast as it could be.

[+] jarpineh|4 years ago|reply
I wonder if Instagram being a big Django site has driven Cinder development? What sort of impact does Cinder have on their Django workloads?

Regular models, views, ORM stuff can be fairly generic. Did they need to change Django to better benefit from Cinder?

Edit: intriguing… https://github.com/facebookincubator/cinder/blob/f60897df9f6...

[+] mangecoeur|4 years ago|reply
In many ways nice that they open their performance work, some of which could be upstreamed.

In some ways a bit cheeky that they took the 'dump it and see' approach rather than offering to work to upstream it, since it's kinda outsourcing the work of maintaining the performance improvements to the Python core devs rather than offering to put some of Facebook's considerable resources towards doing it themselves.

[+] BiteCode_dev|4 years ago|reply
>Is this supported?

>Short answer: no.

That's refreshingly honest