My experience doing perf optimizations in real-world systems, with many people writing code against the same app, is that a lot of inefficiencies come from over-fetching data, from naively using the ORM without understanding the underlying cost of the query, and from a lack of actual profiling to find where the actual bottlenecks are (usually people writing dumb code without realizing it's expensive).
Sure, the framework matters at very large scale: the benefits of optimizing it become significant when you're doing millions of requests a second across many thousands of servers, because it helps reduce the baseline cost of running the service.
But I agree with the author's main point, which seems to be that framework performance is pretty meaningless when comparing frameworks if you're just starting on a new project. Focus on making a product people wanna actually use first. If you're lucky enough to get to scale, you can worry about optimizing it then.
There is a lot of truth to this. Some ORMs, like Django's, perform joins in surprising ways.
A simple example is, say, foreign keys. Trying to access the foreign key of an object by doing `book.user.id` does an additional query for the user table to get the ID. It's less known that the id is immediately available by just doing `book.user_id` instead.
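To make the difference concrete, here is a toy model of that lazy foreign-key behaviour. This is a stand-in, not Django itself; `LazyFK.queries` is just a simulated query counter:

```python
class LazyFK:
    """Toy stand-in for Django's lazy FK loading; counts simulated queries."""
    queries = 0

class User:
    def __init__(self, id):
        self.id = id

class Book:
    def __init__(self, user_id):
        self.user_id = user_id  # the FK column is stored on the book row itself
        self._user = None

    @property
    def user(self):
        # Accessing book.user lazily loads the related row, costing a query,
        # just as Django does when the relation hasn't been fetched yet.
        if self._user is None:
            LazyFK.queries += 1  # simulated SELECT ... FROM user WHERE id = ?
            self._user = User(self.user_id)
        return self._user

book = Book(user_id=42)
print(book.user_id, LazyFK.queries)  # 42 0 -- no query needed
print(book.user.id, LazyFK.queries)  # 42 1 -- one extra query
```

The point is that `book.user_id` reads a column the row already has, while `book.user.id` materializes the whole related object first.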
I've spent time optimising things like text searches down from 2000+ queries to ~4, and one of the more noticeable things to me isn't actually the number of joins but rather the SELECTs that take place. Many of these ORMs do a SELECT * unless you explicitly tell them otherwise, and when dealing with large-ish datasets, or models that have large text fields, this translates into significant time spent serialising those attributes. So you can optimise the query and still have it take a long time, until you realise that limiting the initial `SELECT` column list is probably more effective than limiting the number of joins.
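The effect is easy to demonstrate even with stdlib sqlite3 (a sketch; in Django the equivalent is restricting columns with `.only()` or `.values_list()`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO book (title, body) VALUES (?, ?)", ("A title", "x" * 100_000))

# SELECT * drags the large text column through fetching and serialisation
# even when the caller only needs the title.
row_all = conn.execute("SELECT * FROM book").fetchone()

# Selecting only the needed columns avoids that cost entirely.
row_slim = conn.execute("SELECT id, title FROM book").fetchone()

print(len(row_all), len(row_slim))  # 3 2
print(len(row_all[2]))              # 100000 -- the payload you didn't need
```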
I increasingly lean towards plain SQL over ORMs. It requires greater familiarity with SQL but I prefer that over greater familiarity with ORM-specific syntax that doesn’t translate across frameworks or languages. In addition, you can prototype new queries and profile existing queries in the database and copy-paste directly into your code.
> (usually people writing dumb code without realizing it's expensive)
Some years ago, one morning I gave a co-worker a recommendation on how to improve a loop that was unnecessarily hitting the database through the Django ORM. He committed the fix that afternoon. Barely an hour later, I accidentally reintroduced the exact same slowdown in the exact same loop when adding a different piece of data to it.
Soooo yeah, ORMs can be so simplistic that it's too easy to do this by accident, even if you know exactly what's going on under the hood.
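That class of slowdown is the classic N+1 query pattern. A minimal sqlite3 sketch of both the problem and the JOIN that fixes it (in Django, `select_related` generates the JOIN for you):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'a'), (2, 'b');
    INSERT INTO book VALUES (1, 't1', 1), (2, 't2', 2), (3, 't3', 1);
""")

# N+1 pattern: one query for the books, then one more per book for its author.
queries = 1
names = []
for book_id, title, author_id in conn.execute("SELECT * FROM book"):
    queries += 1  # each iteration hits the database again
    name, = conn.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
    names.append(name)
print(queries)  # 4 queries for 3 books; with N books this is N+1

# The fix: a single JOIN does the same work in one round-trip.
rows = conn.execute("""
    SELECT book.title, author.name FROM book
    JOIN author ON author.id = book.author_id
""").fetchall()
print(len(rows))  # same data, 1 query
```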
ORMs give you an affordance to write code faster. That may save time, but instead of banking the time saved, you can also reinvest it in better quality and performance. That's up to you and your team.
For instance, Django has prefetch_related and select_related. At almost every Django conference, there's a talk on this topic because it's so important and very underused/overlooked. But these are provided methods of the ORM.
Aside from that, there are wonderful introspection tools such as django-debug-toolbar to view the raw SQL and its performance.
It can be argued that if a solution written in Django hasn't had its database performance introspected with, for instance, django-debug-toolbar, then the solution isn't done. This is a small step with big rewards.
This introspection can easily identify where raw SQL is useful. But apply it late in the process: as a project matures, the cost of converting some queries into raw/hybrid SQL is lower, since the statements are less likely to change. But keep these SQL statements in the models and managers; don't let them spill into views, template tags, etc.
I'm increasingly banging on the drum that web frameworks shouldn't be measured in requests per second but in seconds per request. It sounds really impressive to go from 100,000 to 500,000 requests per second. It's somewhat less impressive when you consider that's going from 10 microseconds to 2 microseconds... and that your real, non-benchmark handler likely takes dozens of milliseconds.
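The arithmetic is worth writing down: inverting requests per second into time per request puts benchmark deltas next to real handler costs.

```python
def per_request_us(rps: float) -> float:
    """Convert a requests-per-second figure into microseconds per request."""
    return 1_000_000 / rps

print(per_request_us(100_000))    # 10.0 microseconds per request
print(per_request_us(500_000))    # 2.0  -- the "5x faster" framework saves 8 us
print(per_request_us(30) / 1000)  # a 30 rps handler spends ~33 ms per request
```

Saving 8 microseconds of framework overhead is invisible next to a handler that spends tens of milliseconds in the database.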
I've got a couple of web handlers that, after quite a bit of work, I can legitimately claim run on the microsecond timeframe... but they're the exception. Generally even a single DB hit across the network, even on the same system, is going to blow right past the time spent in whatever web framework you're using.
On that note, using this sort of metric, japronto's claimed results smell funny. Even with a 4GHz processor, getting 1,214,440 requests per second on a single core is ~3300 cycles per request. That's less than one cycle per byte in the HTTP request for a reasonable request (with no blocking on any sort of memory request), and that's not counting the TCP itself, any response, or the overhead of switching back and forth between C and Python. I can't see how this is possible without a huge degree of corner cutting; just validating that what you've received is a legal HTTP request, correctly encoded, decoding the fields, etc. is going to eat into that pretty fast, even with all the SSE instructions you may be able to throw at it. (And to emphasize, I'm not saying this is "impossible", just that it requires a lot of corner cutting. I've also got a "web server" out in the wild that handles "web requests" blazingly fast... because it basically ignores the entire web request and shovels out a hard-coded response. Very fast. Not a very good web server.)
"Focus on making a product people wanna actually use first. If you're lucky enough to get to scale you can worry about optimizing it then."
This seems like a false dichotomy. Avoiding obvious performance mistakes such as the ones you mentioned does not require additional focus that would detract from general building. It just requires that you know what you are doing.
If you are the type of person who makes said mistakes, it's unlikely you would ever go back and fix them by "focusing" on performance, because the issue is simply that you don't know what you don't know. Likely someone else will come along in the future and point out your mistakes to you.
Optimization that actually hinders you from building and requires focus is at the very margins and almost no one is going to those levels in typical "application" code.
With Symfony back in the day, you could turn on a little toolbar that would show in the top of your rendered HTML. You could see how many SQL queries were run, what the statements were, how long they took. I think there was some other tracing information as well. This strikes me as a basic and necessary kind of testing to do when developing with an ORM. Perhaps more systematically you could get ORM business into Zipkin or Jaeger, and then have some kind of staging vs. prod or canary vs. prod statistical comparison to see if you’re about to release something dumb. Or maybe simpler to keep generated SQL in unit test assertions. You wouldn’t have to write it, but you would have to read and update it on changes, so you could notice if you were winding up with N+1 queries or a ridiculous join.
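One cheap way to get those assertions without a framework: sqlite3's `set_trace_callback` records every statement a connection runs (Django's test framework offers `assertNumQueries` for the same idea). A minimal sketch, using autocommit mode so implicit BEGINs don't muddy the count:

```python
import sqlite3

# isolation_level=None -> autocommit, so no implicit BEGIN shows up in the trace
conn = sqlite3.connect(":memory:", isolation_level=None)
statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement run

conn.execute("CREATE TABLE t (id INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SELECT * FROM t").fetchall()

# A regression test can now assert on the count or shape of the SQL,
# catching an accidental N+1 or a surprise join before release.
print(len(statements))  # 3
print(statements[-1])   # SELECT * FROM t
```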
> But I agree with the author's main point which seems to be that framework performance is pretty meaningless when comparing frameworks if you're just starting on a new project. Focus on making a product people wanna actually use first. If you're lucky enough to get to scale you can worry about optimizing it then.
It feels like sensible advice, but "optimization", where it's even possible, can only get you so far before you need a costly refactoring or rewrite, in my experience.
As projects can be very different in context, it is all about what makes a minimal implementation "viable".
> My experience doing perf optimizations... without realizing it's expensive).
Completely agree, and this has been my experience as well. To this I'll also add: inadequate thought put into data modelling. One would have to think less about query performance or the cost of over-fetching if data were modelled around the needs of the system it serves, instead of just modelling real-life entities and their relationships as-is, straight onto the database.
A humble request to folks making benchmark or other graphs - please understand that thin coloured lines are not easy to visually parse, even for folks like me who aren't totally colour blind but have partial red-green colour blindness. At the least, the lines can be made thicker so it is easier to make out the colours. Even better, label the lines with an arrow and what they represent.
Related to ORMs/queries/performance, I have found the following combination really good:
* aiosql[0] to write raw SQL queries and having them available as python functions (discussed in [1])
* asyncpg[2] if you are using Postgres
* Map asyncpg/aiosql results to Pydantic[3] models
* FastAPI[4]
Pydantic models become the "source of truth" inside the app, they are designed as a copy of the DB schema, then functions receive and return Pydantic models in most cases.
This stack also makes me think better about my queries and the DB design. I try to make sure each endpoint makes only a couple of queries. Each query may have multiple CTEs, but it's still only a single round-trip. That also makes you think about what to prefetch or not; maybe I also want to fetch the data to return if the request is OK, and avoid another query.
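A rough sketch of the shape of that pattern, with stdlib stand-ins (a dataclass instead of a Pydantic model, sqlite3 instead of asyncpg), since the idea of one CTE-bearing round-trip mapped straight onto a typed model is the same:

```python
import sqlite3
from dataclasses import dataclass

# Stand-in for a Pydantic model; the stack above would use pydantic.BaseModel.
@dataclass
class AuthorStats:
    author: str
    n_books: int

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, author TEXT);
    INSERT INTO book (author) VALUES ('ann'), ('ann'), ('bob');
""")

# One round-trip: the CTE aggregates server-side, and each result row
# maps directly onto the typed model.
rows = conn.execute("""
    WITH counts AS (
        SELECT author, COUNT(*) AS n_books FROM book GROUP BY author
    )
    SELECT author, n_books FROM counts ORDER BY author
""").fetchall()

stats = [AuthorStats(author, n) for author, n in rows]
print(stats)  # [AuthorStats(author='ann', n_books=2), AuthorStats(author='bob', n_books=1)]
```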
The asyncpg library is honestly incredible. I wrote a backfill script that would:
1. Dump the rows of a postgres table matching a query (usually a range on the index with a filter or two on other columns)
2. Do some very basic transformations on the rows (a few replaces with small regexes)
3. Take each transformed row and dump it into a RabbitMQ queue.
I was using aio-pika for the rabbit queue and asyncpg and was getting a consistent 25k messages/sec for like 200 lines of code.
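Step 2 of such a script might look like the sketch below. The fetch (asyncpg's cursor) and publish (aio-pika) steps are omitted, and the specific regex and field names are hypothetical:

```python
import re

# Hypothetical legacy-id rewrite: "sku-123" -> "SKU:123".
SKU_RE = re.compile(r"sku-(\d+)")

def transform_row(row: dict) -> dict:
    """The per-row transformation step of the backfill (step 2)."""
    out = dict(row)
    out["sku"] = SKU_RE.sub(r"SKU:\1", row["sku"])
    return out

print(transform_row({"sku": "sku-42", "title": "t"}))  # {'sku': 'SKU:42', 'title': 't'}
```

In the real script this function would sit between an asyncpg cursor iterating the matched rows and an aio-pika publish call per transformed row.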
I always thought it would be a nice alternative to an ORM: a tool that takes a marshmallow/pydantic/whatever model, optionally passes it additional db-specific options, then generates a bunch of SQL files you can call with aiosql. The whole thing would then let you optionally get the result wrapped in a model if you need to, with ORM-like helpers for common CRUD things.
That would have the benefit of the standardized api of an ORM and the flexibility of SQL, without the coupling.
Thank you for sharing this stack! I'm a Pythonista at heart, recently was trying RxDB + TypeScript, and I was thinking hmm I'll bet I could do something with postgres and Pydantic.
If I know an endpoint is going to be hit hard, I forgo the ORM (except maybe to get the table name from the model object so some soul can trace its usage here in the future) and directly do an engine.execute(<raw query>). It makes a huge difference. The next optimization I do is to create stored procedures on the database. Only then do I start thinking about changing the framework itself.
For folks like me who want to get prototypes off the ground in hours, flask and fastapi are a godsend, and if that means I have to worry about serving thousands of requests a second soon, that's a happy problem for sure.
You can also use SQLAlchemy Core, which is an intermediate between the full-blown ORM and running actual strings of SQL. I've had a great experience with Core - I can easily have it output essentially the exact SQL I'd write by hand, but I get many benefits (like the ability to compose queries) that are nicer than dealing with raw SQL.
I'll happily forget that, because it's such a microscopic price that it's moot. You're way better off optimizing the actual query being made, which SQLAlchemy is great at because it doesn't hide the SQL from you. Don't use engine.execute(<raw query>); use SQLAlchemy Core if your endpoint is getting hammered.
To be clear, this is FUD. If you know how to make SQA emit the right SQL, the performance is basically the same as psycopg2 + your custom code, usually better. I've written many high volume SQA services and never once saw SQA per se as the bottleneck.
ORMs aren't inherently that heavyweight, as a Java developer I don't have performance concerns about hibernate.
That sounds to me like a Python problem and a "this specific ORM isn't performant" problem, not ORMs being bad as a whole. Python has never been the fastest language (it's far slower than, say, Java) and the GIL really prevents applications from scaling well without multiple instances.
And if you really want a "just load the data for me and do nothing else that incurs a performance hit" approach then you can use stateless objects and the ORM truly becomes just a wrapper around the DB to load and transform the data into an object for you and/or do a raw, whole-object update back to the DB.
Use of ORMs is often a performance choke point. Raw DB queries are often much, much faster.
Almost always, the more you abstract, the worse you perform. It's great as a developer but not so great as a user.
I honestly would rather just read a SQL query. Almost every developer is familiar with SQL so you can immediately know what is happening vs if you are looking at a code base with an ORM you're not familiar with.
I haven't touched an ORM in over 6 years, but unless they've improved since then, I honestly can't think of a single reason why anyone would choose to use one.
They're clunky monstrosities that act only as guard-rails for inexperienced developers. Far better to invest a few days (which is realistically all you need) to improve their SQL skills and/or code-review practices.
In my benchmark testing, SSL appears to be the bottleneck; e.g., Apache vs. Nginx does not really matter. I assume the benchmarks above 10,000 RPS are using regular HTTP, not SSL? How are people doing benchmarks at 10k-100k RPS?
As a Django shop, we’ve always hoped PyPy would one day be suitable for our production deployments but in the end with various issues we were never able to make the switch.
And then Pyston was re-released... and it changed everything. It was drop-in compatible for us and we saw a 50% drop in latencies.
Source availability aside, I suggest anyone running CPython in prod take a look.
I don't think there is much to gain from rewriting everything in a faster language. Unless they are a very, very successful company with billions of customers, it's often cheaper to scale horizontally.
There's still a good reason to pick fast frameworks in a slow language: you can delay the inevitable for a bit, probably enough time for you to work on a rewrite or whatever.
Well if you convert everything from ORM to raw SQL, it will then be easier to extract all of that SQL and use it in a different web framework once you've measured and confirmed that your bottleneck is servicing requests in Python.
Why Python at all? About 10 years ago I liked Python a lot (and still like it in principle) and felt very productive compared to, say, Java. Java was full of inconvenience, XML, bloated frameworks and all that. But today you can use Kotlin, that is in my opinion even nicer than Python, with performant frameworks (e. g. Quarkus or Ktor) on the super fast JVM.
I don't want to start a language war, but maybe Python is not the first choice for their requirements.
We did an evaluation for our API. The API accepts an image upload, passes it onto the backend for processing and returns a ~2k json lump in return.
Long story short, fastapi was much much faster than anything else for us. It also felt a bit like flask. The integration with pydantic for validating dataclasses on the fly was also great.
I would question choosing Python for large server projects because the performance ceiling is so low. At least with the "middle tier" performance languages such as Java / C# you are unlikely to require a complete language switch as the project scales.
I inherited a flask queue worker, and it suffers from some major problems (like 12 req/second when it's not discarding items from the queue). I am primarily a javascript programmer so I'm a little bit out of my element.
I am tempted to refactor the worker to use async features, and that would require factoring out uWSGI, which is fine; I only added it last week. The article states that Vibora is a drop-in replacement for flask, but I'm a bit skeptical, as I can't find much information beyond Vibora having a similar API. For a web service with basically one endpoint, I could refactor to another implementation fairly easily; I'm just looking for the right direction.
I thought maybe I should refactor the arch to either batch requests to the worker, or to use async. Anyone have a feeling where I should go? I am just getting started researching this, but any advice would be appreciated.
The fact that you are using an offset of 50000 and complaining it slows everything down says a lot about the benchmarks. Top it off with an ORM query with prefetch-all, the GIL, and a shared CPU (I am guessing) that you ran the benchmark on. You see where this is headed?
The important thing to remember is that unless you're running a massive service, requests per second is less important than seconds per request.
Getting an API hit from 300ms to 70ms, and proper frontend caching is far more valuable than concurrency (if you can afford to throw servers at it) because it actually affects user performance.
References for the aiosql/asyncpg/Pydantic/FastAPI stack above:
[0] https://github.com/nackjicholson/aiosql
[1] https://news.ycombinator.com/item?id=24130712
[2] https://github.com/MagicStack/asyncpg
[3] https://pydantic-docs.helpmanual.io/
[4] https://fastapi.tiangolo.com/
fokinsean:
Do you have an example project which uses all of these I could look at?
adsharma:
Then you could use dataclasses and map them to the database via sqlalchemy.
https://github.com/adsharma/dataclasses-sql
Couple of other techniques to speedup python:
* Transpile python to another language (py2many)
* Compile a large graphql-like query to a single query plan in python which can be accelerated. (Fquery)
Both projects on my github.
throwdbaaway:
The article has a link to https://techspot.zzzeek.org/2015/02/15/asynchronous-python-a..., but failed to mention the key takeaway from the article:
> threaded code got the job done much faster than asyncio in every case
twsted:
I remember pyston v1 from Dropbox. You are speaking about v2, which is a binary package (closed-source at the moment)?
jaimex2:
You're probably done rapid prototyping by this point anyway.
est:
https://www.techempower.com/benchmarks/
Edit (to my Flask worker question above): at least quart has a migration page... probably will just try it out; what can I lose? https://pgjones.gitlab.io/quart/how_to_guides/flask_migratio...
Second edit: Also might try out polyrand's stack in the comments.
gchamonlive:
How does it compare to Sanic?
hgretg3443:
https://www.techempower.com/benchmarks/#section=test&runid=8...
7,000,000 requests per second.
Even Go only achieves about 4,500,000 requests per second despite being a lower-level language, as opposed to high-level C#.