My experience doing perf optimizations in real-world systems, with many people writing code against the same app, is that a lot of inefficiencies come from over-fetching data, from naively using the ORM without understanding the underlying cost of the query, and from a lack of actual profiling to find where the actual bottlenecks are (usually people writing dumb code without realizing it's expensive).
Sure, the framework matters at very large scale: the benefits of optimizing it become significant when you're doing millions of requests a second across many thousands of servers, because it helps reduce the baseline cost of running the service.
But I agree with the author's main point, which seems to be that framework performance is pretty meaningless when comparing frameworks if you're just starting on a new project. Focus on making a product people wanna actually use first. If you're lucky enough to get to scale, you can worry about optimizing it then.
There is a lot of truth to this. Some ORMs, like Django's, perform joins in surprising ways.
A simple example is, say, foreign keys. Trying to access the foreign key of an object by doing `book.user.id` does an additional query for the user table to get the ID. It's less known that the id is immediately available by just doing `book.user_id` instead.
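To make the difference concrete, here is a toy model of that lazy foreign-key behaviour. This is a stand-in, not Django itself; `LazyFK.queries` is just a simulated query counter:

```python
class LazyFK:
    """Toy stand-in for Django's lazy FK loading; counts simulated queries."""
    queries = 0

class User:
    def __init__(self, id):
        self.id = id

class Book:
    def __init__(self, user_id):
        self.user_id = user_id  # the FK column is stored on the book row itself
        self._user = None

    @property
    def user(self):
        # Accessing book.user lazily loads the related row, costing a query,
        # just as Django does when the relation hasn't been fetched yet.
        if self._user is None:
            LazyFK.queries += 1  # simulated SELECT ... FROM user WHERE id = ?
            self._user = User(self.user_id)
        return self._user

book = Book(user_id=42)
print(book.user_id, LazyFK.queries)  # 42 0 -- no query needed
print(book.user.id, LazyFK.queries)  # 42 1 -- one extra query
```

The point is that `book.user_id` reads a column the row already has, while `book.user.id` materializes the whole related object first.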
I've spent time optimising things like text searches down from 2000+ queries to ~4, and one of the more noticeable things to me isn't actually the number of joins but rather the SELECTs that take place. Many of these ORMs do a SELECT * unless you explicitly tell them otherwise, and when dealing with large-ish datasets, or models that have large text fields, this translates into significant time spent serialising those attributes. So you can optimise the query and still have it take a long time, until you realise that limiting the initial `SELECT` column list is probably more effective than limiting the number of joins.
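The effect is easy to demonstrate even with stdlib sqlite3 (a sketch; in Django the equivalent is restricting columns with `.only()` or `.values_list()`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.execute("INSERT INTO book (title, body) VALUES (?, ?)", ("A title", "x" * 100_000))

# SELECT * drags the large text column through fetching and serialisation
# even when the caller only needs the title.
row_all = conn.execute("SELECT * FROM book").fetchone()

# Selecting only the needed columns avoids that cost entirely.
row_slim = conn.execute("SELECT id, title FROM book").fetchone()

print(len(row_all), len(row_slim))  # 3 2
print(len(row_all[2]))              # 100000 -- the payload you didn't need
```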
I increasingly lean towards plain SQL over ORMs. It requires greater familiarity with SQL but I prefer that over greater familiarity with ORM-specific syntax that doesn’t translate across frameworks or languages. In addition, you can prototype new queries and profile existing queries in the database and copy-paste directly into your code.
> (usually people writing dumb code without realizing it's expensive)
Some years ago, one morning I gave a co-worker a recommendation on how to improve a loop that was unnecessarily hitting the database through the Django ORM. He committed the fix that afternoon. Barely an hour later, I accidentally reintroduced the exact same slowdown in the exact same loop when adding a different piece of data to it.
Soooo yeah, ORMs can be so simplistic that it's too easy to do this by accident, even if you know exactly what's going on under the hood.
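That class of slowdown is the classic N+1 query pattern. A minimal sqlite3 sketch of both the problem and the JOIN that fixes it (in Django, `select_related` generates the JOIN for you):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    INSERT INTO author VALUES (1, 'a'), (2, 'b');
    INSERT INTO book VALUES (1, 't1', 1), (2, 't2', 2), (3, 't3', 1);
""")

# N+1 pattern: one query for the books, then one more per book for its author.
queries = 1
names = []
for book_id, title, author_id in conn.execute("SELECT * FROM book"):
    queries += 1  # each iteration hits the database again
    name, = conn.execute("SELECT name FROM author WHERE id = ?", (author_id,)).fetchone()
    names.append(name)
print(queries)  # 4 queries for 3 books; with N books this is N+1

# The fix: a single JOIN does the same work in one round-trip.
rows = conn.execute("""
    SELECT book.title, author.name FROM book
    JOIN author ON author.id = book.author_id
""").fetchall()
print(len(rows))  # same data, 1 query
```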
ORMs give you an affordance to write code faster. That may save time, but instead of banking the time saved, you can also reinvest it in better quality and performance. That's up to you and your team.
For instance, Django has prefetch_related and select_related. At almost every Django conference, there's a talk on this topic because it's so important and very underused/overlooked. But these are provided methods of the ORM.
Aside from that, there are wonderful introspection tools such as django-debug-toolbar to view the raw SQL and its performance.
It can be argued that if a solution written in Django hasn't had its database performance introspected with, for instance, django-debug-toolbar, then the solution isn't done. This is a small step with big rewards.
This introspection can easily identify where raw SQL is useful. But apply it late in the process: as a project matures, the cost of converting some queries into raw/hybrid SQL is lower, since the statements are less likely to change. But keep these SQL statements in the models and managers; don't let them spill into views, template tags, etc.
I'm increasingly banging on the drum that web frameworks shouldn't be measured in requests per second but in seconds per request. It sounds really impressive to go from 100,000 to 500,000 requests per second. It's somewhat less impressive when you consider that's going from 10 microseconds to 2 microseconds... and that your real, non-benchmark handler likely takes dozens of milliseconds.
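The arithmetic is worth writing down: inverting requests per second into time per request puts benchmark deltas next to real handler costs.

```python
def per_request_us(rps: float) -> float:
    """Convert a requests-per-second figure into microseconds per request."""
    return 1_000_000 / rps

print(per_request_us(100_000))    # 10.0 microseconds per request
print(per_request_us(500_000))    # 2.0  -- the "5x faster" framework saves 8 us
print(per_request_us(30) / 1000)  # a 30 rps handler spends ~33 ms per request
```

Saving 8 microseconds of framework overhead is invisible next to a handler that spends tens of milliseconds in the database.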
I've got a couple of web handlers that, after quite a bit of work, I can legitimately claim run on the microsecond timeframe... but they're the exception. Generally even a single DB hit across the network, even on the same system, is going to blow right past the time spent in whatever web framework you're using.
On that note, using this sort of metric, japronto's claimed results smell funny. Even with a 4GHz processor, getting 1,214,440 requests per second on a single core is ~3300 cycles per request. That's less than one cycle per byte in the HTTP request for a reasonable request (with no blocking on any sort of memory request), and that's not counting the TCP itself, any response, or the overhead of switching back and forth between C and Python. I can't see how this is possible without a huge degree of corner cutting; just validating that what you've received is a legal HTTP request, correctly encoded, decoding the fields, etc. is going to eat into that pretty fast, even with all the SSE instructions you may be able to throw at it. (And to emphasize, I'm not saying this is "impossible", just that it requires a lot of corner cutting. I've also got a "web server" out in the wild that handles "web requests" blazingly fast... because it basically ignores the entire web request and shovels out a hard-coded response. Very fast. Not a very good web server.)
"Focus on making a product people wanna actually use first. If you're lucky enough to get to scale you can worry about optimizing it then."
This seems like a false dichotomy. Avoiding obvious performance mistakes such as the ones you mentioned does not require additional focus that would detract from general building. It just requires that you know what you are doing.
If you are the type of person who makes said mistakes, it's unlikely you would ever go back and fix them by "focusing" on performance, because the issue is simply that you don't know what you don't know. Likely someone else will come along in the future and point out your mistakes to you.
Optimization that actually hinders you from building and requires focus is at the very margins and almost no one is going to those levels in typical "application" code.
With Symfony back in the day, you could turn on a little toolbar that would show in the top of your rendered HTML. You could see how many SQL queries were run, what the statements were, how long they took. I think there was some other tracing information as well. This strikes me as a basic and necessary kind of testing to do when developing with an ORM. Perhaps more systematically you could get ORM business into Zipkin or Jaeger, and then have some kind of staging vs. prod or canary vs. prod statistical comparison to see if you’re about to release something dumb. Or maybe simpler to keep generated SQL in unit test assertions. You wouldn’t have to write it, but you would have to read and update it on changes, so you could notice if you were winding up with N+1 queries or a ridiculous join.
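One cheap way to get those assertions without a framework: sqlite3's `set_trace_callback` records every statement a connection runs (Django's test framework offers `assertNumQueries` for the same idea). A minimal sketch, using autocommit mode so implicit BEGINs don't muddy the count:

```python
import sqlite3

# isolation_level=None -> autocommit, so no implicit BEGIN shows up in the trace
conn = sqlite3.connect(":memory:", isolation_level=None)
statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement run

conn.execute("CREATE TABLE t (id INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SELECT * FROM t").fetchall()

# A regression test can now assert on the count or shape of the SQL,
# catching an accidental N+1 or a surprise join before release.
print(len(statements))  # 3
print(statements[-1])   # SELECT * FROM t
```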
> But I agree with the author's main point which seems to be that framework performance is pretty meaningless when comparing frameworks if you're just starting on a new project. Focus on making a product people wanna actually use first. If you're lucky enough to get to scale you can worry about optimizing it then.
It feels like sensible advice, but "optimization", where it's even possible, can only get you so far before you need a costly refactoring or rewrite, in my experience.
As projects can be very different in context, it is all about what makes a minimal implementation "viable".
> My experience doing perf optimizations... without realizing it's expensive).
Completely agree, and this has been my experience as well. To this I'll also add: inadequate thought put into data modelling. One would have to think less about query performance or the cost of over-fetching if data were modelled around the needs of the system it serves, instead of just modelling real-life entities and their relationships as-is, straight onto the database.
A humble request to folks making benchmark or other graphs - please understand that thin coloured lines are not easy to visually parse, even for folks like me who aren't totally colour blind but have partial red-green colour blindness. At the least, the lines can be made thicker so it is easier to make out the colours. Even better, label the lines with an arrow and what they represent.
Related to ORMs/queries/performance, I have found the following combination really good:
* aiosql[0] to write raw SQL queries and having them available as python functions (discussed in [1])
* asyncpg[2] if you are using Postgres
* Map asyncpg/aiosql results to Pydantic[3] models
* FastAPI[4]
Pydantic models become the "source of truth" inside the app, they are designed as a copy of the DB schema, then functions receive and return Pydantic models in most cases.
This stack also makes me think better about my queries and the DB design. I try to make sure each endpoint makes only a couple of queries. Each query may have multiple CTEs, but it's still only a single round-trip. That also makes you think about what to prefetch or not; maybe I also want to fetch the data to return if the request is OK, and avoid another query.
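A rough sketch of the shape of that pattern, with stdlib stand-ins (a dataclass instead of a Pydantic model, sqlite3 instead of asyncpg), since the idea of one CTE-bearing round-trip mapped straight onto a typed model is the same:

```python
import sqlite3
from dataclasses import dataclass

# Stand-in for a Pydantic model; the stack above would use pydantic.BaseModel.
@dataclass
class AuthorStats:
    author: str
    n_books: int

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, author TEXT);
    INSERT INTO book (author) VALUES ('ann'), ('ann'), ('bob');
""")

# One round-trip: the CTE aggregates server-side, and each result row
# maps directly onto the typed model.
rows = conn.execute("""
    WITH counts AS (
        SELECT author, COUNT(*) AS n_books FROM book GROUP BY author
    )
    SELECT author, n_books FROM counts ORDER BY author
""").fetchall()

stats = [AuthorStats(author, n) for author, n in rows]
print(stats)  # [AuthorStats(author='ann', n_books=2), AuthorStats(author='bob', n_books=1)]
```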
The asyncpg library is honestly incredible. I wrote a backfill script that would:
1. Dump the rows of a postgres table matching a query (usually a range on the index with a filter or two on other columns)
2. Do some very basic transformations on the rows (a few replaces with small regexes)
3. Take each transformed row and dump it into a RabbitMQ queue.
I was using aio-pika for the rabbit queue and asyncpg and was getting a consistent 25k messages/sec for like 200 lines of code.
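Step 2 of such a script might look like the sketch below. The fetch (asyncpg's cursor) and publish (aio-pika) steps are omitted, and the specific regex and field names are hypothetical:

```python
import re

# Hypothetical legacy-id rewrite: "sku-123" -> "SKU:123".
SKU_RE = re.compile(r"sku-(\d+)")

def transform_row(row: dict) -> dict:
    """The per-row transformation step of the backfill (step 2)."""
    out = dict(row)
    out["sku"] = SKU_RE.sub(r"SKU:\1", row["sku"])
    return out

print(transform_row({"sku": "sku-42", "title": "t"}))  # {'sku': 'SKU:42', 'title': 't'}
```

In the real script this function would sit between an asyncpg cursor iterating the matched rows and an aio-pika publish call per transformed row.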
I always thought it would be a nice alternative to an ORM: a tool that takes a marshmallow/pydantic/whatever model, optionally passes it additional db-specific options, then generates a bunch of SQL files you can call with aiosql. The whole thing would then let you optionally get the result wrapped in a model if you need to, with ORM-like helpers for common CRUD things.
That would have the benefit of the standardized api of an ORM and the flexibility of SQL, without the coupling.
Thank you for sharing this stack! I'm a Pythonista at heart, recently was trying RxDB + TypeScript, and I was thinking hmm I'll bet I could do something with postgres and Pydantic.
If I know an endpoint is going to be hit hard, I forgo the ORM (except maybe to get the table name from the model object so some soul can trace its usage here in the future) and directly do an engine.execute(<raw query>). It makes a huge difference. The next optimization I do is to create stored procedures on the database. Only then do I start thinking about changing the framework itself.
For folks like me who want to get prototypes off the ground in hours, flask and fastapi are a godsend, and if that means I have to worry about serving thousands of requests a second soon, that's a happy problem for sure.
You can also use SQLAlchemy Core, which is an intermediate between the full-blown ORM and running actual strings of SQL. I've had a great experience with Core - I can easily have it output essentially the exact SQL I'd write by hand, but I get many benefits (like the ability to compose queries) that are nicer than dealing with raw SQL.
I'll happily forget that, because it's such a microscopic price that it's moot. You're way better off optimizing the actual query being made, which SQLAlchemy is great at because it doesn't hide the SQL from you. Don't use engine.execute(<raw query>); use SQLAlchemy Core if your endpoint is getting hammered.
To be clear, this is FUD. If you know how to make SQA emit the right SQL, the performance is basically the same as psycopg2 + your custom code, usually better. I've written many high volume SQA services and never once saw SQA per se as the bottleneck.
ORMs aren't inherently that heavyweight, as a Java developer I don't have performance concerns about hibernate.
That sounds to me like a Python problem and a "this specific ORM isn't performant" problem, not ORMs being bad as a whole. Python has never been the fastest language (it's far slower than, say, Java) and the GIL really prevents applications from scaling well without multiple instances.
And if you really want a "just load the data for me and do nothing else that incurs a performance hit" approach then you can use stateless objects and the ORM truly becomes just a wrapper around the DB to load and transform the data into an object for you and/or do a raw, whole-object update back to the DB.
Use of ORMs is often a performance choke point. Raw DB queries are often much, much faster.
Almost always, the more you abstract, the worse you perform. It's great as a developer but not so great as a user.
I honestly would rather just read a SQL query. Almost every developer is familiar with SQL so you can immediately know what is happening vs if you are looking at a code base with an ORM you're not familiar with.
I haven't touched an ORM in over 6 years, but unless they've improved since then, I honestly can't think of a single reason why anyone would choose to use one.
They're clunky monstrosities that act only as guard-rails for inexperienced developers. Far better to invest a few days (which is realistically all you need) to improve their SQL skills and/or code-review practices.
In my benchmark testing, SSL appears to be the bottleneck; e.g., Apache vs. Nginx does not really matter. I assume the benchmarks above 10,000 RPS are using regular HTTP, not SSL? How are people doing benchmarks at 10k-100k RPS?
As a Django shop, we’ve always hoped PyPy would one day be suitable for our production deployments but in the end with various issues we were never able to make the switch.
And then Pyston was re-released... and it changed everything. It was drop-in compatible for us and we saw a 50% drop in latencies.
Source availability aside, I suggest anyone running CPython in prod take a look.
I don't think there is much to gain from rewriting everything in a faster language. Unless they are a very, very successful company with billions of customers, it's often cheaper to scale horizontally.
There's still a good reason to pick fast frameworks in a slow language: you can delay the inevitable for a bit, probably enough time for you to work on a rewrite or whatever.
Well if you convert everything from ORM to raw SQL, it will then be easier to extract all of that SQL and use it in a different web framework once you've measured and confirmed that your bottleneck is servicing requests in Python.
Why Python at all? About 10 years ago I liked Python a lot (and still like it in principle) and felt very productive compared to, say, Java. Java was full of inconvenience, XML, bloated frameworks and all that. But today you can use Kotlin, that is in my opinion even nicer than Python, with performant frameworks (e. g. Quarkus or Ktor) on the super fast JVM.
I don't want to start a language war, but maybe Python is not the first choice for their requirements.
We did an evaluation for our API. The API accepts an image upload, passes it onto the backend for processing and returns a ~2k json lump in return.
Long story short, fastapi was much much faster than anything else for us. It also felt a bit like flask. The integration with pydantic for validating dataclasses on the fly was also great.
I would question choosing Python for large server projects because the performance ceiling is so low. At least with the "middle tier" performance languages such as Java / C# you are unlikely to require a complete language switch as the project scales.
I inherited a flask queue worker, and it suffers from some major problems (like 12 req/second when it's not discarding items from the queue). I am primarily a javascript programmer so I'm a little bit out of my element.
I am tempted to refactor the worker to use async features, and that would require factoring out uWSGI, which is fine; I only added it last week. The article states that Vibora is a drop-in replacement for flask, but I'm a bit skeptical, as I can't find much information beyond Vibora having a similar API. For a web service with basically one endpoint, I could refactor to another implementation fairly easily; I'm just looking for the right direction.
I thought maybe I should refactor the arch to either batch requests to the worker, or to use async. Anyone have a feeling where I should go? I am just getting started researching this, but any advice would be appreciated.
The fact that you are using an offset of 50000 and complaining it slows everything down says a lot about the benchmarks. Top it off with an ORM query with prefetch-all, the GIL, and a shared CPU (I am guessing) that you ran the benchmark on. You see where this is headed?
The important thing to remember is that unless you're running a massive service, requests per second is less important than seconds per request.
Getting an API hit from 300ms to 70ms, and proper frontend caching is far more valuable than concurrency (if you can afford to throw servers at it) because it actually affects user performance.
References for the aiosql/asyncpg/Pydantic/FastAPI stack above:
[0] https://github.com/nackjicholson/aiosql
[1] https://news.ycombinator.com/item?id=24130712
[2] https://github.com/MagicStack/asyncpg
[3] https://pydantic-docs.helpmanual.io/
[4] https://fastapi.tiangolo.com/
fokinsean:
Do you have an example project which uses all of these I could look at?
adsharma:
Then you could use dataclasses and map them to the database via sqlalchemy.
https://github.com/adsharma/dataclasses-sql
Couple of other techniques to speedup python:
* Transpile python to another language (py2many)
* Compile a large graphql-like query to a single query plan in python which can be accelerated. (Fquery)
Both projects on my github.
throwdbaaway:
The article has a link to https://techspot.zzzeek.org/2015/02/15/asynchronous-python-a..., but failed to mention the key takeaway from the article:
> threaded code got the job done much faster than asyncio in every case
twsted:
I remember pyston v1 from Dropbox. You are speaking about v2, which is a binary package (closed-source at the moment)?
jaimex2:
You're probably done rapid prototyping by this point anyway.
est:
https://www.techempower.com/benchmarks/
Edit (to my Flask worker question above): at least quart has a migration page... probably will just try it out; what can I lose? https://pgjones.gitlab.io/quart/how_to_guides/flask_migratio...
Second edit: Also might try out polyrand's stack in the comments.
gchamonlive:
How does it compare to Sanic?
hgretg3443:
https://www.techempower.com/benchmarks/#section=test&runid=8...
7,000,000 requests per second.
Even Go only achieves about 4,500,000 requests per second despite being a lower-level language, as opposed to high-level C#.