My findings were a bit controversial at the time as I found that uWSGI + sync Python is about the best option, largely because replacing interpreted Python code with native code is a very significant factor in the performance of a Python program.
In the following discussion (and private emails) I was stunned by the number of async enthusiasts who proposed raising the number of database connections in the connection pool by two or three orders of magnitude (I had 20 connections in a pool on a 4-CPU machine), for reasons along the lines of "the async app is starved of connections".
In my opinion that suggestion betrays a misunderstanding of what is likely going on in the database when you have 100s or 1000s of connections all contending for the same data: in SQL, even reads create locks. Async applications yield a lot, by design, and under hundreds or thousands of requests per second there is a considerable buildup of "work in progress" in the form of async tasks that were yielded from and have not yet been resumed and completed. Many hundreds of database connections are going to create an enormous volume of bookkeeping on the database side and are very likely to slow things down to an absolute crawl.
Even idle connections are known to have a detrimental effect in Postgres. Someone at Microsoft is apparently working on this and released this great blog post quantifying the problem: https://techcommunity.microsoft.com/t5/azure-database-for-po...
When I was working at a large online travel website a few decades ago, we had a few hundred front-end boxes, each with a connection pool set to (I think) around 10 - so around 2000 connections on the DB backend. We did some profiling and discovered that, under the heaviest load, we never had more than one connection active per web server (and usually 0), so a single shared DB connection would have been just as effective as the pool.
I think latency also matters a lot for client-side DB connection pooling. If the DB is a 10ms round-trip away, and queries take 10ms, the process on the DB side is only really busy half the time.
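A rough model of that claim (my own sketch, not from the comment - the function name is made up for illustration):

```python
def db_busy_fraction(rtt_ms: float, query_ms: float) -> float:
    """Fraction of time the DB-side process behind one connection
    is actually working, for a client issuing queries back-to-back:
    each round trip occupies the connection for rtt + query time,
    but the server only does work during the query itself."""
    return query_ms / (rtt_ms + query_ms)

# With a 10 ms round trip and 10 ms queries, the server-side
# process tied to that connection is busy only half the time.
print(db_busy_fraction(10, 10))  # 0.5
```

Under this model, halving the round-trip latency directly raises server-side utilization per connection, which is one reason a pooler co-located with the database can get away with far fewer backend connections.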
Doesn't all of this become mind-bogglingly complex once you factor in (a) changing replica counts (the connection count changes in steps of up to the max pool size) and (b) multiple applications using the same DB hardware?
This article helped me reduce the connection pool from 80 to 10 per box, which helped us serve traffic spikes going from a couple of requests per second to thousands per second.
One thing I never understood about connection pools is how they handle server-sent connection close (FIN) events. I can guarantee there are tons of CLOSE_WAIT connections on my near-idle Python process for various DB pools - MySQL, Redis, or Postgres.
For SQLAlchemy it just complains "MySQL server has gone away (error(32, 'Broken pipe'))" and never knows how to recover. For Redis it just pegs the CPU at 100%, epoll-ing the dead TCP connection. Example: https://github.com/andymccurdy/redis-py/pull/886
I'm not familiar with specific implementations, but from what I've seen with Redis and Postgres clients there's always a reconnect hook of some sort, which I assume pooling proxies give you as well. We've had plenty of broken pipe errors and we just reconnect.
Generally a connection pool also checks the health of its connections. In SQL connection pools this can be done by executing a SELECT 1; if this fails, a new connection can be attempted.
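A minimal sketch of that check-and-replace pattern. This uses sqlite3 as a stand-in for a real networked driver so it runs standalone; the function names are my own, not from any particular pool implementation:

```python
import sqlite3

def make_conn():
    # Stand-in for a real driver's connect(); sqlite3 here so the
    # sketch runs without a database server.
    return sqlite3.connect(":memory:")

def checkout(conn):
    """Return a healthy connection, replacing it if the liveness
    probe fails (the classic SELECT 1 health check)."""
    try:
        conn.execute("SELECT 1")
        return conn
    except Exception:  # broken pipe, "server has gone away", etc.
        return make_conn()

conn = make_conn()
conn.close()           # simulate the server dropping the connection
conn = checkout(conn)  # probe fails, pool hands back a fresh one
print(conn.execute("SELECT 1").fetchone())  # (1,)
```

Real pools usually run this probe only when a connection is checked out after sitting idle, since probing on every checkout adds a round trip to each request.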
Just 6 days ago you published what was a pretty great writeup on message queues, now this thorough dive on connections & pools; what's your writing schedule like? At some 12k words of fairly accurate and precise technical content I'm kind of blown away by your pace.
Am I reading this correctly, that with NodeJS and PostgreSQL, using multiple connection poolers _combined_ is a not-so-good idea?
Example, using Knex as the connection pooler for your node app, which is connected to your PgBouncer deployment?
I have not considered this a fault - in one way I find it compelling to have a layer of "protection" by using Knex in addition to PgBouncer - but now I'm not so sure anymore.
We have a Rails app that uses Puma with MySQL, but so far single-threaded (the Puma config has `threads 1,1`), because we migrated from Unicorn and are still worried about thread-safety.
Currently our database.yml pool size is set to 25, because we have 25 puma workers on the server.
Reading your article, this seems to be overkill, since that setting is _per worker_, and since each worker is running with only 1 thread, the default 5 should be more than enough?
Also, I just checked and our MySQL is configured with 151 max connections; if every worker of ours can use up to 25 connections, that can lead up to 625 connections to the database, and that's without considering Sidekiq (concurrency 25).
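A back-of-envelope check of those numbers (the figures come from this comment; the variable names are mine). Since each single-threaded worker can hold at most one connection at a time, the effective need is 25, while the configured ceiling is far above what MySQL allows:

```python
puma_workers = 25
pool_per_worker = 25        # database.yml pool size
sidekiq_concurrency = 25
mysql_max_connections = 151

web_max = puma_workers * pool_per_worker    # theoretical web ceiling
total_max = web_max + sidekiq_concurrency   # plus background jobs

print(web_max, total_max, total_max > mysql_max_connections)
# 625 650 True - the configured ceiling far exceeds MySQL's limit,
# even though single-threaded workers only ever need 25 in practice
```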
> the popular servers for Python, PHP and other languages that use a global lock or assume single-threaded single-process execution
This is not true for Python. The article you linked mentions Gunicorn, uWSGI and CherryPy; they are threaded and can handle multiple requests per process.
Even though Python does have the GIL, it does not lock network operations for a TCP/UDP-based database protocol (though some C-based modules do tend to hold the GIL).
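A small illustration of that point (my own sketch, using time.sleep as a stand-in for a blocking network call, since it releases the GIL the same way a blocking socket read does):

```python
import threading
import time

def blocking_io():
    # time.sleep stands in for a blocking DB/network call; like
    # socket reads in C extensions, it releases the GIL while waiting.
    time.sleep(0.2)

start = time.monotonic()
threads = [threading.Thread(target=blocking_io) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# Four 0.2 s "I/O waits" overlap rather than serializing to 0.8 s,
# which is why threaded servers can handle concurrent DB-bound requests.
print(f"{elapsed:.2f}s")
```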
I'm confused by this statement about pgBouncer et al.
These systems allow you to make as many database connections as you want and manage them inefficiently, because the connections they give you are fake & cheap simulated connections that they handle with low overhead.
What makes them inefficient then, if they're cheap and have low management overhead?
Have edited to "These systems allow you to make as many database connections as you want without worrying about management, because the connections..."
Not that many people may be using Oracle and PHP together, but last summer I decided to take a closer look at some performance aspects, since I want to move certain PHP things to the cloud while our Oracle database remains local, which would introduce some extra latency.
In some basic testing on my local machine at work, just connecting to our local Oracle database introduced a good 60-100 ms per request (before running any queries), and due to PHP's nature this cost recurs on every new request.
For a long time I had searched for options that would allow PHP to have some sort of connection pool the way languages like Java or Go do, but came up short each time.
I don't even recall how I came across it, but last July I finally learned about Oracle's Database Resident Connection Pooling (DRCP) feature (and then reminded my coworker in November about wanting to try it out). I was able to run some tests, and it does a good job of eliminating most of that unnecessary connection-wait time for PHP applications, so I was pretty happy with the results. It didn't seem to increase load much at all on the database server side.
Even though it might not be as ideal as a truly separate Connection Pool being managed at the application server level, or outside of the database, a Connection Pool available for use at the database level still has benefits so it would be kind of neat to maybe see more of this sort of approach be available with other vendors (if it isn't already).
EDIT: Forgot to add that, as a result of the DRCP testing, even though moving the PHP side to the cloud introduces extra latency, the DRCP change reduces latency enough that overall it still beats the conventional connection approach with the PHP code running locally - so for me that was a win. (Hopefully I'll get to use it in production in the near future, but I'm waiting on my colleague to sort out a few things with enabling it there.)
Great article, Sudhir. Very well written and informative :DD
I would like to help point out some typos and missing words, so I made this PR[1] so you can easily see the diff. I hope this is useful, but feel free to ignore these suggestions.

[1] https://github.com/chenbobby/editing/pull/1/files
outworlder | 5 years ago:
Generally your client should close the connection when it's done. Server-side close should mostly be related to idle timeouts.
tjvc | 5 years ago:
> Rails auto-management needs to be conservative and cautious, therefore inefficient
Can you help me understand this better? Are you saying that Rails tends to hold onto connections for longer than it needs to?
cxcorp | 5 years ago:
On a sidenote, I wonder if this part has a typo?
> These systems allow you to make as many database connections as you want and manage them inefficiently
Perhaps inefficiently should be inexpensively or efficiently?