erlkonig | 2 years ago
Converting code into multithreaded code tends to make it harder to test and debug FOREVER, as well as being more limited by default for certain system resources than a multi-process solution. Viewing and managing threads from the outside is harder, and killing a rogue thread is much more likely to crash an MT solution than killing a process in a typical resilient MP solution. Above all else, I need a database to be utterly reliable (or as close to it as possible) - including being able to back off in a mature fashion in cases of memory exhaustion (I have overcommit disabled to restore classical memory handling, i.e. malloc() can fail) and file system exhaustion. MT throws a wrench through most of the workings of a complex program, and unless some specific gain can be identified that compensates for adding complexity and fragility to virtually every change going forward, then... um... why? I read a bit of "well, the other guys are doing it" handwaving:
"Other large projects have gone through this transition. It's not easy, but it's a lot easier now than it was 10 years ago. The platform and compiler support is there now, all libraries have thread-safe interfaces, etc."
But that isn't a functional gain. And: "I don't expect you or others to buy into any particular code change at this point, or to contribute time into it. Just to accept that it's a worthwhile goal. If the implementation turns out to be a disaster, then it won't be accepted, of course. But I'm optimistic."
But this is NOT a worthwhile goal. Fun, perhaps. Diverting or challenging, perhaps. A disaster, quite possibly. But without identifying a goal that can only be achieved by walking into the multithreading pit, the project is a waste of time for end users. Possibly a growth experience for the experimenters, whether or not it succeeds.
perrygeo | 2 years ago
The big functional gain would be better connection handling. The current process-per-connection model has per-connection overhead, and it's pretty common to see large database instances with double-digit max connection limits. Because connections are expensive and in (artificially) limited supply, application developers work around the limitations with connection pooling and/or proxy services.
Theoretically, a multi-threaded postgres could easily deal with thousands of concurrent connections - not just a performance improvement but a game changer in terms of application developer UX. When connections are cheap, the application just connects when it needs to communicate, no pgbouncer or connection pools needed.
I have no idea if the multi-threading proposal here is viable, but if it can make connections easier to manage it might be worth it.
ttfkam | 2 years ago
https://www.citusdata.com/blog/2020/10/25/improving-postgres...
rektide | 2 years ago
The author starts by citing a decent variety of sources that have already expressed interest here and who see this as progress.
Migrating a bunch of per-process global variables to have scope (per thread or per session) may be risky, but gee, it just sounds like vaguely reasonable architecture to have these days to me.
erlkonig | 2 years ago
I'm not trying to win an argument or anything here, I'm just highlighting from my and others' experiences that multithreading is a tradeoff not to be made casually. It makes some things faster, especially if not I/O bound, but it also increases dev and debug cost, and reduces the number of developers who can assist. That downside tends to be permanent.