top | item 36284801


erlkonig | 2 years ago

[I wrote this mostly imagining the idea was to convert the entire PostgreSQL service into a single monolithic process. I'm not a fan so far. If it is actually about coalescing like processes down to a single multithreaded process, that's more reasonable but still comes at a future cost, and whether there's any point to it is still a question.]

Converting code into multithreaded code tends to make it harder to test and debug FOREVER, and leaves it more limited by default in certain system resources than a multi-process solution. Viewing and managing threads from the outside is harder, and killing a rogue thread is much more likely to crash an MT solution than killing a process is in a typical resilient MP solution. Above all else, I need a database to be utterly reliable (or as close to it as possible), including being able to back off in a mature fashion in cases of memory exhaustion (I have overcommit disabled to restore classical memory handling, i.e. malloc() can fail) and file system exhaustion. MT throws a wrench through most of the workings of a complex program, and unless some specific gain can be identified that compensates for adding complexity and fragility to virtually any change going forward, then... um... why? I read a bit of "well, the other guys are doing it" handwaving:

    "Other large projects have gone through this transition.    
    It's not easy, but it's a lot easier now than it was
    10 years ago. The platform and compiler support is
    there now, all libraries have thread-safe 
    interfaces, etc."
But that isn't a functional gain. And:

    "I don't expect you or others to buy into any
    particular code change at this point, or to
    contribute time into it. Just to accept that it's a
    worthwhile goal. If the implementation turns out to
    be a disaster, then it won't be accepted, of course.
    But I'm optimistic."
But this is NOT a worthwhile goal. Fun, perhaps. Diverting or challenging, perhaps. A disaster, quite possibly. But without identifying a goal that can only be achieved by walking into the multithreading pit, the project is a waste of time for end users. Possibly a growth experience for the experimenters, whether or not it succeeds.
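To make the point about killing a rogue worker concrete, here is a minimal sketch of the multi-process side, assuming a hypothetical "worker" that is just a sleeping child interpreter (nothing PostgreSQL-specific). The supervisor kills the child outright and carries on. Python's `threading` module offers no comparable call for killing a single thread, which is roughly the asymmetry described above.

```python
import subprocess
import sys

def kill_worker_and_survive():
    # Hypothetical stand-in for a per-connection backend process:
    # a child interpreter that would otherwise sleep for a minute.
    worker = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # Forcibly end the child (SIGKILL on POSIX, TerminateProcess on Windows).
    worker.kill()
    # The supervisor keeps running and can inspect how the worker ended.
    return worker.wait()

exit_status = kill_worker_and_survive()
print("supervisor still alive; worker exit status:", exit_status)
```

The nonzero exit status is all the supervisor needs to decide whether to clean up and start a replacement; the operating system has already reclaimed the worker's private memory and file descriptors.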


perrygeo|2 years ago

There are significant costs to this but maybe some real benefits too. This post certainly doesn't sell the potential benefits well enough.

The big functional gain would be better connection handling. The current process-per-connection model has overhead and it's pretty common to see large database instances with double-digit max connection limits. Because connections are expensive and in (artificially) limited supply, application developers work around the limitations with connection pooling and/or proxy services.

Theoretically, a multi-threaded postgres could easily deal with thousands of concurrent connections - not just a performance improvement but a game changer in terms of application developer UX. When connections are cheap, the application just connects when it needs to communicate, no pgbouncer or connection pools needed.

I have no idea if the multi-threading proposal here is viable, but if it can make connections easier to manage it might be worth it.
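As a toy illustration of the two connection models (not of how PostgreSQL does or would do it), Python's standard `socketserver` module lets the same handler run thread-per-connection or process-per-connection just by swapping the server class. The `EchoHandler` and `echo_once` names below are made up for this sketch.

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.StreamRequestHandler):
    """Hypothetical per-connection handler: echo one line back."""
    def handle(self):
        line = self.rfile.readline()
        self.wfile.write(line)

def echo_once(server_cls, payload):
    # Port 0 asks the OS for any free port on the loopback interface.
    with server_cls(("127.0.0.1", 0), EchoHandler) as server:
        threading.Thread(target=server.serve_forever, daemon=True).start()
        with socket.create_connection(server.server_address) as conn:
            conn.sendall(payload)
            reply = conn.makefile("rb").readline()
        server.shutdown()  # stop the accept loop before closing the socket
    return reply

# Each accepted connection is served by a new thread.
reply = echo_once(socketserver.ThreadingTCPServer, b"ping\n")
print(reply)
```

Passing `socketserver.ForkingTCPServer` instead (available only on platforms with `fork`) serves each connection from a new process. The handler code is identical; what changes is the per-connection cost and how much state is shared between connections.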

rektide|2 years ago

This feels a bit like you are using an image of absolute safety to hold the project hostage, not allowing the potential for change & improvement.

The author starts by citing a decent variety of sources who have already expressed interest here and who see this as progress.

Migrating a bunch of per-process global variables to have scope (per thread or per session) may be risky, but gee, it just sounds like vaguely reasonable architecture to have these days to me.
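For anyone wondering what "giving globals a scope" looks like in miniature, here is a rough sketch using Python's `threading.local`. The `session` object and `handle_connection` function are invented for illustration; the real work in a C codebase would involve thread-local storage or explicit session structs and is far larger than this.

```python
import threading

# Hypothetical stand-in for what used to be a per-process global,
# e.g. "the user of the current session".
session = threading.local()

def handle_connection(user, results):
    # Each thread sees its own copy of session.user.
    session.user = user
    # ... work that reads session.user instead of a process-wide global ...
    results[user] = session.user

def run_sessions(users):
    results = {}
    threads = [
        threading.Thread(target=handle_connection, args=(user, results))
        for user in users
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

results = run_sessions(["alice", "bob"])
print(results)
# The main thread never set session.user, so it has no such attribute.
print(hasattr(session, "user"))
```

The risky part of such a migration is not this pattern itself but finding every global that silently assumed one session per process.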

erlkonig|2 years ago

You can usually find developers interested in any fashionable approach to a problem. Change is fine. What improvement, specifically, though? Adding multithreading is not a functional improvement in and of itself, but more the opposite. MT should be used when a specific, important functional gain can be realized through no other approach.

I'm not trying to win an argument or anything here, I'm just highlighting from my and others' experiences that multithreading is a tradeoff not to be made casually. It makes some things faster, especially if they're not I/O bound, but it also increases dev and debug cost, and reduces the number of developers who can assist. That downside tends to be permanent.