top | item 39314541

(no title)

davidrowley | 2 years ago

That could be useful if there was a way to just disable non-parameterized nested loop, however enable_nestloop=0 also disables parameterized nested loops. Parameterized nested loops are useful to avoid sorting or hashing some large relation when only a small subset of that relation is likely to have a join partner. This is even more true when you consider that since PG14, Memoize exists to act as a cache between Nested Loop and its inner subnode to cache previously looked-up values.

It's also important to consider that with enable_nestloop=0, when Nested Loop must be used (e.g for a CROSS JOIN) that the cost penalty that's added to reduce the chances of Nested Loop being used can dilute the costs so much that the query planner can then go on to make poor subsequent choices later in planning due to the costs for each method of implementing the subsequent operation being so relatively close to each other than the slightly cheaper one might not even be considered. See add_path() and STD_FUZZ_FACTOR. So, running enable_nestloop=0 in production is not without risk.

discuss

order

Sesse__|2 years ago

Yeah, nestloop with a cheap inner path (e.g. a lookup into a unique index) should be just fine, so I don't think nestloops as a whole should be banned. (Also, I believe Postgres is pretty much the only place I've seen the concept of a parameterized path described; it's not talked much about in academia, although it is probably really hard to make an index-aware System R planner without it.)

I wondered whether it would be possible just to add a fixed fuzz to every row estimate, say five rows. It would essentially mean you can never get this issue of a small undercount causing a plan disaster. Overestimating slightly is basically never a big issue as far as I know.

(I should perhaps have considered this when I was actually making a query planner in a previous life, but there were more than enough other things to worry about :-) )