I think the first step to making improvements in this area is to have the planner err on the side of caution more often. Today it's quite happy to join using a Nested Loop when it thinks the outer side of the join contains a single row.

The planner has various ways of judging how certain we can be that the 1-row estimate will hold true during execution. An equality condition on a column with a unique index, for example, is a case where we can be certain of getting <= 1 row. If, on the other hand, the selectivity estimate concludes that 1 row will match for some WHERE clause containing several columns with independent statistics, the certainty level goes down. In that uncertain case, it seems silly not to swap the join order and switch to a Hash Join. In the best case, we have to build a hash table to store a single row. Probing that won't be very expensive, and it could even be optimised further to skip hashing if we continually probe the same bucket for N probes. The cost of additional rows beyond the estimated 1 row scales linearly, rather than the quadratic scaling we'd have gotten with a Nested Loop.

I imagine a setting which controls how much risk the planner is willing to take would allow users to keep the status quo of the current costing model. I'd imagine not many would want that, however.
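To make the linear-vs-quadratic point concrete, here's a toy cost model in Python. The per-row constants and the cost functions are entirely made up for illustration; they are not PostgreSQL's actual costing formulas. It shows what happens when the true row counts on both sides turn out to be some factor larger than estimated:

```python
def nested_loop_cost(outer_rows: int, inner_rows: int, per_row: float = 0.01) -> float:
    # Hypothetical model: the inner side is rescanned once per outer row,
    # so total work is the product of the two row counts.
    return outer_rows * inner_rows * per_row

def hash_join_cost(outer_rows: int, inner_rows: int, per_row: float = 0.02) -> float:
    # Hypothetical model: build the hash table over the (estimated 1-row)
    # outer side once, then probe it once per inner row. Work is the sum,
    # not the product, of the row counts.
    return (outer_rows + inner_rows) * per_row

# If the estimates are off by a growing factor, Nested Loop cost grows
# quadratically with that factor while Hash Join cost grows linearly.
for factor in (1, 10, 100):
    outer, inner = 1 * factor, 1_000 * factor
    print(f"factor={factor:>3}  "
          f"nested_loop={nested_loop_cost(outer, inner):>10.1f}  "
          f"hash_join={hash_join_cost(outer, inner):>8.1f}")
```

With a 100x misestimate, the (toy) Nested Loop cost blows up by 10,000x while the Hash Join cost grows only 100x, which is the asymmetry that makes the hash join the safer bet when the 1-row estimate is uncertain.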