top | item 39314797

davidrowley | 2 years ago

You have to remember that just because the query has an ORDER BY, it does not mean the rows come out in a deterministic order. There'd need to be at least one ORDER BY column that provably contains unique values. Of course, you could check for that, but I don't think that's the end of the complexity. Things like SKIP LOCKED skip over rows which we can't immediately lock. If the first time around we couldn't lock the lowest-ordered row and output the 2nd, then aborted and replanned, and the next time the 1st row wasn't locked, we'd output the rows in the wrong order. It's probably possible to figure out all these cases and not do it when there's some hazard, but it sounds very tricky and bug-prone to me.
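
The SKIP LOCKED hazard can be sketched with a toy simulation (this is not PostgreSQL's executor, just an illustration of the skip-then-replan scenario described above):

```python
def scan_skip_locked(rows, locked, limit=1):
    """Scan rows in sort order, skipping any that are currently locked,
    and return up to `limit` of them -- a toy stand-in for
    SELECT ... ORDER BY ... LIMIT n FOR UPDATE SKIP LOCKED."""
    out = []
    for r in sorted(rows):
        if r in locked:
            continue  # another session holds this row's lock: skip it
        out.append(r)
        if len(out) == limit:
            break
    return out

rows = [1, 2]
# First attempt: row 1 is locked by someone else, so row 2 comes out first.
first_try = scan_skip_locked(rows, locked={1})
# The attempt aborts and is replanned; meanwhile row 1 was unlocked.
second_try = scan_skip_locked(rows, locked=set())
print(first_try + second_try)  # -> [2, 1]: out of ORDER BY order overall
```

The combined output stream is [2, 1] even though every individual scan respected the sort order, which is exactly why resuming after a replan is hazardous here.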

rst | 2 years ago

However, at least some ORDER BY queries will have a final sort step before the client sees the first row. (If you're sorting on an indexed column, the scan may be able to produce the data in sorted order; if not, though, a final sort seems unavoidable.)

Which means that the capacity to buffer up data until it's all produced is already present. But it might still be awkward to make that the default.
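
The buffering point is easy to see in miniature: a sort must consume its entire input before it can emit the first row. A small sketch (plain Python, nothing Postgres-specific):

```python
produced = []

def producer():
    """Yield rows one at a time, recording each as it is produced."""
    for n in (3, 1, 2):
        produced.append(n)
        yield n

# sorted() must drain the whole producer before it can hand back row 1.
it = iter(sorted(producer()))
first = next(it)
print(first, produced)  # -> 1 [3, 1, 2]: every row produced before the first emerges
```

By the time the "client" sees the first (smallest) row, all three input rows have already been produced and buffered, which is the capacity the comment is referring to.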

davidrowley | 2 years ago

I think the first step to making improvements in this area is to have the planner err on the side of caution more often. Today it's quite happy to join using a Nested Loop when it thinks the outer side of the join contains a single row.

We have various means in the planner of judging how certain we can be that the 1-row estimate will hold true during execution. An equality condition on a column with a unique index is, for example, a way we can be certain of getting <= 1 row. If, on the other hand, the selectivity estimate concludes that 1 row will match for some WHERE clause containing several columns with independent statistics, then the certainty level goes down. It seems silly not to swap the join order and switch to a Hash Join in that case. Best case, we have to build a hash table to store 1 row. Probing that won't be very expensive, and it could even be optimised further to skip hashing if we continually probe the same bucket for N probes. The cost of additional rows over the estimated 1 row scales roughly linearly, rather than the quadratic scaling we'd have gotten with a Nested Loop.

I imagine a setting which controls how much risk the planner is willing to take would allow users to keep the status quo of the current costing model. I'd imagine not many would want that, however.
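
The scaling argument can be sketched with toy in-memory joins (an illustration of the general technique, not PostgreSQL's executor): if the "1 row" estimate for the outer side is wrong, a nested loop rescans the inner side once per extra outer row, while a hash join built on the small side pays one build per outer row plus one probe per inner row.

```python
def nested_loop_join(outer, inner, key):
    """For each outer row, scan the whole inner side: O(len(outer) * len(inner))."""
    ops, out = 0, []
    for o in outer:
        for i in inner:
            ops += 1  # one comparison per (outer, inner) pair
            if o[key] == i[key]:
                out.append((o, i))
    return out, ops

def hash_join(outer, inner, key):
    """Build a hash table on the (estimated 1-row) outer side once,
    then probe it once per inner row: O(len(outer) + len(inner))."""
    table = {}
    for o in outer:
        table.setdefault(o[key], []).append(o)
    ops, out = 0, []
    for i in inner:
        ops += 1  # one hash probe per inner row
        for o in table.get(i[key], []):
            out.append((o, i))
    return out, ops

inner = [{"k": n % 100} for n in range(10_000)]
for outer_rows in (1, 100):  # the estimate said 1; reality may differ
    outer = [{"k": n} for n in range(outer_rows)]
    _, nl_ops = nested_loop_join(outer, inner, "k")
    _, hj_ops = hash_join(outer, inner, "k")
    print(outer_rows, nl_ops, hj_ops)
```

When the outer side grows from 1 row to 100, the nested loop's work grows 100x (10,000 to 1,000,000 comparisons here) while the hash join's probe count stays at one per inner row, which is the "scales roughly linearly" point above.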