top | item 39322704

(no title)

davidrowley | 2 years ago

> I mean, what we have right now (multiply selectivities together as if they were independent) is also pretty dumb

Yeah, I think it was probably a mistake to always assume there's zero correlation between columns, but what value is better to use as a default? At least extended statistics allows the correlations of multiple columns to be gathered now. That probably means we'd be less likely to reconsider changing the default assumption of zero correlation when multiplying selectivities.

discuss

order

Sesse__|2 years ago

Yeah, I don't have all the answers, my point is that it would be worthwhile to try similarly dumb stuff and see if it works just as well and could be more robust in some places :-)

Multi-selectivities are good, but IIRC they can't be specified across tables and thus across joins, right? Out of curiosity; how do you reconcile multiple selectivities? If you have a multi-column histogram on (a,b) and one on (b,c) and one on (c), and you need to figure out the selectivity of WHERE a=? AND b=? AND c=? from those three? I looked into this at some point, and academia presented me with a nightmare of second-order cone programming and stuff. :-) (I never implemented any of it before leaving the database world.)

davidrowley|2 years ago

> Multi-selectivities are good, but IIRC they can't be specified across tables and thus across joins, right?

Yeah, no extended statistics for join quals yet.

> how do you reconcile multiple selectivities?

Looking at https://doxygen.postgresql.org/extended__stats_8c.html#a3f10... it seems the aim is to find the stats that cover the largest number of clauses tiebreaking on the statistics with the least number of keys. For your example both of those are the same, so it seems that which stats are applied is down to the order the stats appear in the stats list. That list is ordered by OID, which does not seem ideal as a dump and restore could result in the stats getting a different OID. Seems sorting that list by statistics name might be better. That's what we do for triggers, which seems like a good idea as it gives the user some ability to control the trigger fire order.