Sesse__|2 years ago
Yes, a lot of this seems interesting to go into. I really hope that at some point someone will find the resources to just try a lot of dumb stuff and see what works in practice. I mean, what we have right now (multiply selectivities together as if they were independent) is also pretty dumb, and there's no good reason why it should be preferred over everything else.
Another avenue is of course trying to avoid the issue to begin with, e.g. through the recent “translation grids” of Müller and Moerkotte for better join selectivities. But I doubt anyone is going to find a silver bullet for this anytime soon, so reducing plan risk somehow seems very worthwhile.
davidrowley|2 years ago
> I mean, what we have right now (multiply selectivities together as if they were independent) is also pretty dumb
Yeah, I think it was probably a mistake to always assume there's zero correlation between columns, but what value is better to use as a default? At least extended statistics now allow the correlations between multiple columns to be gathered. That probably means we'd be less likely to reconsider changing the default assumption of zero correlation when multiplying selectivities.
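To make the "multiply selectivities as if they were independent" point concrete, here is a minimal toy sketch. The data, the table layout and the function names (`column_selectivity`, `independence_estimate`, `pair_frequency`) are all hypothetical and this is not PostgreSQL's planner code. It only compares the zero-correlation product against the real frequency of a value pair, which is the kind of information a multi-column statistic (e.g. PostgreSQL's `CREATE STATISTICS` with the `mcv` kind) can store directly.

```python
from collections import Counter

# Hypothetical toy table of (city, country) rows. The two columns are
# strongly correlated: the city (almost) determines the country.
ROWS = (
    [("Oslo", "Norway")] * 40
    + [("Bergen", "Norway")] * 10
    + [("Paris", "France")] * 30
    + [("Lyon", "France")] * 20
)


def column_selectivity(rows, col, value):
    """Fraction of rows whose column `col` equals `value`."""
    return sum(1 for row in rows if row[col] == value) / len(rows)


def independence_estimate(rows, city, country):
    """Estimate P(city AND country) by multiplying the per-column
    selectivities, i.e. assuming zero correlation between the columns."""
    return column_selectivity(rows, 0, city) * column_selectivity(rows, 1, country)


def pair_frequency(rows, city, country):
    """Actual frequency of the exact (city, country) pair. A statistic
    that records common value combinations across columns can report
    this directly instead of deriving it from per-column numbers."""
    return Counter(rows)[(city, country)] / len(rows)


if __name__ == "__main__":
    for city, country in [("Oslo", "Norway"), ("Oslo", "France")]:
        est = independence_estimate(ROWS, city, country)
        actual = pair_frequency(ROWS, city, country)
        print(f"{city} AND {country}: independence={est:.2f} actual={actual:.2f}")
```

Running it prints an independence estimate of 0.20 for both predicates, against actual selectivities of 0.40 for Oslo AND Norway and 0.00 for Oslo AND France. So on correlated columns the zero-correlation default can be too low for redundant predicates and too high for contradictory ones, which is where the plan risk comes from.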