top | item 39712389

(no title)

The problem is not the query planner per se. There is a much more subtle problem and it is related to how you have created the query in the join structure.

For many queries, the order in which you specify the joins doesn't really matter. But there are a number of classes where the join order dramatically affects how fast the query can actually run and nothing the query planner does will change this.

I came across this problem around 30 years ago. By accident, I discovered what the problem cause was - the order of the joins. The original query was built and took 30 - 40 minutes to run. I deleted particular joins to see what intermediate results were. In reestablishing the joins, the query time went to down to a couple of seconds.

I was able to establish that the order of joins in this particular case was generating a Cartesian product of the original base records. By judicious reordering of the joins, this Cartesian product was avoided.

If you are aware of this kind of problem, you can solve it faster than any query planner ever could.

discuss

sgarland|1 year ago

Usually that kind of problem is a result of exceeding {from,join}_collapse_limit, which defaults to 8. If you have more tables than that in a query, Postgres doesn’t exhaustively try all ordering to determine the best, and instead uses its genetic algorithm, which can be worse.

You can raise the limit at the risk of causing query planning times going up exponentially, or refactor your schema, or, as you did, rewrite it to be more restrictive out of the gate. That way, those join paths will be found first and so will be the best found when the planner gives up.

tacone|1 year ago

Would love to see a practical example of this.

Another thing to consider is table fragmentation. Fragmentation > bad row count estimation > bad query plan.

afandian|1 year ago

We hit some weird query plans. I don't have forensic evidence but here's an example: https://crossref.gitlab.io/engineering/decision-records/dr-0...

Combination of two joins, filtering on all tables, and sorting by the LEFT one. Performance was fine, until we hit the scale when it suddenly became unpredictable.

In hindsight, given the variability of the queries and table structure, I don't think any query planner could have done a good job. The natural answer was to denormalize. But the journey to get there was a little unpredictable.