item 40510104


alamb | 1 year ago

In general, if you can partition your dataset on the column your predicates filter on, sorting is likely the best option.

For example, when you have a predicate like `where id = 'fdhah-4311-ddsdd-222aa'`, sorting on the `id` column will help, because rows with a given `id` end up stored next to each other.
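One way to see why sorting helps is through per-chunk min/max statistics (the kind of "zone map" that columnar formats keep per row group or page): a reader can skip any chunk whose [min, max] range cannot contain the value. This is a minimal sketch, assuming 100,000 hypothetical zero-padded ids split into chunks of 1,000 rows; `chunk_stats` and `chunks_to_scan` are illustrative helpers, not any library's API.

```python
import random

def chunk_stats(values, chunk_size):
    """(min, max) of each fixed-size chunk, similar to per-row-group statistics."""
    return [
        (min(values[i:i + chunk_size]), max(values[i:i + chunk_size]))
        for i in range(0, len(values), chunk_size)
    ]

def chunks_to_scan(stats, target):
    """Chunks whose [min, max] range might contain target; all others can be skipped."""
    return sum(1 for lo, hi in stats if lo <= target <= hi)

random.seed(0)
# Hypothetical zero-padded ids, so string order matches numeric order.
ids = [f"id-{n:06d}" for n in range(100_000)]
random.shuffle(ids)
target = "id-050000"

unsorted_scan = chunks_to_scan(chunk_stats(ids, 1_000), target)
sorted_scan = chunks_to_scan(chunk_stats(sorted(ids), 1_000), target)
print(f"unsorted: {unsorted_scan} of 100 chunks; sorted: {sorted_scan} of 100 chunks")
```

In the shuffled layout nearly every chunk's range spans the whole id space, so almost nothing can be skipped; after sorting, the ranges no longer overlap and only one chunk needs to be read.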

However, if you have predicates on multiple different sets of columns, such as another query on `state = 'MA'`, you can't pick an ideal sort order for all of them.
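The trade-off can be sketched with the same min/max pruning idea applied to two hypothetical columns, `id` and `state`, under two different sort orders. This assumes 100,000 rows with states drawn uniformly from five values and chunks of 1,000 rows; all names here are made up for illustration.

```python
import random

STATES = ["CA", "MA", "NY", "TX", "WA"]

def chunks_to_scan(column, target, chunk_size=1_000):
    """Count chunks whose min/max range could contain target (the rest are pruned)."""
    count = 0
    for i in range(0, len(column), chunk_size):
        chunk = column[i:i + chunk_size]
        if min(chunk) <= target <= max(chunk):
            count += 1
    return count

def scan_counts(rows, key):
    """Sort (id, state) rows by key; return chunks scanned for each predicate."""
    ordered = sorted(rows, key=key)
    ids = [r[0] for r in ordered]
    states = [r[1] for r in ordered]
    return chunks_to_scan(ids, "id-050000"), chunks_to_scan(states, "MA")

random.seed(1)
rows = [(f"id-{n:06d}", random.choice(STATES)) for n in range(100_000)]

by_id = scan_counts(rows, key=lambda r: r[0])
by_state = scan_counts(rows, key=lambda r: (r[1], r[0]))
print("sort by id:        id chunks, state chunks =", by_id)
print("sort by state, id: id chunks, state chunks =", by_state)
```

Sorting by `id` makes the `id` lookup touch a single chunk but leaves `state = 'MA'` scanning essentially everything; sorting by `state` first flips that, so the `id` lookup now touches roughly one chunk per state instead of one in total.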

People often partition (sort) on the low-cardinality columns first, as that tends to improve compression significantly.
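A rough way to see the compression effect is to count runs of equal adjacent values in each column, which approximates how well run-length encoding would do. This is a simplified sketch, assuming a hypothetical low-cardinality `state` column (5 values) and a higher-cardinality `customer` column (up to 10,000 values); real formats combine dictionary and run-length encoding, so run counts are only a proxy for on-disk size.

```python
import random
from itertools import groupby

STATES = ["CA", "MA", "NY", "TX", "WA"]

def count_runs(column):
    """Number of runs of equal adjacent values (a proxy for run-length-encoded size)."""
    return sum(1 for _ in groupby(column))

def runs_per_column(rows, key):
    """Sort rows by key, then count runs in each column separately."""
    ordered = sorted(rows, key=key)
    return tuple(count_runs(col) for col in zip(*ordered))

random.seed(2)
rows = [(random.choice(STATES), random.randrange(10_000)) for _ in range(100_000)]

low_first = runs_per_column(rows, key=lambda r: (r[0], r[1]))
high_first = runs_per_column(rows, key=lambda r: (r[1], r[0]))
print("state first:    (state runs, customer runs) =", low_first, "total", sum(low_first))
print("customer first: (state runs, customer runs) =", high_first, "total", sum(high_first))
```

With `state` first, that column collapses to one run per distinct value, while the `customer` column only loses a little; with `customer` first, the `state` column is broken into tens of thousands of short runs.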
