(no title)
johnthuss | 1 year ago
But I would disagree with the takeaway to focus on optimizing your indexes and not your tables. The reason is that the order of columns in a multi-column index is highly meaningful and intentional in order to support match on a range of values for the last column in the index. The way databases work you can only utilize a multi-column index on (customer_id int4, date timestamp) if have an equality match on customer_id, like "WHERE customer_id = 1 AND BETWEEN '2024-01-01' and '2025-01-01'". If you reorder these columns in the index to put the larger date column first, then, sure, you save space in the index, but you also make it worthless – it will never be used by the query above. As such, optimizing a multi-column index is only useful when all the columns are queried for equality rather than a range.
In contrast, when you are creating a NEW table you might not think hard about the order of the columns in the table, and especially not about the data-sizes of each column and their alignment. But doing so at the time you create the table can be tremendously beneficial if it is going to be very large. It is important to note that you not only save space on-disk, but in precious RAM when the tuples are loaded.
rand_r|1 year ago
It's not as fast as a multi-column index, but the savings of not having to worry about all the combinations of columns that can be queried together could well be worth it.
- https://www.postgresql.org/docs/release/8.1.0/
- https://www.postgresql.org/docs/current/indexes-bitmap-scans...
williamdclt|1 year ago
On small/medium tables and lowish throughout though, yeah it’s often good enough and avoids having many indexes for specific use cases (which is a cost in itself, in memory/cpu/storage)
Pamar|1 year ago
1) https://en.wikipedia.org/wiki/FoxPro
It is difficult to find a complete explanation for Rushmore nowadays, from what I remember, it would create a bitmap where each bit represented the nth record of the table you wanted to search, then with a single, fast sequential scan it would set the nth bit to 1 if the record satisfied all clauses of your search, 0 otherwise.
Try to see if this makes any sense to you: http://www.foxpert.com/docs/howfoxproworks.en.htm
qazxcvbnm|1 year ago
jeltz|1 year ago
https://www.postgresql.org/message-id/CAH2-Wzmn1YsLzOGgjAQZd...
johnthuss|1 year ago
It's great this is improving, but this is a fairly narrow improvement. Personally, the multi-column indexes I use would not be improved by this change since column 'a' does not store a "reasonably small" distribution of values.
sgarland|1 year ago
While the query will take much longer (for me, it was about 47 msec vs 0.6 msec for 2,000,000 rows), it still uses the index.
Similarly, while normally you wouldn’t expect a query using predicate Y to use an index defined on columns (X, Y, Z) if predicate X is also not referenced, Postgres may choose to do so, depending on table stats, and the relative difference between random_page_cost and seq_page_cost. I’ve seen it happen before.
ghusbands|1 year ago
> While the query will take much longer (for me, it was about 47 msec vs 0.6 msec for 2,000,000 rows), it still uses the index.
I'd argue that something being 78x slower can make it pretty useless, but it is indeed at least used, in some cases.