top | item 41770089

(no title)

This is a super useful thing to know and I'm thankful for this article highlighting this aspect of Postgres.

But I would disagree with the takeaway to focus on optimizing your indexes and not your tables. The reason is that the order of columns in a multi-column index is highly meaningful and intentional in order to support match on a range of values for the last column in the index. The way databases work you can only utilize a multi-column index on (customer_id int4, date timestamp) if have an equality match on customer_id, like "WHERE customer_id = 1 AND BETWEEN '2024-01-01' and '2025-01-01'". If you reorder these columns in the index to put the larger date column first, then, sure, you save space in the index, but you also make it worthless – it will never be used by the query above. As such, optimizing a multi-column index is only useful when all the columns are queried for equality rather than a range.

In contrast, when you are creating a NEW table you might not think hard about the order of the columns in the table, and especially not about the data-sizes of each column and their alignment. But doing so at the time you create the table can be tremendously beneficial if it is going to be very large. It is important to note that you not only save space on-disk, but in precious RAM when the tuples are loaded.

discuss

rand_r|1 year ago

This is sort of an aside, but a very interesting thing about Postgres is that it can efficiently combine independent column indexes together, so there is much less of a need, compared to older databases, to even create multi-column indexes. It's a feature from 8.1 called "Bitmap Scan". Basically, if you create an index on column X and an index on column Y, it can use them to do queries involving either or both columns pretty efficiently (for any number of columns).

It's not as fast as a multi-column index, but the savings of not having to worry about all the combinations of columns that can be queried together could well be worth it.

- https://www.postgresql.org/docs/release/8.1.0/

- https://www.postgresql.org/docs/current/indexes-bitmap-scans...

williamdclt|1 year ago

It’s very cool, but at high throughput you really see the difference. This bitmap scan can take huge amounts of cpu, reduced to nothing (and much faster) when setting up a proper multi column index.

On small/medium tables and lowish throughout though, yeah it’s often good enough and avoids having many indexes for specific use cases (which is a cost in itself, in memory/cpu/storage)

Pamar|1 year ago

Bitmap Scan sounds a lot like Rushmore technology in Foxpro[1]. Are they the same?

1) https://en.wikipedia.org/wiki/FoxPro

It is difficult to find a complete explanation for Rushmore nowadays, from what I remember, it would create a bitmap where each bit represented the nth record of the table you wanted to search, then with a single, fast sequential scan it would set the nth bit to 1 if the record satisfied all clauses of your search, 0 otherwise.

Try to see if this makes any sense to you: http://www.foxpert.com/docs/howfoxproworks.en.htm

qazxcvbnm|1 year ago

Is there any potential that indexes could be created over foreign key joins in the future? I know that as of today, no multi-table indices or statistics exist for Postgres, which has had led me to do some further denormalisations.

jeltz|1 year ago

This is a limitation which is currently being worked on. The order will still matter of course but it will allow PostgreSQL to make some use of indexes even when the order of coulums does not match.

https://www.postgresql.org/message-id/CAH2-Wzmn1YsLzOGgjAQZd...

johnthuss|1 year ago

>>> This is feasible in cases where the total number of distinct values in the column 'a' is reasonably small (think tens or hundreds, perhaps even thousands for very large composite indexes).

It's great this is improving, but this is a fairly narrow improvement. Personally, the multi-column indexes I use would not be improved by this change since column 'a' does not store a "reasonably small" distribution of values.

sgarland|1 year ago

This (index ordering resulting in useless indices) is not true, at least not in newer versions of Postgres (I just tried with 15).

While the query will take much longer (for me, it was about 47 msec vs 0.6 msec for 2,000,000 rows), it still uses the index.

Similarly, while normally you wouldn’t expect a query using predicate Y to use an index defined on columns (X, Y, Z) if predicate X is also not referenced, Postgres may choose to do so, depending on table stats, and the relative difference between random_page_cost and seq_page_cost. I’ve seen it happen before.

ghusbands|1 year ago

> This (index ordering resulting in useless indices) is not true, at least not in newer versions of Postgres (I just tried with 15).

> While the query will take much longer (for me, it was about 47 msec vs 0.6 msec for 2,000,000 rows), it still uses the index.

I'd argue that something being 78x slower can make it pretty useless, but it is indeed at least used, in some cases.