top | item 45623150

(no title)

rvitorper | 4 months ago

Does anyone have performance issues with uuidv4? I worked with a db with 10s of billions of rows, no issues whatsoever. Would love to hear the mileage of fellow engineers

discuss

zerd|4 months ago

I've had issues in a database with billions of rows where the PKs were a UUID. Indices on PK, and also foreign keys from other tables pointing to that table were pretty big, so much so that the indices themselves didn't all fit in memory. Like we would have an index on customer_id, document_id, both UUIDv4. DB didn't have UUID support, so they were stored as strings, so just 1 billion rows took ~30 GiB memory for PK index, 60GiB for the composite indices etc. So eventually the indices would not fit in memory. If we had UUID support or stored as bytes it might have halved it, but eventually become too big.

If you needed to look up say the 100 most recent documents, that would require ~100+ disk seeks at random locations just to look up the index due to the random nature of UUIDv4. If they were sequential or even just semi-sequential that would reduce the number of lookups to just a few, and they would be more likely to be cached since most hits would be to more recent rows. Having it roughly ordered by time would also help with e.g. partitioning. With no partitioning, as the table grows, it'll still have to traverse the B-Tree that has lots of entries from 5 years ago. With partitioning by year or year-month it only has to look at a small subset of that, which could fit easily in memory.

cipehr|4 months ago

What database were you using? For example with SQL server, by default it clusters data on disk by primary key. Random (non-sequential) PKs like uuidv4 require random cluster shuffling to insert a row “in the middle” of a cluster, increasing io load and causing performance issues.

Postgres on the other hand doesn’t do clustered indexing on the PK… if I recall correctly.

rvitorper|4 months ago

Postgres. It was also a single instance, which made it significantly easier. But nice to know that this is an issue on SQL Server

ahoka|4 months ago

Then cluster it differently? The whole problem uuidv7 in databases solves is a non-issue in most cases.

crazygringo|4 months ago

Honestly not really. Yes random keys make inserts slower. But if inserts are only 1% of your database load, then yeah it's basically no issues whatsoever.

On the other hand, if you're basically logging to your database so inserts are like 99% of the load, then it's something to consider.

rvitorper|4 months ago

Makes sense. Thanks for the comment

unknown|4 months ago

[deleted]