top | item 43826749

benwilber0|10 months ago

> Always use a BigInt (64 bits) or UUID for primary keys.

Use bigint, never UUID. UUIDs are massive (2x a bigint) and now your DBMS has to copy that enormous value to every side of a relation.

It will bloat your table and indexes 2x for no good reason whatsoever.

Never use UUIDs as your primary keys.


bsder|10 months ago

> Never use UUIDs as your primary keys.

This seems like terrible advice.

For the vast, vast, vast majority of people, if you don't have an obvious primary key, choosing UUIDv7 is going to be an absolute no-brainer choice that causes the least amount of grief.

Which of these is an amateur most likely to hit: crash caused by having too small a primary key and hitting the limit, slowdowns caused by having a primary key that is effectively unsortable (totally random), contention slowdowns caused by having a primary key that needs a lock (incrementing key), or slowdowns caused by having a key that is 16 bytes instead of 8?

Of all those issues, the slowdown from a 16 byte key is by far the least likely to be an issue. If you reach the point where that is an issue in your business, you've moved off of being a startup and you need to cough up real money and do real engineering on your database schemas.
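A minimal sketch of why UUIDv7 is the "no-brainer" bsder describes: the first 48 bits are a millisecond timestamp, so keys generated later sort later and a b-tree index stays append-mostly instead of being written to at random. This hand-rolled version follows the RFC 9562 layout, assuming a Python where the stdlib `uuid` module does not yet ship `uuid7`:

```python
import os
import time
import uuid

def uuid7() -> uuid.UUID:
    """Time-ordered UUID sketch: 48-bit unix-ms timestamp, then
    version/variant bits, then random bits (RFC 9562 layout)."""
    ts_ms = time.time_ns() // 1_000_000           # 48-bit millisecond timestamp
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = ((ts_ms & ((1 << 48) - 1)) << 80) | rand
    value &= ~(0xF << 76)
    value |= 0x7 << 76                            # version = 7
    value &= ~(0x3 << 62)
    value |= 0x2 << 62                            # RFC 4122/9562 variant
    return uuid.UUID(int=value)

a = uuid7()
time.sleep(0.002)
b = uuid7()
assert a < b  # key generated later sorts later
```

Because the timestamp leads, two keys minted milliseconds apart compare in generation order, which is exactly the property a totally random v4 key lacks.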

sgarland|10 months ago

The problem is that companies tend to only hire DB expertise when things are dire, and then, the dev teams inevitably are resistant to change.

You can monitor and predict the growth rate of a table; if you don’t know you’re going to hit the limit of an INT well in advance, you have no one to blame but yourself.

Re: auto-incrementing locks, I have never once observed that to be a source of contention. Most DBs are around 98/2% read/write. If you happen to have an extremely INSERT-heavy workload, then by all means, consider alternatives, like interleaved batches or whatever. It does not matter for most places.

I agree that UUIDv7 is miles better than v4, but you’re still storing far more data than is probably necessary. And re: 16 bytes, MySQL annoyingly doesn’t natively have a UUID type, and most people don’t seem to know about casting it to binary and storing it as BINARY(16), so instead you get a 36-byte PK. The worst.
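A quick illustration of the size difference sgarland describes: the same UUID is 16 raw bytes (what MySQL's `UUID_TO_BIN()` produces for a `BINARY(16)` column) versus 36 characters in its text form, and the binary form loses nothing:

```python
import uuid

u = uuid.uuid4()
as_text = str(u)      # what a CHAR(36) column would store
as_binary = u.bytes   # what UUID_TO_BIN() / BINARY(16) would store

assert len(as_text) == 36    # 32 hex digits + 4 hyphens
assert len(as_binary) == 16  # 128 bits

# Round-trip: the binary form reconstructs the original exactly.
assert uuid.UUID(bytes=as_binary) == u
```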

benwilber0|10 months ago

> contention slowdowns caused by having a primary key that needs a lock (incrementing key)

This kind of problem mostly exists in simpler databases like SQLite. Postgres sequences hand out IDs with a lightweight, non-transactional lock (and can cache whole ranges per backend), so the next ID in a serial sequence is never a meaningful point of contention.

gruez|10 months ago

>Use bigint, never UUID. UUIDs are massive (2x a bigint) and now your DBMS has to copy that enormous value to every side of a relation.

"enormous value" = 128 bits (compared to 64 bits)

In the worst case this causes your m2m table to double, but I doubt this has a significant impact on the overall size of the DB.

TylerE|10 months ago

The concern isn’t the size of the db on disk but doubling the size of all the indexes in memory

sgarland|10 months ago

When you have a few million rows, no. When you have hundreds of millions or billions of rows, yes, it matters very much.

tpm|10 months ago

I was in the never-UUID camp, but have been converted. Of course it depends on how much you rely on your PKs for speed, but UUIDs have a great benefit: you can create a unique key without a round trip to the DB, and that can enormously simplify your app logic.

sgarland|10 months ago

I’ve never understood this argument. In every RDBMS I’m aware of, you can get the full row you just inserted sent back (RETURNING clause in Postgres, MariaDB, and new-ish versions of SQLite), and even in MySQL you can read the last auto-incremented id from the cursor used to run the query.
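A sketch of sgarland's point using SQLite's Python bindings: you don't need a client-generated key to learn the id of the row you just inserted. The cursor hands it back, and SQLite 3.35+ supports RETURNING like Postgres and MariaDB:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# The cursor exposes the auto-assigned key with no extra round trip.
cur = con.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
assert cur.lastrowid == 1

# SQLite 3.35+ also supports RETURNING, like Postgres/MariaDB.
if sqlite3.sqlite_version_info >= (3, 35, 0):
    row = con.execute(
        "INSERT INTO users (name) VALUES (?) RETURNING id", ("bob",)
    ).fetchone()
    assert row[0] == 2
```

MySQL's equivalent is the cursor's `lastrowid` (or `LAST_INSERT_ID()` in SQL) after an `AUTO_INCREMENT` insert.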

sgt|10 months ago

I also do that for convenience. It helps a lot in many cases. In other cases I might have tables that may grow into the millions of rows (or hundreds of millions), then I'd absolutely not use UUID PK's for those particular tables. And I'd also shard them across schemas or multiple DBs.

outside1234|10 months ago

If you don't have a natural primary key (the usual use case for UUIDs in distributed systems such that you can have a unique value) how do you handle that with bigints? Do you just use a random value and hope for no collisions?

benwilber0|10 months ago

You use a regular bigint/bigserial for internal table relations and a UUID as an application-level identifier and natural key.
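A sketch of benwilber0's pattern (using SQLite for a self-contained example): a compact integer PK does the internal work of joins and foreign keys, while a UUID column is the public, application-level identifier exposed in URLs and APIs:

```python
import sqlite3
import uuid

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,          -- internal: used by FKs and joins
        public_id BLOB UNIQUE NOT NULL,  -- external: exposed to the app/API
        name TEXT
    )
""")

public_id = uuid.uuid4().bytes  # 16 bytes, generated client-side
con.execute("INSERT INTO users (public_id, name) VALUES (?, ?)",
            (public_id, "alice"))

# Look up by the public identifier; joins elsewhere use the small int id.
row = con.execute("SELECT id, name FROM users WHERE public_id = ?",
                  (public_id,)).fetchone()
assert row == (1, "alice")
```

Only the one unique index on `public_id` pays the 16-byte cost; every relation table stays narrow.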

hellojesus|10 months ago

Wouldn't you just have an autoincrementing bigint as a surrogate key in your dimension table?

Or you could preload a table of autoincremented bigints and then atomically grab the next value from there where you need a surrogate key like in a distributed system with no natural pk.

rowanseymour|10 months ago

And assuming we're not talking v7 UUIDs... your indexes are gonna have objects you might commonly fetch together randomly scattered everywhere.

LunaSea|10 months ago

But if you use sequential integers as primary key, you are leaking the cardinality of your table to your users / competitors / public, which can be problematic.

varispeed|10 months ago

Is it really enormous? bigint vs UUID is similar to talking about self-hosting vs cloud to stakeholders. Which one has bigger risk of collision? Is the size difference material to the operations? Then go with the less risky one.

rowanseymour|10 months ago

You shouldn't be using BIGINT for random identifiers, so collision isn't a concern - this is just to future-proof against hitting the 2^31 limit on a regular INT primary key.

hu3|10 months ago

I made a ton of money because of this mistake.

Twice now I was called in to fix UUIDs making systems crawl to a stop.

People underestimate how important efficient indexes are on relational databases because replacing autoincrement INTs with UUIDs works well enough for small databases, until it doesn't.

My gripe against UUIDs is not even performance. It's debugging.

Much easier to memorize and type user_id = 234111 than user_id = '019686ea-a139-76a5-9074-28de2c8d486d'

zerr|10 months ago

Weren't UUIDs the default go-to type for primary keys in the .NET/SQL Server world even 20 years ago?

hu3|10 months ago

I don't think so.

Even Microsoft default sample database schema uses INT ids.

sgarland|10 months ago

I’ll go further; don’t automatically default to a BIGINT. Put some thought into your tables. Is it a table of users, where each has one row? You almost certainly won’t even need an INT, but you definitely won’t need a BIGINT. Is it a table of customer orders? You might need an INT, and you can monitor and even predict the growth rate. Did you hit 1 billion? Great, you have plenty of time for an online conversion to BIGINT, with a tool like gh-ost.
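The monitoring sgarland suggests is a back-of-envelope calculation. With hypothetical numbers (the row count and insert rate here are made up for illustration), you can see how much runway an INT PK leaves before a BIGINT conversion becomes urgent:

```python
# How long until a signed 32-bit INT primary key overflows?
INT_MAX = 2**31 - 1        # 2,147,483,647
rows_now = 1_000_000_000   # hypothetical current row count
rate_per_day = 500_000     # hypothetical sustained insert rate

days_left = (INT_MAX - rows_now) // rate_per_day
years_left = days_left / 365

assert years_left > 6  # plenty of time to plan an online migration
```

The point isn't the specific numbers: it's that overflow is predictable from ordinary growth metrics, so running out of INTs by surprise is an operational failure, not a schema-design one.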

mvdtnz|10 months ago

I work for a company that deals with very large numbers of users. We recently had a major project because our users table ran out of ints and had to be upgraded to bigint. At this scale that's harder than it sounds.

So your advice that you DEFINITELY won't need a BIGINT, well, that decision can come back to bite you if you're successful enough.

(You're probably thinking there's no way we have over 2 billion users and that's true, but it's also a bad assumption that one user row perfectly corresponds to one registered user. Assumptions like that can and do change.)

pyuser583|10 months ago

Gosh this debate again.

I’ll be 110 years old telling my great-grandchildren about how we used integers for primary keys, until reason arrived and we started using uuids.

And they’ll be like, “you weren’t one of those anti-vaxxers were you?”