top | item 40005477

(no title)

A couple other potentially desirable properties you could incorporate:

- K-sortable: ensures good locality when used as an id in a database (e.x. https://github.com/jetify-com/typeid https://github.com/segmentio/ksuid )

- checksum: primarily useful when an id might be conveyed verbally (e.x. customer support) or transcribed (e.x. Bitcoin wallet backup, BIP-39)

discuss

treyd|1 year ago

The bech32 format is a favorite of mine because it uses an alphabet that's designed to be unambiguous and its checksum is designed specifically to guarantee catching few character mistakes and make it possible to suggest where the mistake likely is. It also has a builtin human-readable purpose prefix at the front. Since it's all lowercase it also fits into the QR alphanumeric mode, which doesn't support mixed case so QR codes of bech32 IDs are more efficient.

medstrom|1 year ago

An error-correction code inside the ID?! I love it.

hot_gril|1 year ago

UUIDs should not be used as database primary keys unless the DBMS recommends it or you have a well-studied special reason for it. Postgres and MySQL are meant to use bigserial by default, even Citus. Some special sharded DBMSes like Spanner need non-sequential pkeys, but even Spanner explicitly tells you to use uuid4 because k-sortable keys cause hotspotting: https://cloud.google.com/spanner/docs/schema-design#uuid_pri...

bruce511|1 year ago

I understand the performance implications of using a UUID for a primary key. And if performance is your primary concern, then this is good advice for large tables.

But if I could go back 25 years and only give myself one bit of advice, it would be to use UUIDs as the primary key. Because in a different context to raw performance, it offers a lot of advantages.

While there are advantages in numerous areas, I'll focus on one for this post. The area of distributed data.

We started by running a database on prem. Each branch or store got their own db. 15 years later always-on networking happened. 15 years after that, all businesses have fibre.

So now all the branches use a giant shared online database. With merged data. Uuid based this task would be trivial. Bigint based, yeah, it's not.

Along the same timeline data started escaping from our database. It would go to a phone, travel around a bit, change, get new records, then come home. Think lots of sales folk, in places without reception, doing stuff.

So you're right in the context of a single database (cluster) which encompasses all the data all the time.

But in the context where data lives beyond the database, using uuids solves a lot of problems.

There are other places as well where uuids shine.

So as with most advice when it comes to SQL, I'd add "context matters".

j16sdiz|1 year ago

UUID primary key remove hotspots; Sequence primary key increase locality.

Depends on your access pattern, you may prefer the other way, even on the same DBMS.

stephenr|1 year ago

I can't speak for PG but MySQL at least has a built in function to resolve the time ordering issue when storing v1 UUIDs (and a corresponding function to restore them to a valid UUID).

medstrom|1 year ago

The CUID readme [1] explains that there's no real point to K-sortable on modern hardware:

[1] https://github.com/paralleldrive/cuid2?tab=readme-ov-file#no...

ndriscoll|1 year ago

The CUID readme is wrong. You can safely ignore anyone who says "cloud-native" while discussing performance unless they're explaining why "cloud-native" architectures are often the worst of all possible designs for performance.

In postgres for example, full_page_writes (default on, generally not safe to turn off unless you can be sure your filesystem can guarantee it) means you have to write the entire page to WAL if you write one record. This will make your WAL grow way faster if you're doing random IOs. So right off that bat that's going to be a huge write impact.

infogulch|1 year ago

What are the tradeoffs between typeid and ksuid?