top | item 33700274

(no title)

dvnguyen | 3 years ago

Just curious to learn: when do you need sortable primary keys?

discuss

order

winrid|3 years ago

It makes pagination much more efficient and possible to completely use an index (re: data is pre-sorted) if you have some sort of increasing id, vs sorting in-memory on request. Even having some sort of indexed "createdAt" field, and sorting on that, isn't super reliable for pagination when the dataset is changing. You usually want a stable sort to make pagination nice, so you add an ID to it, but then this gets very inefficient w.r.t index use if the PK is a UUID since it is not monotonic/sortable.

deathanatos|3 years ago

UUIDs are sortable, and you can do efficient pagination on them. (The results wouldn't be in any meaningful order, but that's orthogonal.)

code_runner|3 years ago

In sqlserver (and others?) you can have one clustered index. Typically this is the primary key (doesn’t have to be though).

The clustered index is actually the physical order of the data stored on disk, so any new guid causes tremendous amounts of churn.

Not sure how modern this concern is, but I would guess a LOT of SASS companies have to consider this.

Quick edit: sequential GUIDs are obviously a thing and I believe alleviate a lot of these concerns. I do not believe that was an option in sql server 2008 r2 (a version that lived a LONG time).

butlerm|3 years ago

A clustered index is certainly not actually the physical order of the data stored on the disk, at best it tends to approximate it by placing data associated with similar keys in similar blocks. You would have to actually physically reorganize your entire clustered table or index to get it in physical order, and often you lose performance by doing so.

The performance impact of randomly distributed keys on any ordered index on that key is certainly real though, so I agree with you there.

sicp-enjoyer|3 years ago

Almost every database algorithm is based on ordering and binary search.

deathanatos|3 years ago

… and for which UUIDs are sortable.

OP called them "not sortable", which is a poor phrasing, as they're sortable. What he's likely getting at is that they're not ordered by generation time, which matters to some folks.

postalrat|3 years ago

When you need to get a sorted list or perhaps a range?

When do you need primary keys?

endisneigh|3 years ago

that doesn't really make sense though. how would you know the range to sort by? presumably you'd use the criterion directly to sort.

a better reason and justification to not use UUIDs is that they take up needless space if you're space constrained. they also have worse performance