top | item 40889751

kroolik | 1 year ago

When running a batched migration, it is important to batch on a strictly monotonic field so that new rows won't get inserted into an already-processed range.
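The advice above can be sketched as a keyset-paginated backfill. This is a minimal, hypothetical in-memory stand-in (the table, column names, and batch size are all illustrative, not from any real schema); the point is that because new ids are always larger than the current cursor, rows inserted mid-migration still get picked up by a later batch.

```python
import itertools

# Hypothetical stand-in for a table keyed by a strictly monotonic,
# bigserial-style primary key. Names and sizes are illustrative only.
rows = {i: {"migrated": False} for i in range(1, 1001)}
serial = itertools.count(1001)  # new ids are always above existing ones

def backfill_batch(last_id, batch_size=100):
    """Process the next batch keyed on id > last_id (keyset pagination)."""
    batch = [i for i in sorted(rows) if i > last_id][:batch_size]
    for i in batch:
        rows[i]["migrated"] = True
    return batch[-1] if batch else None

last = 0
first_batch_done = False
while last is not None:
    last = backfill_batch(last)
    if last is not None and not first_batch_done:
        first_batch_done = True
        # Simulate a concurrent insert during the migration: its id lands
        # above the processed range, so a later batch will cover it.
        rows[next(serial)] = {"migrated": False}

assert all(r["migrated"] for r in rows.values())
```

In a real database the batch query would be `WHERE id > :last ORDER BY id LIMIT :n`; the cursor-carrying loop is the same shape.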

blackenedgem | 1 year ago

It's not even necessarily about the field being strictly monotonic. That part does help, though, as you don't need to skip rows.

For me the bigger thing is the randomness. A UUID being random for a given row means the opposite is true: any given index entry points to a completely random heap location.

When backfilling, this leads to massive write amplification. Consider a table whose rows take up 40 bytes, so roughly 200 entries per 8 kiB page. If I backfill 1k rows sorted by the id, then under normal circumstances I'd expect to update 6-7 pages, which is ~50 kiB of heap writes.

Whereas if I do that sort of backfill with a UUID, I'd expect each row to land on a separate page. That means 1k rows backfilled is going to be around 8 MB of writes to the heap.
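A quick sanity check of the arithmetic above, assuming PostgreSQL's default 8 kiB page size. The naive division gives ~5 sequential pages; per-row header overhead (which the 40-byte figure ignores) fits fewer rows per page, pushing it toward the 6-7 quoted.

```python
import math

PAGE_SIZE = 8 * 1024                     # PostgreSQL default heap page size
ROW_SIZE = 40                            # per the example above
ROWS_PER_PAGE = PAGE_SIZE // ROW_SIZE    # 204
BATCH = 1000

# Sequential ids: consecutive rows share heap pages.
pages_sequential = math.ceil(BATCH / ROWS_PER_PAGE)   # 5 (6-7 in practice)
writes_sequential = pages_sequential * PAGE_SIZE      # 40 kiB

# Random UUIDs: each backfilled row is expected to hit a distinct page.
pages_random = BATCH
writes_random = pages_random * PAGE_SIZE              # 8000 kiB ~= 8 MB

amplification = writes_random / writes_sequential     # 200.0
```

So under these assumptions the random-key backfill writes on the order of 200x more heap data than the sequential one.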

valenterry | 1 year ago

Isn't that solved by UUIDv7, which can be ordered by time?

kroolik | 1 year ago

Are page misses still a thing in the age of SSDs?

x3al | 1 year ago

Strictly monotonic fields are quite expensive and the bigserial PK alone won't give you that.

kroolik | 1 year ago

PG bigserial is already strictly monotonic

groestl | 1 year ago

Okay, but in a live DB you typically won't have only inserts while migrating, will you?

kroolik | 1 year ago

Yes, but updates are covered by the updated app code.

asah | 1 year ago

would creation/lastmod timestamps cover this requirement?

kroolik | 1 year ago

Yes, although timestamps may have collisions depending on resolution and traffic, no? Bigserials (at least in PG) are strictly monotonic (with holes).
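The distinction can be illustrated with a toy sketch (pure Python, illustrative values only): a coarse-resolution timestamp column can repeat under load, while a sequence stays strictly increasing even when rolled-back transactions leave holes.

```python
import itertools

# At coarse clock resolution, concurrent rows can share a timestamp,
# so a timestamp alone is not a safe batching key. (Illustrative values.)
fake_clock_seconds = 1_700_000_000
timestamps = [fake_clock_seconds] * 3 + [fake_clock_seconds + 1] * 2
assert len(set(timestamps)) < len(timestamps)   # collisions occur

# A bigserial-style sequence is strictly monotonic even with holes,
# e.g. values consumed by inserts that later rolled back.
seq = itertools.count(1)
ids = [next(seq) for _ in range(3)]
next(seq)              # a "hole" left by a rolled-back insert
ids.append(next(seq))  # ids == [1, 2, 3, 5]

assert all(a < b for a, b in zip(ids, ids[1:]))  # still strictly increasing
```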