When running a batched migration, it is important to batch on a strictly monotonic field so that new rows won't get inserted into an already-processed range.
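A minimal sketch of that kind of batching, using Python's built-in `sqlite3` as a stand-in for a real database. The `items` table, the `flag` column, and the `backfill_in_batches` helper are all made up for illustration. The cursor only ever moves forward, so this relies on new ids always being larger than anything already processed.

```python
import sqlite3

def backfill_in_batches(conn, batch_size=100):
    """Backfill `flag` in id order, one keyset-paginated batch at a time."""
    last_id = 0  # cursor: every row with id <= last_id has been processed
    batches = 0
    while True:
        # Find the next batch of ids strictly above the cursor.
        ids = [row[0] for row in conn.execute(
            "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        )]
        if not ids:
            break
        # Update exactly the id range covered by this batch.
        conn.execute(
            "UPDATE items SET flag = 1 WHERE id > ? AND id <= ?",
            (last_id, ids[-1]),
        )
        conn.commit()
        last_id = ids[-1]  # advance the cursor past the batch
        batches += 1
    return batches

# Hypothetical table with 250 rows that still need the backfill.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY AUTOINCREMENT, flag INTEGER DEFAULT 0)"
)
conn.executemany("INSERT INTO items (flag) VALUES (?)", [(0,)] * 250)

batches = backfill_in_batches(conn)
remaining = conn.execute("SELECT COUNT(*) FROM items WHERE flag = 0").fetchone()[0]
print(batches, remaining)
```

Rows inserted while this runs get ids above `last_id`, so a later batch picks them up instead of them landing behind the cursor.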
It's not even necessarily about it being strictly monotonic. That part does help, though, as you don't need to skip rows.
For me, the bigger thing is the randomness. A uid being random for a given row means the converse holds too: any given index entry points to a completely random heap entry.
When backfilling, this leads to massive write amplification. Consider a table whose rows take up 40 bytes each, so roughly 200 entries per 8 KiB page. If I backfill 1k rows sorted by the id, then under normal circumstances I'd expect to update 6-7 pages, which is ~50 KiB of heap writes.
Whereas if I do that sort of backfill with a uid, I'd expect to find each row on a separate page. That means 1k rows backfilled is going to be around 8 MB of writes to the heap.
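The arithmetic above can be checked with a rough simulation. This is a simplified model, not a measurement: it assumes 8 KiB pages and 200 rows per page, counts a page as written once if any of its rows is touched, and ignores new tuple versions, WAL, and index writes. The table size and the `pages_touched` helper are made up for illustration.

```python
import random

PAGE_BYTES = 8 * 1024   # assumed 8 KiB heap page
ROWS_PER_PAGE = 200     # from the estimate above: ~40-byte rows
TOTAL_ROWS = 10_000_000 # hypothetical table size
BATCH = 1000            # rows backfilled

def pages_touched(row_positions):
    """Count distinct pages containing at least one of the given rows."""
    return len({pos // ROWS_PER_PAGE for pos in row_positions})

# Sorted-by-id backfill: a contiguous run of rows that is not page-aligned.
sequential = range(5_000_100, 5_000_100 + BATCH)

# uid-ordered backfill: rows land at effectively random heap positions.
rng = random.Random(0)
scattered = rng.sample(range(TOTAL_ROWS), BATCH)

seq_pages = pages_touched(sequential)
rand_pages = pages_touched(scattered)

print(f"sequential: {seq_pages} pages, {seq_pages * PAGE_BYTES / 1024:.0f} KiB")
print(f"random:     {rand_pages} pages, {rand_pages * PAGE_BYTES / 2**20:.1f} MiB")
```

The contiguous run spans 6 pages (48 KiB), while the random sample hits close to one page per row, which is where the roughly 8 MB figure comes from.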
Yes, although timestamps may have collisions depending on resolution and traffic, no? Bigserials (at least in PG) are strictly monotonic (with holes).
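To illustrate the collision concern, here is a small sketch (again with `sqlite3` as a stand-in, and a hypothetical `events` table where three rows share one timestamp). A cursor on the timestamp alone skips rows when a batch boundary falls inside a run of equal timestamps; adding the id as a tie-breaker avoids that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created_at INTEGER)")
# Hypothetical data: rows 2, 3 and 4 collide on timestamp 100.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 99), (2, 100), (3, 100), (4, 100), (5, 101), (6, 102)],
)

def scan_by_timestamp(batch_size):
    """Cursor on created_at only: loses rows that tie with the boundary."""
    seen, last_ts = [], -1
    while True:
        rows = conn.execute(
            "SELECT id, created_at FROM events WHERE created_at > ? "
            "ORDER BY created_at, id LIMIT ?",
            (last_ts, batch_size),
        ).fetchall()
        if not rows:
            return seen
        seen += [r[0] for r in rows]
        last_ts = rows[-1][1]

def scan_by_timestamp_and_id(batch_size):
    """Cursor on (created_at, id): the id breaks ties, so nothing is skipped."""
    seen, last_ts, last_id = [], -1, -1
    while True:
        rows = conn.execute(
            "SELECT id, created_at FROM events "
            "WHERE created_at > ? OR (created_at = ? AND id > ?) "
            "ORDER BY created_at, id LIMIT ?",
            (last_ts, last_ts, last_id, batch_size),
        ).fetchall()
        if not rows:
            return seen
        seen += [r[0] for r in rows]
        last_id, last_ts = rows[-1]

ts_only = scan_by_timestamp(2)
ts_and_id = scan_by_timestamp_and_id(2)
print(ts_only)    # [1, 2, 5, 6]
print(ts_and_id)  # [1, 2, 3, 4, 5, 6]
```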
blackenedgem|1 year ago