top | item 28091456

NyxWulf | 4 years ago

v7 allows you to trade randomness for time precision. Higher time precision shrinks the window in which two ids can conflict, so giving up some random bits is worth it if you are generating a lot of ids. Unless it is a security application, I would use v7 for most things.

chociej|4 years ago

Trying to wrap my head around how this works when distributed. If v7 is used with a high time precision, it's important that each node in the distributed system have a very tightly synchronized time, right?

If so, with v7, I'd feel the need to find the right balance of time precision, time synchronization, node identification, and randomness. Just some thoughts out loud.

Lazare|4 years ago

Synchronization isn't required, actually. The timestamp is there to make the UUIDs roughly k-sortable, and to provide a "namespace" for the random values to reduce collisions.
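
To make the "timestamp prefix plus random suffix" idea concrete, here is a minimal Python sketch. It assumes the fixed layout later standardized in RFC 9562 (48-bit Unix millisecond timestamp, 4 version bits, 2 variant bits, 74 random bits); the draft discussed in this thread allowed other precisions. `uuid7_sketch` is a hypothetical name for illustration, not a standard library function.

```python
import os
import time
import uuid


def uuid7_sketch(now_ms=None):
    """Build a v7-style UUID: 48-bit ms timestamp, then 74 random bits."""
    if now_ms is None:
        # Whatever the local wall clock says; no synchronization assumed.
        now_ms = time.time_ns() // 1_000_000
    ts = now_ms & ((1 << 48) - 1)

    # Draw 80 random bits and keep 74 of them (12 + 62).
    rand = int.from_bytes(os.urandom(10), "big")
    rand_a = (rand >> 62) & 0xFFF
    rand_b = rand & ((1 << 62) - 1)

    value = (
        (ts << 80)          # timestamp "bucket", most significant bits
        | (0x7 << 76)       # version 7
        | (rand_a << 64)    # 12 random bits
        | (0b10 << 62)      # RFC variant
        | rand_b            # 62 random bits
    )
    return uuid.UUID(int=value)
```

Because the timestamp sits in the most significant bits, ids from later milliseconds compare greater than ids from earlier ones, which is where the rough sortability comes from. Two ids can only collide if they share both the timestamp and all 74 random bits.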

Imagine you had three devices, one of which has an accurate wall clock, one of which is 5 minutes slow, and one of which is 10 minutes slow, and you're using millisecond precision. Here's what happens:

For a given arbitrary "accurate" time such as UTC 00:00:00.5 Dec 1st 2021 (500ms after midnight), device one gets 1 millisecond to generate UUIDs into that timestamp "bucket". If it generates enough, it might get a collision, but likely it'll be fine. After 1 millisecond, its wall clock moves on to 501ms after midnight, and it stops generating UUIDs that (might) collide with other UUIDs generated at 500ms after midnight. Five minutes later at 00:05:00.500 the second device (incorrectly) thinks it's 500ms after midnight, and starts generating UUIDs for that millisecond "bucket". And then after another 5 minutes, the third device does. And we can extend it; an hour later a glitchy NTP server (or daylight savings, or whatever) might cause the devices to all roll their time back, and then they all go through the "500ms after midnight" window a second time, getting another shot to generate conflicts. And that's fine!

What matters is the total number of UUIDs you generate (over the entire lifetime of your system) by devices which believe (correctly or not) that it's that specific timestamp.
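
A back-of-the-envelope way to put numbers on that is the standard birthday approximation, p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)), applied to one timestamp bucket. The sketch below is only an estimate under the assumption that the random bits are uniform and independent; the function name and the 74-bit figure (the random-bit count in the RFC 9562 layout) are assumptions, not something stated in this thread.

```python
import math


def bucket_collision_probability(ids_in_bucket, random_bits):
    """Approximate chance that at least two ids in one timestamp bucket collide.

    ids_in_bucket is the total number of ids ever generated, by any device,
    while that device believed it was this timestamp.
    """
    space = 2 ** random_bits
    pairs = ids_in_bucket * (ids_in_bucket - 1) / 2
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-pairs / space)


# Three devices that each generate 1,000 ids into the "same" millisecond,
# at different real times, count as 3,000 ids in that bucket.
p = bucket_collision_probability(3 * 1000, random_bits=74)
```

Skewed clocks or a rolled-back clock only change which real-world moments feed a bucket; the estimate depends on the total count that lands in it.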

Now, one potential gotcha is that a user might be overly clever and think they can back the timestamp out of the UUID to figure out exactly when it was created. That does require precise synchronisation if you want accurate answers, but if you're trying to figure out the exact creation time to the millisecond of UUIDs created in a distributed fashion...well, eh, hopefully you know what you're doing. :)
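
For reference, backing the timestamp out looks roughly like the sketch below. It assumes the RFC 9562 layout, where the top 48 bits are Unix milliseconds; `embedded_time` is a hypothetical helper name. What it returns is the time the generating device believed it was, not necessarily the true creation time.

```python
import uuid
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def embedded_time(u):
    """Return the device-reported timestamp stored in a v7-style UUID."""
    ms = u.int >> 80  # top 48 bits: Unix time in milliseconds
    return EPOCH + timedelta(milliseconds=ms)
```

A device whose clock is 5 minutes slow will produce ids whose embedded time is 5 minutes early, so the result is only as accurate as the clock that generated it.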