RFC 9562: Universally Unique IDentifiers (May 2024)

londons_explore|1 year ago

> UUIDv7 features a time-ordered value field derived from the widely implemented and well-known Unix Epoch timestamp source, the number of milliseconds

This just seems to be a way of creating a huge class of subtle bugs. Now, when two things happen to be created in the same millisecond, they may or may not be monotonically increasing.

Plenty of systems will end up accidentally depending on the UUIDs sorting in the same order they were generated in. That will hold true until the system hits production and suddenly there is enough load for it to break for a handful of records, and the whole system fails.

vhcr|1 year ago

Monotonicity is addressed in section 6.2, but it's optional.
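
One way to get that optional monotonicity inside a single generator is sketched below. It loosely follows the "fixed bit-length dedicated counter" idea from Section 6.2. The 12-bit `rand_a` field becomes a counter that is reseeded randomly at each new millisecond and incremented within the same one. Two parts are assumptions for illustration: the class name `MonotonicUUIDv7`, and the rollover policy of advancing the stored timestamp by 1 ms when the counter runs out. The guarantee only covers IDs from one generator instance, not IDs from different processes or machines.

```python
import os
import threading
import time
import uuid


class MonotonicUUIDv7:
    """Hypothetical generator: 48-bit ms timestamp, 12-bit counter in rand_a, 62 random bits in rand_b."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._last_ms = -1
        self._counter = 0

    @staticmethod
    def _seed() -> int:
        # 11 random bits: the counter's top bit starts at 0, leaving headroom to increment.
        return int.from_bytes(os.urandom(2), "big") & 0x7FF

    def new(self) -> uuid.UUID:
        with self._lock:
            now_ms = time.time_ns() // 1_000_000
            if now_ms > self._last_ms:
                # A new millisecond: take the fresh timestamp and reseed the counter.
                self._last_ms = now_ms
                self._counter = self._seed()
            else:
                # Same millisecond, or the clock stepped backwards: keep the stored
                # timestamp and bump the counter so ordering is preserved.
                self._counter += 1
                if self._counter > 0xFFF:
                    # Counter exhausted (assumed policy): move the stored timestamp 1 ms ahead.
                    self._last_ms += 1
                    self._counter = self._seed()
            rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)
            value = (
                (self._last_ms & ((1 << 48) - 1)) << 80  # unix_ts_ms
                | 7 << 76                                # version 7
                | self._counter << 64                    # rand_a used as a counter
                | 0b10 << 62                             # variant 0b10
                | rand_b                                 # 62 random bits
            )
            return uuid.UUID(int=value)


gen = MonotonicUUIDv7()
ids = [gen.new() for _ in range(5)]
assert ids == sorted(ids)
```

The counter sits above the 62 random bits. Every ID this instance hands out therefore compares greater than the previous one, even when the wall clock stalls or steps backwards.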

swyx|1 year ago

I collect a list of UUID implementations and concerns to think through here: https://github.com/swyxio/brain/blob/master/R%20-%20Dev%20No...

htunnicliff|1 year ago

TL;DR: Several new UUID versions (v6, v7, and v8) have been standardized; UUIDv5 is carried over from RFC 4122.

UUIDv5 is meant for generating UUIDs from "names" that are drawn from, and unique within, some "namespace" as per Section 6.5.
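
To make the name-based scheme concrete, here is a sketch of UUIDv5 written out by hand with Python's standard `hashlib` and `uuid` modules. It hashes the namespace UUID's 16 bytes followed by the name and keeps the first 128 bits of the SHA-1 digest. It then overwrites the version and variant bits. The function name `uuid5_sketch` is hypothetical; its output should match the standard library's `uuid.uuid5`.

```python
import hashlib
import uuid


def uuid5_sketch(namespace: uuid.UUID, name: str) -> uuid.UUID:
    # SHA-1 over the namespace's 16 bytes followed by the UTF-8 encoded name.
    digest = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    value = int.from_bytes(digest[:16], "big")      # keep the first 128 of 160 bits
    value = (value & ~(0xF << 76)) | (5 << 76)      # version nibble = 5
    value = (value & ~(0b11 << 62)) | (0b10 << 62)  # variant bits = 0b10
    return uuid.UUID(int=value)


u = uuid5_sketch(uuid.NAMESPACE_DNS, "python.org")
assert u == uuid.uuid5(uuid.NAMESPACE_DNS, "python.org")
```

The output depends only on the namespace and the name, so the same inputs always yield the same UUID. That determinism is the point of the name-based versions.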

UUIDv6 is a field-compatible version of UUIDv1 (Section 5.1), reordered for improved DB locality. It is expected that UUIDv6 will primarily be implemented in contexts where UUIDv1 is used.

UUIDv7 features a time-ordered value field derived from the widely implemented and well-known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded. Generally, UUIDv7 has improved entropy characteristics over UUIDv1 (Section 5.1) or UUIDv6 (Section 5.6).
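
A minimal sketch of the UUIDv7 bit layout, filling both random fields from `os.urandom`. The fields, from most significant, are a 48-bit Unix timestamp in milliseconds, a 4-bit version, 12 bits of `rand_a`, a 2-bit variant, and 62 bits of `rand_b`. The helper name `uuid7_sketch` is hypothetical, and it makes no attempt at ordering within a millisecond.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_sketch(unix_ts_ms: Optional[int] = None) -> uuid.UUID:
    if unix_ts_ms is None:
        # time.time_ns() is Unix time; on most platforms it does not count leap seconds.
        unix_ts_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits, 74 of them used
    rand_a = rand >> 68                # top 12 bits
    rand_b = rand & ((1 << 62) - 1)    # low 62 bits
    # Bit positions below are counted from the most significant bit, as in the RFC diagrams.
    value = (
        (unix_ts_ms & ((1 << 48) - 1)) << 80  # unix_ts_ms: bits 0-47
        | 7 << 76                             # ver: bits 48-51
        | rand_a << 64                        # rand_a: bits 52-63
        | 0b10 << 62                          # var: bits 64-65
        | rand_b                              # rand_b: bits 66-127
    )
    return uuid.UUID(int=value)


print(uuid7_sketch())
```

Sorting these as 128-bit integers (or byte strings) orders them by creation millisecond. IDs from the same millisecond come out in random order, which is the case raised at the top of the thread.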

UUIDv8 provides a format for experimental or vendor-specific use cases. The only requirement is that the variant and version bits MUST be set as defined in Sections 4.1 and 4.2. UUIDv8's uniqueness will be implementation specific and MUST NOT be assumed.

The only explicitly defined bits are those of the version and variant fields, leaving 122 bits for implementation-specific UUIDs. To be clear, UUIDv8 is not a replacement for UUIDv4 (Section 5.4) where all 122 extra bits are filled with random data.
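
As an illustration of how little UUIDv8 fixes, here is a sketch that packs three caller-supplied fields around the version and variant bits. The field names follow the RFC's UUIDv8 layout: `custom_a` is 48 bits, `custom_b` is 12 bits, and `custom_c` is 62 bits. The caller decides what goes into those 122 bits and whether the result is unique; the function name is hypothetical.

```python
import uuid


def uuid8_sketch(custom_a: int, custom_b: int, custom_c: int) -> uuid.UUID:
    if not (0 <= custom_a < 1 << 48 and 0 <= custom_b < 1 << 12 and 0 <= custom_c < 1 << 62):
        raise ValueError("custom fields must fit in 48, 12 and 62 bits")
    value = (
        custom_a << 80      # custom_a: most significant 48 bits
        | 8 << 76           # ver = 8
        | custom_b << 64    # custom_b: 12 bits
        | 0b10 << 62        # var = 0b10
        | custom_c          # custom_c: remaining 62 bits
    )
    return uuid.UUID(int=value)


print(uuid8_sketch(0x123456789ABC, 0xDEF, 0))
```

With those inputs the result renders as `12345678-9abc-8def-8000-000000000000`. Only the `8` version nibble and the leading `8` of the fourth group come from the format itself.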

Background for the changes:

Many things have changed in the time since UUIDs were originally created. Modern applications have a need to create and utilize UUIDs as the primary identifier for a variety of different items in complex computational systems, including but not limited to database keys, file names, machine or system names, and identifiers for event-driven transactions.

pspeter3|1 year ago

I'm curious why they specify the UUID must have dashes in string format. It makes the UUID difficult to select with a double click.

Two4|1 year ago

As with IP addresses, UX/DX is not the primary concern

shrimp_emoji|1 year ago

Try a triple-click.

azulster|1 year ago

probably because the dashes have semantic meaning
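
For what it's worth, the 8-4-4-4-12 grouping lines up with the field boundaries inherited from the original UUIDv1 layout. Python's `uuid.UUID` still exposes those fields as attributes: `time_low`, `time_mid`, `time_hi_version`, `clock_seq_hi_variant` plus `clock_seq_low`, and `node`. The dashless form is just the `hex` attribute, which `uuid.UUID` also accepts as input. A small sketch using the UUID from this thread:

```python
import uuid

u = uuid.UUID("593a2ffb-eafc-484a-9a90-93bc91805651")
groups = str(u).split("-")

# Each dash-separated group is one legacy field (the fourth group is two of them).
assert groups[0] == f"{u.time_low:08x}"
assert groups[1] == f"{u.time_mid:04x}"
assert groups[2] == f"{u.time_hi_version:04x}"
assert groups[3] == f"{u.clock_seq_hi_variant:02x}{u.clock_seq_low:02x}"
assert groups[4] == f"{u.node:012x}"

# The 32-character form has no dashes for word selection to stop at, and it round-trips.
assert uuid.UUID(u.hex) == u
print(u.hex)  # 593a2ffbeafc484a9a9093bc91805651
```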

newprint|1 year ago

you do understand that they existed way before the mouse and button became the norm?

deathanatos|1 year ago

> Some UUID implementations, such as those found in Python and Microsoft, will output UUID with the string format, including dashes, enclosed in curly braces.

No … Python doesn't emit them enclosed in curly braces?

  >>> import uuid
  >>> str(uuid.uuid4())
  '593a2ffb-eafc-484a-9a90-93bc91805651'
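
Python's `uuid.UUID` constructor does accept the braced form (and a `urn:uuid:` prefix) as input, which may be where the RFC's remark comes from. `str()` never adds braces, though. A quick check using the UUID from the example above:

```python
import uuid

u = uuid.UUID("{593a2ffb-eafc-484a-9a90-93bc91805651}")  # braces are stripped on input
assert str(u) == "593a2ffb-eafc-484a-9a90-93bc91805651"  # and never emitted on output
assert uuid.UUID("urn:uuid:593a2ffb-eafc-484a-9a90-93bc91805651") == u
assert u.urn == "urn:uuid:593a2ffb-eafc-484a-9a90-93bc91805651"
```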

LegionMammal978|1 year ago

> UUIDv7 features a time-ordered value field derived from the widely implemented and well-known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded.

That seems like a rather vague way of addressing leap seconds for UUIDv7. For positive leap seconds, an 'exclusion' of that second would suggest that the millisecond counter is halted until the leap second is over, which doesn't seem ideal for monotonicity. And an 'exclusion' of a negative leap second hardly makes any conventional sense at all, with regard to the millisecond counter.

Contrast with the timestamp of UUIDv1/v6, where positive leap seconds can just be handled by incrementing the clock sequence.

fanf2|1 year ago

That’s the normal way IETF RFCs describe unix seconds since the epoch, though there ought to be a normative reference to https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1...
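
POSIX's "Seconds Since the Epoch" definition is a closed-form expression over broken-down UTC time. In it, every day is exactly 86,400 seconds, so leap seconds never appear. Below is a Python transcription of that expression, limited to years from 1970 onward because Python's `//` floors where C's `/` truncates. The function name `posix_seconds` is hypothetical; its results should agree with the standard library's `calendar.timegm`.

```python
import calendar
from datetime import datetime, timezone


def posix_seconds(dt: datetime) -> int:
    """Transcription of the POSIX 'Seconds Since the Epoch' expression (years >= 1970)."""
    tm = dt.astimezone(timezone.utc).timetuple()
    tm_year = tm.tm_year - 1900   # struct tm counts years from 1900
    tm_yday = tm.tm_yday - 1      # struct tm day-of-year is 0-based
    return (
        tm.tm_sec + tm.tm_min * 60 + tm.tm_hour * 3600 + tm_yday * 86400
        + (tm_year - 70) * 31536000
        + ((tm_year - 69) // 4) * 86400
        - ((tm_year - 1) // 100) * 86400
        + ((tm_year + 299) // 400) * 86400
    )


# A leap second (23:59:60 UTC) was inserted at the end of 2016, yet these two
# instants are exactly one POSIX second apart.
before = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
assert posix_seconds(after) - posix_seconds(before) == 1
assert posix_seconds(after) == calendar.timegm(after.timetuple())
```

A time API that reports the leap second itself (23:59:60) has no slot in this scheme. That mismatch makes a UUIDv7 timestamp easy to get wrong when such an API is the most convenient one.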

anamexis|1 year ago

There will not be any leap seconds after 2035, and very likely there will never be any negative leap seconds.

wrs|1 year ago

I interpreted it to mean the timer is monotonic and ignores leap seconds completely. It does make it easy to implement wrong if your most convenient time API does implement leap seconds. (I don’t see why this would have anything to do with the millisecond timer? Leap seconds happen on the second.)

ComplexSystems|1 year ago

Surprising we're using only 128 bits; some back-of-the-napkin math tells me that may not be enough to avoid collisions...

Spivak|1 year ago

Depends on your problem domain. You can be Twitter/Discord-sized and get away with 64 bits. When you start dedicating parts of your UUID to a timestamp, the chance of collisions does go way up, since a significant chunk of the UUID is now the same for everyone. But when you deploy this variant you aren't trying to make globally unique IDs anymore; you're trying to make application-unique IDs. You are still very likely to end up with globally unique IDs anyway, because 128 bits gives a lot of room to play around.
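
For reference, here is a sketch of the kind of 64-bit layout the Twitter/Discord remark refers to. A Snowflake-style ID packs three parts: a millisecond timestamp relative to a custom epoch, a worker number, and a per-millisecond sequence. The split used here (41 timestamp bits, 10 worker bits, 12 sequence bits) follows the commonly described Twitter Snowflake design. The epoch constant and function names are assumptions for illustration. Uniqueness in this scheme comes from coordinating worker numbers, not from randomness.

```python
import time

# Hypothetical epoch: 2020-01-01T00:00:00Z in Unix milliseconds.
CUSTOM_EPOCH_MS = 1_577_836_800_000

TIMESTAMP_BITS, WORKER_BITS, SEQUENCE_BITS = 41, 10, 12


def pack_snowflake(ts_ms: int, worker_id: int, sequence: int) -> int:
    """Pack into a 63-bit non-negative integer (the sign bit of an int64 stays 0)."""
    elapsed = ts_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < 1 << TIMESTAMP_BITS:
        raise ValueError("timestamp outside the 41-bit window")
    if not 0 <= worker_id < 1 << WORKER_BITS:
        raise ValueError("worker_id must fit in 10 bits")
    if not 0 <= sequence < 1 << SEQUENCE_BITS:
        raise ValueError("sequence must fit in 12 bits")
    return (elapsed << (WORKER_BITS + SEQUENCE_BITS)) | (worker_id << SEQUENCE_BITS) | sequence


def unpack_snowflake(snowflake: int) -> tuple[int, int, int]:
    sequence = snowflake & ((1 << SEQUENCE_BITS) - 1)
    worker_id = (snowflake >> SEQUENCE_BITS) & ((1 << WORKER_BITS) - 1)
    ts_ms = (snowflake >> (WORKER_BITS + SEQUENCE_BITS)) + CUSTOM_EPOCH_MS
    return ts_ms, worker_id, sequence


now_ms = time.time_ns() // 1_000_000
sid = pack_snowflake(now_ms, worker_id=42, sequence=7)
assert sid < 1 << 63
assert unpack_snowflake(sid) == (now_ms, 42, 7)
```

With 41 bits of milliseconds, this layout lasts about 69 years from whatever epoch is chosen.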

WorldMaker|1 year ago

For hash functions, maybe not anymore. The birthday paradox, the pigeonhole principle, and the other math of bucketing inputs all come into play against attacks designed to break hash functions and cause intentional collisions. For mostly random values like UUIDs (and IPv6 addresses), the classic answer is that the space is still astronomically large: a UUIDv4 has 2^122 ≈ 5.3 × 10^36 possible values.

AaronFriel|1 year ago

Care to share your math? My understanding is that, even accounting for the birthday paradox, a collision is astoundingly unlikely.
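
The usual birthday-bound approximation makes the numbers concrete. With N equally likely values and n independently generated IDs, the probability of at least one collision is about 1 - exp(-n(n-1)/(2N)). The sketch below applies it to UUIDv4, whose 122 random bits give N = 2^122. It assumes a good random source, and the function name is hypothetical.

```python
import math


def collision_probability(n: int, bits: int = 122) -> float:
    """Birthday-bound approximation: P(at least one collision among n random IDs)."""
    space = 2.0 ** bits
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


for n in (10**9, 10**12, 2**61):
    print(f"{n:.3g} ids -> p = {collision_probability(n):.3g}")

# Roughly how many IDs until a collision becomes a coin flip.
n_half = math.sqrt(2 * 2.0**122 * math.log(2))
print(f"50% at about {n_half:.3g} ids")
```

Under these assumptions the 50% point comes out at roughly 2.7 × 10^18 IDs.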

free_bip|1 year ago

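The time field ensures that IDs generated in different milliseconds cannot collide, at least until the time field rolls over; only IDs created within the same millisecond have to rely on the remaining random bits.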
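
Here is a back-of-the-envelope check of how far away that rollover is for UUIDv7's 48-bit millisecond timestamp. It assumes an average Gregorian year of 365.2425 days:

```python
# Average Gregorian year (365.2425 days) expressed in milliseconds.
MS_PER_GREGORIAN_YEAR = 365.2425 * 86_400 * 1_000

span_years = 2**48 / MS_PER_GREGORIAN_YEAR   # how long the 48-bit ms counter lasts
rollover_year = int(1970 + span_years)       # counted from the Unix epoch
print(f"48-bit ms field spans ~{span_years:,.0f} years; rolls over around {rollover_year}")
```

Under that assumption, the timestamp alone keeps IDs apart for close to 9,000 years after 1970. IDs minted in the same millisecond still depend on the remaining 74 bits.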

cachvico|1 year ago

HNGPT, please summarize the important changes?

jcrites|1 year ago

They seem better geared for usage in databases as primary keys, specifically UUID versions 6 and onwards:

> Motivation. One area in which UUIDs have gained popularity is database keys. This stems from the increasingly distributed nature of modern applications. In such cases, "auto-increment" schemes that are often used by databases do not work well: the effort required to coordinate sequential numeric identifiers across a network can easily become a burden. The fact that UUIDs can be used to create unique, reasonably short values in distributed systems without requiring coordination makes them a good alternative, but UUID versions 1-5, which were originally defined by [RFC4122], lack certain other desirable characteristics [...]

> UUIDv6 is a field-compatible version of UUIDv1 (Section 5.1), reordered for improved DB locality. It is expected that UUIDv6 will primarily be implemented in contexts where UUIDv1 is used. Systems that do not involve legacy UUIDv1 SHOULD use UUIDv7 (Section 5.7) instead.

> Instead of splitting the timestamp into the low, mid, and high sections from UUIDv1, UUIDv6 changes this sequence so timestamp bytes are stored from most to least significant. That is, given a 60-bit timestamp value as specified for UUIDv1 in Section 5.1, for UUIDv6 the first 48 most significant bits are stored first, followed by the 4-bit version (same position), followed by the remaining 12 bits of the original 60-bit timestamp. [...]

> UUIDv7 features a time-ordered value field derived from the widely implemented and well-known Unix Epoch timestamp source, the number of milliseconds since midnight 1 Jan 1970 UTC, leap seconds excluded. Generally, UUIDv7 has improved entropy characteristics over UUIDv1 (Section 5.1) or UUIDv6 (Section 5.6).
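
The UUIDv6 reordering quoted above can be sketched with Python's standard `uuid` module. Its `UUID.time` attribute reassembles the 60-bit UUIDv1 timestamp. The sketch writes the top 48 bits first, then the version nibble (now 6), then the low 12 bits. The lower 64 bits (variant, clock sequence, node) are copied unchanged. The function name is hypothetical, and the sketch assumes a well-formed UUIDv1 input.

```python
import uuid


def uuid1_to_uuid6(u1: uuid.UUID) -> uuid.UUID:
    if u1.version != 1:
        raise ValueError("expected a UUIDv1")
    ts = u1.time                   # 60-bit count of 100 ns intervals since 1582-10-15
    high = (ts >> 12) << 16 | 6 << 12 | (ts & 0xFFF)  # 48 high timestamp bits | ver | 12 low bits
    low = u1.int & ((1 << 64) - 1)  # variant, clock_seq and node stay where they were
    return uuid.UUID(int=high << 64 | low)


# Two v1 values where time_low wraps: raw v1 order disagrees with timestamp order.
node = 0x0123456789AB
a = uuid.UUID(fields=(0xFFFFFFFF, 0x0000, 0x1000, 0x80, 0x00, node))
b = uuid.UUID(fields=(0x00000000, 0x0001, 0x1000, 0x80, 0x00, node))
assert a.time < b.time and a > b
assert uuid1_to_uuid6(a) < uuid1_to_uuid6(b)  # v6 sorts by timestamp
```

Sorting by timestamp is the "improved DB locality" the RFC refers to. Consecutive v6 values cluster together in an index, whereas v1's low timestamp bits come first and scatter them.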

jcrites|1 year ago

The problem that this standard solves isn't a math problem. It's an engineering problem of defining (adding) UUID formats that are suitable for use in database keys (and some other things). Previous proposals had disadvantages for that use case.

This is discussed in the "Update Motivation" section of the document: https://www.rfc-editor.org/rfc/rfc9562.html#name-update-moti...

sedatk|1 year ago

> but we cant come up with a decent UUID scheme

maybe because we can’t come up with an unambiguous definition of “decent.”
