Moving Away from UUIDs (2018)

[+] cetra3|3 years ago|reply

If you're using them for unguessable random strings then yeah, they're not ideal.

If you're using them for providing a unique id in a distributed system, with very little chance of collision & fitting them in a db column, then they are great.

[+] ilyt|3 years ago|reply

Pretty much, my first reaction was "people use UUIDs for session tokens? Why?"

Seems like author made some bad choices in previous systems and now just figured out why tbh.

[+] corytheboyd|3 years ago|reply

Yeah I don't really get the point of this article, if you need random values of a specific size don't use uuid, it's literally specified to be one exact length and format.

[+] Alupis|3 years ago|reply

There's probably a non-trivial amount of folks that equate a UUID with "unguessable" given their appearance. They are, after all, not sequential and using them to obscure things like number of users (using a UUID in place of an incrementing number) seems like a natural fit.

Given how easy it is to generate a UUID in most languages, and given the low likelihood of a collision within a system - it wouldn't be a huge leap to think UUID's could replace homebrewed random string generators for things like password reset tokens, etc.
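For comparison, a purpose-built token is about as easy to generate as a UUID. The sketch below assumes Python's standard `secrets` module; the function names `new_reset_token` and `tokens_match` and the 32-byte default are illustrative choices, not anything from the article.

```python
import secrets


def new_reset_token(num_bytes: int = 32) -> str:
    """Return a URL-safe token built from num_bytes of OS CSPRNG output."""
    return secrets.token_urlsafe(num_bytes)


def tokens_match(presented: str, stored: str) -> bool:
    """Compare in constant time so response timing does not leak a matching prefix."""
    return secrets.compare_digest(presented.encode(), stored.encode())
```

With the default of 32 bytes, every one of the 256 bits is random, whereas a version 4 UUID spends some of its 128 bits on version and variant markers.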

[+] Waterluvian|3 years ago|reply

“Moving Away From Misusing UUIDs”

[+] hn_user2|3 years ago|reply

My only wish is that UUIDs were sortable and still contained their timestamp. When bug hunting, sometimes things become a little more obvious when there is an exact start and end to ids with issues.
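A rough sketch of that kind of ID, in the spirit of ULID/KSUID but not an implementation of either spec: a 48-bit millisecond timestamp followed by 80 random bits, written as fixed-width hex so that string order follows creation time. The names `time_prefixed_id` and `created_at_ms` are made up for illustration.

```python
import secrets
import time
from typing import Optional


def time_prefixed_id(now_ms: Optional[int] = None) -> str:
    """48-bit millisecond timestamp + 80 random bits, as 32 lowercase hex chars."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    timestamp = now_ms & ((1 << 48) - 1)  # keep the low 48 bits
    value = (timestamp << 80) | secrets.randbits(80)
    return f"{value:032x}"


def created_at_ms(identifier: str) -> int:
    """Recover the millisecond timestamp stored in the top 48 bits."""
    return int(identifier, 16) >> 80
```

IDs created in the same millisecond are ordered only by their random part, so this gives approximate creation order, not a strict sequence. The timestamp is also visible to anyone who sees the ID.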

[+] human|3 years ago|reply

Something I don't understand: how are UUIDs not safe given that they are probably better than 99.9999% of passwords generated by users?

[+] dheera|3 years ago|reply

Also UUID v3 and v5 produce IDs from identifiers such as URLs which can be quite useful if you want two different systems to generate the same exact UUID given knowledge of the same URL.

For example, in a REST system that needs UUIDs I'd derive the UUID from the REST URL of the object.
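In Python this is available in the standard `uuid` module. A small sketch, where `id_for_url` is a hypothetical helper name and the URL in the test is a placeholder:

```python
import uuid


def id_for_url(url: str) -> uuid.UUID:
    """Derive a version 5 (SHA-1, name-based) UUID from a URL.

    Any system that knows the same URL computes the same UUID.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, url)
```

Because the result is a pure function of the URL, it is only as secret as the URL itself.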

[+] echelon|3 years ago|reply

The best format:

{opaqueTokenTypePrefix}_{crockfordEncodedEntropy}

Also: pass token through a bad words and "credit card lookalike" filter.

Optionally encode author cluster/region details in the low order bytes to resolve before eventual consistency in active-active systems.
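A minimal sketch of the prefix-plus-Crockford part of that format. Assumptions: 20 bytes of entropy, bits padded with zeros on the right to a multiple of five, and a hypothetical prefix such as `sess`. The bad-word and credit-card filters and the region bytes are not shown.

```python
import secrets

# Crockford's Base32 alphabet: digits plus letters, omitting I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def crockford_encode(data: bytes) -> str:
    """Encode bytes as Crockford Base32, zero-padding the final 5-bit group."""
    bits = len(data) * 8
    n_chars = -(-bits // 5)  # ceiling division
    value = int.from_bytes(data, "big") << (n_chars * 5 - bits)
    return "".join(
        CROCKFORD[(value >> (5 * i)) & 31] for i in reversed(range(n_chars))
    )


def new_token(prefix: str, num_bytes: int = 20) -> str:
    """Build '{prefix}_{entropy}' from num_bytes of CSPRNG output."""
    return f"{prefix}_{crockford_encode(secrets.token_bytes(num_bytes))}"
```

With 20 bytes the entropy part is exactly 32 characters, so `new_token("sess")` yields a 37-character string.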

[+] Ptchd|3 years ago|reply

> If you're using them for unguessable random strings then yeah, they're not ideal.

Why? I like to use them for private/secret URLs ...

[+] derefr|3 years ago|reply

> If you think you are likely to attract this kind of attention then you might want to carefully consider which side of the Mossad/not-Mossad threat divide you live on and maybe check your phone isn’t a piece of Uranium.

To be honest, we get something like this kind of attention (Tbps of forged requests / brute-force registration attacks per day), and all we do is provide a free API that's rate-limited per user account for multitenant-QoS reasons. People do all sorts of crazy stuff to try to sneakily drip-register a thousand accounts over several weeks so they can then launch some big job that uses all the keys in tandem to evade the rate limits.

Little do they know, they don't get any benefit from that, even while they're nominally "getting away with it"; doing that just makes our servers fall over! :P

---

Separately, you should really consider a level between "Mossad" and "not-Mossad": the sorts of people who hack crypto exchanges. They tend to use exactly the kind of "saw it in a movie"-level techniques that you'd think wouldn't happen because "you can just use rubber-hose cryptanalysis." Except if you're a socially-anxious math-genius fifteen-year-old living in Belarus, and there's a cryptosystem that you have unlimited access to a local copy of, maybe the rubber-hose cryptanalysis is actually harder!

[+] 323|3 years ago|reply

> you should really consider a level between "Mossad" and "not-Mossad": the sorts of people who hack crypto exchanges.

Bad example, since the biggest crypto hacker is North Korea.

> North Korean government-backed hackers have stolen the equivalent of billions of dollars in recent years by raiding cryptocurrency exchanges, according to the United Nations. In some cases, they’ve been able to nab hundreds of millions of dollars in a single heist, the FBI and private investigators say.

https://edition.cnn.com/2022/07/10/politics/north-korean-hac...

[+] SkyPuncher|3 years ago|reply

IMO, this is bad math and why you probably need to be more cautious. I actually went through similar math with my team last year.

* It's long been known that UUIDs should never be used as a security mechanism. While the math is interesting, the fact they're using it as justification for moving away from UUIDs is concerning. It'd be like publishing a post titled "we're moving away from MD5"

* If you're using these tokens for human-entered purposes, you should implement account-based rate limiting. A brute force attack is nearly impossible if a single account can only have, say, 100 attempts per day before contacting support. There are very, very few use cases where a human-based token will ever need more than 20 attempts per day.

* Use long, high-character count tokens if they're intended to be machine/copy-n-paste only. Storage is cheap. Use something big and long.

Seriously, rate limit your shit. The second that rate limits are introduced you control all of the major variables in your security posture.
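A bare-bones sketch of account-based limiting, using the 100-attempts-per-day figure from above. It keeps state in memory in a single process, which a real deployment would not do, and the class name `AttemptLimiter` and its parameters are illustrative.

```python
import time
from collections import defaultdict, deque


class AttemptLimiter:
    """Allow at most max_attempts per account within a sliding time window."""

    def __init__(self, max_attempts=100, window_seconds=86_400.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._attempts = defaultdict(deque)

    def allow(self, account_id: str) -> bool:
        """Record an attempt and return True, or return False if over the limit."""
        now = self.clock()
        attempts = self._attempts[account_id]
        # Forget attempts that have aged out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False
        attempts.append(now)
        return True
```

Call `allow(account_id)` before checking a submitted token and reject the request when it returns `False`.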

[+] dopidopHN|3 years ago|reply

Exactly. The article glossing over rate limiting and proceeding to do some math without it kinda ruined the article for me.

The minute it’s rate limited, it crumbles.

[+] skybrian|3 years ago|reply

If the attacker doesn't care which account they break into, they could try a different account each time and then account-based rate limiting doesn't help as much.

(Depending on how many accounts there are that they can try.)

[+] tgv|3 years ago|reply

I've got a UUID in the URL of a new system (not a real UUID, but 128 random bits encoded in the UUID format), because that kept the existing links active. If someone were to guess one (which isn't that interesting, BTW, it's not account info or anything like it), he'd have a hard time checking a million per second, because the response is never going to be faster than 10ms or so.

[+] Aeolun|3 years ago|reply

I think the author forgets that while you may (potentially) be able to generate quintillions of UUID’s per second, there is no way you’ll be able to validate that they’re correct at the same speed.

At least, I don’t feel bad saying my server would melt if it had to serve that many requests.

[+] jongjong|3 years ago|reply

Yeah, since they're trying to guess a random UUID from all issued UUIDs, they would have to make a request for each generated UUID. Even if you assume that ISPs would allow that kind of bandwidth, the entire internet would grind to a halt before the attack even began. Also, the 2^46 figure used to represent the number of UUIDs is way too large. With proper access control of resources, we only really need to worry about active session IDs and in a world with 8 billion people, there's no way that there would be 2^46 active session IDs.

If you assume that every person on earth was hooked to your service 24/7 and ignore the significant bandwidth limitation, it would still take more than 6 months for the entire Bitcoin network to hijack 1 random user's session. But it doesn't make sense to ignore bandwidth limitation anyway since it's the bottleneck. The Bitcoin network computes all these hashes in parallel and there is no way that anything close to this degree of parallelization can be achieved at the network layer.

[+] Matthias247|3 years ago|reply

+1. Most server deployments will break at less than 10k RPS. If we talk about the largest scale public cloud deployments, we might end up in the millions to billions of RPS due to lots and lots of instances. But the number will be very far off from the 24293141000000000000 rps rate that the author used for estimating the impact.

Plus any deployment that is so large that a user could actually generate a reasonable amount of traffic will likely also have some variant of a DDOS/fraud/rate-limiting protection or at least alarms and manual interventions that will kick in once such a traffic flood is observed.
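To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes 122 random bits per version 4 UUID and the 2^46 valid tokens discussed in this thread, and it treats every guess as an independent uniform draw, so the expected number of guesses before the first hit is about 2^122 / 2^46 = 2^76. The three rates are the ones mentioned above.

```python
UUID4_RANDOM_BITS = 122
VALID_TOKENS = 2**46
SECONDS_PER_YEAR = 365.25 * 86_400


def expected_guesses(random_bits: int = UUID4_RANDOM_BITS, valid_tokens: int = VALID_TOKENS) -> int:
    """Mean number of uniform guesses before one lands on a valid token."""
    return 2**random_bits // valid_tokens


def expected_years(requests_per_second: float) -> float:
    """Expected time to the first hit at a given verification rate."""
    return expected_guesses() / requests_per_second / SECONDS_PER_YEAR


RATES = {
    "single deployment (10k RPS)": 1e4,
    "huge cloud fleet (1e9 RPS)": 1e9,
    "article's rate": 24293141000000000000,
}

if __name__ == "__main__":
    for label, rps in RATES.items():
        print(f"{label}: {expected_years(rps):.3g} years")
```

The point is only the gap between the rows: the result depends almost entirely on which verification rate you believe is achievable.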

[+] SkyPuncher|3 years ago|reply

While you're not wrong, this is the type of thinking that can introduce security vulnerabilities.

Your current systems might not be able to validate quintillions of UUID's per second, but your future systems might be able to (or at least get a lot closer).

[+] jurschreuder|3 years ago|reply

This article is totally wrong about everything.

An id is not a password.

Hash rate is totally not comparable with just trying every combination.

You can generate combinations instantly, you don't need a gpu for that.

The delay is how long the server takes to respond and how many connections it can have at the same time.

You're also blocked by the server after trying a couple of thousand.

[+] zerovox|3 years ago|reply

The title is misleading. This article argues that you should not use a UUID for a _session cookie or access token_, which was never the intended purpose of a UUID.

[+] hombre_fatal|3 years ago|reply

I don't think intended purpose cashes out into anything here. Either UUID has enough random bits for your case as a session token or it doesn't. UUID isn't special.

I don't find any variable of TFA's hypothetical UUID-breaker scenario convincing either. Not the number of tokens issued, nor the adversary having Bitcoin network levels of compute, nor the ability to verify tokens at anything close to that speed.

[+] fullstackchris|3 years ago|reply

yes exactly, who in their right mind would assign a UUID as a session token?!?! i mean, good point, wow, this article proves exactly why UUID shouldn't be used for such... then proceeds to show basically a method that is currently used by many... sigh

[+] eric4smith|3 years ago|reply

I had a database table where I used UUID as primary key. Big mistake. Haunts us to this day.

Not sortable. Takes a lot of space. Table relationships are annoying. Etc.

What we do instead is have a secondary UUID key and keep Bigint as primary keys. Then use the UUID column in the external context instead.

UUIDs are fine for 99.99999% of the time in your own domain.

Don’t expect universal uniqueness across all domains.

[+] dragonwriter|3 years ago|reply

The reason to use UUIDs as primary keys is to allow creating records including primary keys outside of the database before posting them, especially in distributed systems.

UUIDs are sortable, but don’t give you creation-order sorting (of course, it's abusing bigint PKs to rely on them for that, too.) If you want creation-order sorting, storing a creation timestamp and sorting on that works, and I’ve never had a db that had a business requirement for creation order sorting and didn’t also have one for actual creation time.

[+] deathanatos|3 years ago|reply

> Not sortable

That's not correct; it's trivial to come up with an ordering, and I don't know of a database in practice that doesn't permit sorting on a UUID.

> Table relationships are annoying

… in SQL,

  user_id REFERENCES users

It's exactly the same, regardless of the type of the column…?

> Takes a lot of space

Yes … but also no. It's 16 B vs. a serial's 4 B, I grant, but compared to a varchar, it's immaterial. (And particularly in comparison to the number of times I see people use a varchar for an enum…) Certainly there could be a case where a row is wide b/c of UUIDs, but in practice, rows are wide either because of the data, or because of poor design.

[+] Quarrelsome|3 years ago|reply

AFAIK UUIDs as pks in databases is extremely standard. I would suggest the biggest downside is debugging where writing id = 5 is much easier than id = 'xxxx-xxxx-xxxx-xxxx'.

[+] dvnguyen|3 years ago|reply

Just curious to learn: when do you need sortable primary keys?

[+] JSavageOne|3 years ago|reply

When is an instance you'd need to sort by ID? How does it make table relationships annoying?

[+] Kamq|3 years ago|reply

> Takes a lot of space

Wait, did you store them as strings instead of as a native UUID type (or any sort of 128 bit integer type)?

[+] Alupis|3 years ago|reply

This is a good approach. I wonder if using a UUID-aware data type (like PostgreSQL's UUID type) would improve performance without making the second column necessary?

[+] smileysteve|3 years ago|reply

But to use both, you have to store both, to use it as a lookup, you have to [unique] index both; you can no longer easily partition (and guarantee uniqueness)

And an auto incrementing bigint doesn't guarantee order.

[+] SAI_Peregrinus|3 years ago|reply

And if you need universal uniqueness across all domains, just pick a 256-bit random number using a CSPRNG like /dev/urandom.

[+] sicp-enjoyer|3 years ago|reply

I'm surprised to hear it's not sortable. Why can't they just memcmp it?

[+] lyu07282|3 years ago|reply

Every time I used uuids before I always ended up with terrible index performance

[+] hn_throwaway_99|3 years ago|reply

Sorry, but the main point of this article goes under YAGNI or "no, you aren't Google" for me.

If you aren't generating a thousand IDs per second for every person on the planet, you're fine.

Even from a guess-ability standpoint, it's more important to put reasonable rate limits on your endpoints than worrying about someone putting bitcoin-network-level of resources against your endpoints.

[+] myaccount9786|3 years ago|reply

> I find that 160 bits is a sweet spot of excellent security

Please, for the love of God, leave cryptography to the cryptographers.

[+] felixr|3 years ago|reply

> The dash-separated hexadecimal format takes 36 characters to represent 16 bytes of data.

You can use a different formatting. I would suggest looking at https://github.com/oculus42/short-uuid Of course if you just want a random ID, then you might not need a UUID. But UUIDs have the advantage that there are different versions and you can distinguish them; e.g. you might want a unique ID that gives you some debugging information (where/when was it created), so you use v1 and later you can decide to switch to v4 if you decide you want the IDs to carry no information.

Independent of how you generate the ID, I think the base-57 encoding that shortUUIDs use is quite good when the IDs are user facing. Not using O,0,l,1,I in the alphabet makes IDs more readable.
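A sketch of the idea, not the library's actual code: the alphabet here is simply the 62 alphanumerics minus the five look-alike characters listed above, and its ordering is my own assumption, so the output will not necessarily match what short-uuid produces.

```python
import string
import uuid

AMBIGUOUS = set("O0l1I")
ALPHABET = "".join(
    c
    for c in string.digits + string.ascii_uppercase + string.ascii_lowercase
    if c not in AMBIGUOUS
)  # 62 - 5 = 57 symbols
ENCODED_LENGTH = 22  # 57**22 > 2**128 > 57**21


def encode(u: uuid.UUID) -> str:
    """Write the UUID's 128-bit integer in base 57, left-padded to 22 chars."""
    n = u.int
    digits = []
    while n:
        n, rem = divmod(n, len(ALPHABET))
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)).rjust(ENCODED_LENGTH, ALPHABET[0])


def decode(text: str) -> uuid.UUID:
    """Inverse of encode()."""
    n = 0
    for ch in text:
        n = n * len(ALPHABET) + ALPHABET.index(ch)
    return uuid.UUID(int=n)
```

That takes the usual 36-character form down to 22 characters without losing any bits.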

[+] gcoguiec|3 years ago|reply

Some UUID alternatives:

- https://github.com/ai/nanoid

- https://github.com/segmentio/ksuid

- https://github.com/ulid/spec

[+] insanitybit|3 years ago|reply

Yeah, I mean, in general you should always think about how much entropy you need and know how much you're getting. I think it should be fairly standard knowledge that a UUID may only provide 122 bits of entropy, but then again what should be standard is not what is standard.

It should also be standard to understand the birthday paradox and when it's relevant.

In a ton of cases 122 bits is totally acceptable and it's really up to you to understand when it isn't. In fact, in lots of cases you can get away with less, like 96bits, etc.

It should be pretty easy to answer "how much do you need?" by asking what your tolerance for collisions is.
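One way to turn a collision tolerance into a bit count is the usual birthday approximation, p ≈ 1 - exp(-n(n-1) / (2 · 2^bits)). A small sketch, assuming IDs are drawn uniformly at random; the function names are illustrative.

```python
import math


def collision_probability(num_ids: int, bits: int) -> float:
    """Approximate chance that at least two of num_ids random IDs collide."""
    return -math.expm1(-num_ids * (num_ids - 1) / (2 * 2**bits))


def ids_for_probability(p: float, bits: int) -> float:
    """Approximate number of IDs at which the collision chance reaches p."""
    return math.sqrt(2 * 2**bits * math.log(1 / (1 - p)))
```

Plug in your own ID count and the number of random bits (122, 96, or whatever you are considering), then compare the result with the collision rate you can live with.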

[+] stop50|3 years ago|reply

Who the heck stores uuids in their string form? It's only useful for transport. For storage you use bytes.
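In Python terms, using the standard `uuid` module (the helper names `to_storage` and `from_storage` are made up for this sketch):

```python
import uuid


def to_storage(u: uuid.UUID) -> bytes:
    """The 16-byte form suitable for a binary column."""
    return u.bytes


def from_storage(raw: bytes) -> uuid.UUID:
    """Rebuild the UUID from its 16 stored bytes."""
    return uuid.UUID(bytes=raw)


def to_transport(u: uuid.UUID) -> str:
    """The familiar 36-character dash-separated hex form."""
    return str(u)
```

The text form is 36 characters for the same 16 bytes of data, so a string column stores more than twice as much per ID.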

[+] doty|3 years ago|reply

Am I misreading this article?

BESIDES not having any particular way to validate a token without asking the service, making the rate a hell of a lot slower than 2^64 tokens per second (lol wut) doesn’t it also assume that you have 2^46 valid tokens in existence? Isn’t that 70 TRILLION valid tokens, or nearly 9000 tokens per human on earth?
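The arithmetic checks out, for what it's worth. A quick sketch, taking the world population as 8 billion (the round figure used elsewhere in this thread):

```python
VALID_TOKENS = 2**46
WORLD_POPULATION = 8_000_000_000  # assumed round figure

tokens_per_person = VALID_TOKENS / WORLD_POPULATION

if __name__ == "__main__":
    print(f"{VALID_TOKENS:,} tokens, about {tokens_per_person:,.0f} per person")
```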

[+] tracker1|3 years ago|reply

Having had to this past week work on scrubbing a codebase looking for hard coded values... I will say, the prominence of the UUID format was at least very beneficial when searching beyond the configuration files. [\w\/_+-]{20,} also worked for finding longer matches, but more noise.
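Roughly what that search looks like as a script. The long-token pattern is the one quoted above; the UUID pattern and the function names are my own assumptions about what "the UUID format" means here (8-4-4-4-12 hex digits).

```python
import re

# Canonical dash-separated form: 8-4-4-4-12 hexadecimal digits.
UUID_RE = re.compile(
    r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
)
# Noisier catch-all for long opaque strings.
LONG_TOKEN_RE = re.compile(r"[\w\/_+-]{20,}")


def find_uuids(text: str) -> list:
    """Return every UUID-shaped string in text."""
    return UUID_RE.findall(text)


def find_long_tokens(text: str) -> list:
    """Return every run of 20 or more token-like characters in text."""
    return LONG_TOKEN_RE.findall(text)
```

The second pattern also matches UUIDs, along with long identifiers and paths, which is where the extra noise comes from.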

I'm not sure it's worth it to use more than a UUID for some use cases, but for a lot, it's fine. Maybe CUID if there's a decent library for your language/platform.

Aside... Whoever makes such a system that is generating/receiving OAuth tokens at that rate, and won't see/detect/feel a brute force attack of that scale probably didn't do anything to protect their SMS verification codes (only 6 digits), you'll definitely brute force that against a known password breach far more quickly, but okay, in either case.

[+] Joel_Mckay|3 years ago|reply

I prefer encoded GUID generators like: pid, memory location, machine hardware ID, UTC epoch date-time, back-patched hash( serialized object, and salt)

Guaranteed globally unique in concurrency, built in data integrity check, and non-blocking read-modify-Bork resistant error detection

You are welcome, =)

[+] DethNinja|3 years ago|reply

It is great that people are concerned about UUID entropy, because some implementations actually got much less than ~120 bits.

However, I think the article missed the point that you shouldn’t use UUIDs as a security measure anyway.

[+] edflsafoiewq|3 years ago|reply

I've never understood why UUIDs are... a thing. I understand how it can be useful to have a name for different kinds of identifiers, but what is the purpose of adding variant/version bits and unifying different kinds of IDs into a single thing called a UUID? The article for instance never did anything with the "UUID-ness" of the IDs so they might as well have started with "random 128-bit integers".

Can anyone explain this?

[+] agilob|3 years ago|reply

There are different types of UUID that serve different purposes https://web.archive.org/web/20220623031329/https://www.ietf....

[+] andreygrehov|3 years ago|reply

Does anyone have experience with KSUID (K-Sortable Unique IDentifier) ? Interested in cons.

230 comments