> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time... Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.
So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.
In other words, you can only use UUIDv7 for rows that never need to be looked up by any data coming from the user. And maybe that exists sometimes for certain data in JOINs... but it seems like it might be more the exception than the rule, and you never know when an internal ID might need to become an external one in the future.
Using UUIDv4 as primary key has unexpected downsides because data locality matters in surprising places [1].
A UUIDv7 primary key seems to reduce / eliminate those problems.
If there is also an indexed UUIDv4 column for external id, I suspect it would not be used as often as the primary key index so would not cancel out the performance improvements of UUIDv7.
If this is a concern, pass your UUIDv7 ID through an ECB block cipher with a 0 IV. 128 bit UUID, 128 bit AES block. Easy, near zero overhead way to scramble and unscramble IDs as they go in/out of your application.
There is no need to put the privacy preserving ID in a database index when you can calculate the mapping on the fly
Great point. Also, having to support multiple IDs is a maintenance headache.
IMO, a major problem solved by UUIDs is the ability to create IDs on the client-side, hence, they are inherently user-facing. A major reason why this is an important use case for UUIDs is because it allows clients to avoid accidental duplication of records when an insertion fails due to network issues. It provides insertion idempotence.
For example, when the user clicks on a button on a form to insert a record into a database, the client can generate the UUID on the client-side, then attach it to a JSON object, then send the object to the server for insertion; in the meantime, if there is a network issue and it's unclear whether or not the record was inserted, the code can automatically retry (or user can manually retry) and there is no risk of duplication of data if you use the same UUID.
This is impossible to do with auto-incrementing IDs because those are generated by the database in a centralized way so the user cannot know the ID head of time and thus, if there is a network failure while submitting a form, the client cannot automatically know whether or not the record was successfully inserted; if they retry, they may create a duplicate record in the database. There is no way to make the operation idempotent without relying on some kind of fixed ID which has a uniqueness constraint on the database side.
Question: why not use UUIDv7 but encrypt the user-facing ID you hand out? Then it's just a quick decrypt-on-lookup, and you have the added bonus of e.g. being able to give different users different IDs
In a well normalized setup idk maybe not. Uuidv4 for your external ids and then have a mapping table to correspond that to something you’d use internally. Then you can torch an exposed uuid update the mapping table and generate a new one and none of your pointers and foreign keys need to change internally.
When you see v7 vs. V4, you'd expect the higher number to be better, hopefully better in all aspects, I wouldn't have expected such a thoughtful consideration to be required before upgrading. UUID-b would've been a better name then ;)
You probably shouldn't / don't need to use v7 for your Users table because the age of your User probably has limted to no bearing on the look up patterns. For example, our Steam and Amazon accounts are pretty old, but we likely still use them.
However, your Orders table is significantly more likely to be looked up based on time, so a v7 makes a lot of sense here.
Now I'd argue the security implications are overblown, but in general tems you might also allow someone to look up a user, i.e. you can view my Steam profile, or maybe my Amazon wishlist. You probably don't need to be looking up another Users Order.
Alternativly, if your building an Enterprise Risk Solution, you could take a view that you don't want people knowing how old the risk is, but most solutions would show you some history and would believe that to be pertinent information.
There will be instances of getting it wrong, but it isn't actually _that_ complicated.
If leaking creation time is a concern, can we not just fake the timestamp?
We can do so in a way that most performance benefits remain - so like starting with a base time of 1970 and then adding base time to it intermittently, having random months and days to new records (or maybe based on the user's id - so the user's record are temporally consistent but they aren't with other user records).
I'm sure there might be a middle ground where most of the performance gains remain but the deanonymizing risk is greatly reduced.
Edit: encrypting the value in transit seems a simpler solution really
> What can go wrong with using UUIDv7
Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time.
> This leakage is primarily a privacy concern. Attackers can use the timing data as metadata for de-anonymization or account correlation, potentially revealing activity patterns or growth rates within an organization. While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach. Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.
> Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.
So then what's the point? How I always did things in the past was use an auto increment big int as the internal primary key, and then use a separate random UUID for the external facing key. I think this recommendation from "experts" is pretty dumb because you get very little benefit using UUIDV7 (beyond some portability improvements) if you're still using a separate internal key.
While I wouldn't use UUIDV7 as a secure token like I would UUIDV4, I don't see anything wrong with using UUIDV7 as externally exposed object keys - you're still going to need permissions checks anyway.
I wish Postgres would just allow you look up records by the random component of the field, what are the chances of collisions with 80 bits of randomness? My guess is it’s still enough.
UUIDv7 is only bad for range partitioning and privacy concerns.
The "naturally sortable" is a good thing for postgres and for most people who want to use UUID, because there is no sorted distribution buckets where the last bucket always grows when inserting.
I want to see something like HBase or S3 paths when UUIDv7 gets used.
> UUIDv7 is only bad for range partitioning and privacy concerns.
It's no worse for privacy than other UUID variants if the "privacy" you're worried about leaking is the creation time of the UUID.
As for range partitioning, you can of course choose to partition on the hash of the UUIDv7 at the cost of giving up cheaper rights / faster indices. On the other hand, that of course gives up locality which is a common challenge of partitioning schemes. It depends on the end-to-end design of the system but I wouldn't say that UUIDv7 is inherently good or bad or better/worse than other UUID schemes.
Why is it bad for range partitioning? If anything, it's better? With UUIDv7, you basically can partition on primary key, thus you can have "global" unique constraint.
The article compares UUIDv7 vs v4, but doesn't say why you'd do either instead of just serial/bigserial, which has always been my goto. Did I miss something?
Good question. There's a few reasons to pick UUID over serial keys:
- Serial keys leak information about the total number of records and the rate at which records are added. Users/attackers may be able to guess how many records you have in your system (counting the number of users/customers/invoices/etc). This is a subtle issue that needs consideration on a case by case basis. It can be harmless or disastrous depending on your application.
- Serial keys are required to be created by the database. UUIDs can be created anywhere (including your backend or frontend application), which can sometimes simplify logic.
- Because UUIDs can be generated anywhere, sharding is easier.
The obvious downside to UUIDs is that they are slightly slower than serial keys. UUIDv7 improves insert performance at the cost of leaking creation time.
I've found that the data leaked by serial keys is problematic often enough; whereas UUIDs (v4) are almost always fast enough. And migrating a table to UUIDv7 is relatively straightforward if needed.
So the client side can create the ID before insert - that's the case that (mostly) drives it for me. The other is where you have distributed systems and then later want to merge the data and not have any ID conflicts.
For when you inevitably need to expose the ids to the public the uuids prevent a number of attacks that sequential numbers are vulnerable to. In theory they can also be faster/convenient in a certain view as you can generate a UUID without needing something like a central index to coordinate how they are created. They can also be treated as globally unique which can be useful in certain contexts. I don't think anyone would argue that their performance overall is better than serial/bigserial though as they take up more space in indexes.
If you need an opaque ID like a uuid because, for example, you need the capability to generate non-colliding IDs generated by disparate systems, the best way I've found is to separate these two concerns. Use a UUIDv4 for public purposes and a bigint internally. You don't need to worry about exposing creation time, and you can still manage your data in the home system with all the properties that a total ordering affords.
I believe the concern is if your primary key in the database is a serial number it might be exposed to users unless you do extra work to hide that ID from any external APIs and if there are any flaws in your authorization checks it can allow enumeration attacks exposing private or semi-private info. With UUIDs being virtually unguessable that makes it less of a concern.
>why you'd do either instead of just serial/bigserial, which has always been my goto. Did I miss something?
So the common response is sequential ID crawling by bad actors. UUIDs are generally un-guessable and you can throw them into slop DBs like Mongo or storage like S3 as primary identifiers without worrying about permissions or having a clever interested party pwn your whole database. A common case of security through obscurity.
Tangential, but I'm grateful to this article for teaching me that Postgres has "table foo" as shorthand for "select * from foo". I won't use that in code, but I'll happily use it for interactive queries.
It never occurred to me that Postgres is more efficient when inserting monotonic values. It's the nature of B+ trees so it makes sense. But in the world of distributed databases, monotonic inserts create hot partitions and scalability problems, so evenly-distributed ids are preferred.
It's the nature of B+ trees, multiplied by the nature of clustered indexes: if you use a UUIDv4 as a primary key, your entire row gets moved to random locations, which really sucks when you normally retrieve them sequentially. With a non-clustered index (say, your UUIDv4 id you use for public APIs when you don't want to leak the v7 info) then you'll still get more fragmentation with the random data, but it's something autovacuum can usually keep up with. But it's more work it has to do on top of everything else it does.
Leaky abstractions in databases are one of the reasons every developer should read the table of contents of the hot databases used by the things he’s working on. IME almost no one does that.
My customers return created_at attributes in all their API calls so UUIDv7 won't harm them at all. They also use sequential ids. Only one of them ever used UUIDv4 as primary key. We didn't have any performance problem but the whole production system was run by one psql insurance and one Elixir application server. Probably almost any architectural choice is good at that scale.
Is there an unavoidable tradeoff here? Keys that order nicely (auto-incrementing integers, UUIDv7) naturally leak information. Keys that are more secure (UUIDv4) can have performance problems because they have poor locality.
Or are there any random id generators that can compromise, remain sequential-ish without leaking exact timestamps and global ordering?
Symmetric encryption of IDs at the edge. Optional embedded HMAC. Optional text encoding. For monotonic bigserial values I'm somewhat fond of base58(AES_K1(id{8} || HMAC_K2(id{8})[0..7])) with purpose/table-salted HKDF subkeys from a scrypt'd system passphrase. The hot path of this is pretty fast. As with all cryptographic solutions it comes with a whole new jungle of pitfalls, caveats, and tradeoffs, but it works.
Yes. The spatial locality benefits drop off quite quickly. A hashed uuidv7-like scheme with a rotating salt, for example, would keep short term locality and it's performance benefits while not having long term locality and it's downsides.
The tradeoff is unavoidable. At one end is UUIDv4. At the far end is a gray code with excellent locality, but inherently allows you to know which half of the indices the record is from (even without inverting it). UUIDv7 is a pretty good middle ground.
I can confirm on the performance benefits. I wanted to start with uuidv7 for a new DB earlier this year, so I put together a function to use in the meantime. Once the function is available natively, we'll just migrate to use it instead.
For anyone interested:
CREATE FUNCTION uuidv7() RETURNS uuid
AS $$
-- Get base random UUID and overlay timestamp
select encode(
set_bit(
set_bit(
overlay(uuid_send(gen_random_uuid()) placing
substring(int8send((extract(epoch from clock_timestamp())*1000)::bigint) from 3)
from 1 for 6),
52, 1), -- Set version bits to 0111
53, 1), 'hex')::uuid;
$$ LANGUAGE sql volatile;
> Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs.
I would not call this “generally discouraged” when APIs generally surface a created_at timestamp in their responses. A real life example are Stripe IDs which have similar properties (k-sorted) as UUIDv7: https://brandur.org/nanoglyphs/026-ids#ulids
> While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach
The correct way is 1. generate ID on server side, not client side 2. always validate data access permission of all IDs sent from client
Predictable ID is only unsafe if you don't validate data access permission of IDs sent from client. Also, UUIDv7 is much less predictable than auto-increment ID.
But I do agree that having create time in public-facing ID can leak analytical information.
API responses should be limited to authenticated users. IDs are often present in hyperlinks that are included in insecure emails, or in URLs that, being routed through all sorts of networking hops may be captured and available as metadata.
It's a 128 bit number. If you express that number in base 62 (26 upper case letters + 26 downcase letters + 10 digits) you need only a bit more than 20 characters. You can compress it further by increasing the base and using other 8 bit ASCII characters.
Crockford base32 [0] is the best compromise, IMO. Reasonable length of 26 chars. Uses only alphanumeric characters and avoids issues with case sensitivity and confusing characters (0 vs O, etc.).
One thing that’s not quite clear to me is how safe it is to generate v7 uuids on the client.
That’s one of the nice properties of v4 uuids: you can make up a primary key of a new entity directly on the client and the database can use it directly. Sure: there is tiny collision risk, but it’s so small, you can get away with mostly ignoring it
With v7 however, such a large chunk of the uuid is based on the time, so I’m not sure whether it’s still safe to ignore collisions in any application, especially when you consider client’s clocks to probably be very inaccurate.
Sequential primary keys are pretty important for scalable, stable sorting by record creation time using the primary keys' index similar to serial (int) but avoids the guessing vulnerability. For this use-case, an UUID "v9"-like approach can be a better option: https://uuidv9.jhunt.dev
Does anyone have performance issues with uuidv4? I worked with a db with 10s of billions of rows, no issues whatsoever. Would love to hear the mileage of fellow engineers
I've had issues in a database with billions of rows where the PKs were a UUID. Indices on PK, and also foreign keys from other tables pointing to that table were pretty big, so much so that the indices themselves didn't all fit in memory. Like we would have an index on customer_id, document_id, both UUIDv4. DB didn't have UUID support, so they were stored as strings, so just 1 billion rows took ~30 GiB memory for PK index, 60GiB for the composite indices etc. So eventually the indices would not fit in memory. If we had UUID support or stored as bytes it might have halved it, but eventually become too big.
If you needed to look up say the 100 most recent documents, that would require ~100+ disk seeks at random locations just to look up the index due to the random nature of UUIDv4. If they were sequential or even just semi-sequential that would reduce the number of lookups to just a few, and they would be more likely to be cached since most hits would be to more recent rows. Having it roughly ordered by time would also help with e.g. partitioning. With no partitioning, as the table grows, it'll still have to traverse the B-Tree that has lots of entries from 5 years ago. With partitioning by year or year-month it only has to look at a small subset of that, which could fit easily in memory.
What database were you using? For example with SQL server, by default it clusters data on disk by primary key. Random (non-sequential) PKs like uuidv4 require random cluster shuffling to insert a row “in the middle” of a cluster, increasing io load and causing performance issues.
Postgres on the other hand doesn’t do clustered indexing on the PK… if I recall correctly.
Honestly not really. Yes random keys make inserts slower. But if inserts are only 1% of your database load, then yeah it's basically no issues whatsoever.
On the other hand, if you're basically logging to your database so inserts are like 99% of the load, then it's something to consider.
I believe you're referring to our January 2020 Kafka incident where a logic bug caused data loss for a customer. It was a serious failure and a huge learning moment for us.
The platform we operate today is fundamentally different and far more resilient than it was five years ago. We've scaled significantly (recently passing $100M ARR) because we took those early lessons seriously and continue to prioritize reliability.
ULID is the best balance imo, it's more compact, can be double clicked to select, and case-insensitive so it can be saved on macOS filesystems without conflicts.
Now someone should make a UUIDv7 -> ULID adapter lib that 1:1 translates UUIDv7 <-> ULID preserving all the timestamp resolution and randomness bits so we can use the db-level UUIDv7 support to store ULIDs.
crazygringo|4 months ago
So this basically defeats the entire performance improvement of UUIDv7. Because anything coming from the user will need to look up a UUIDv4, which means every new row needs to create an extra random UUIDv4 which gets inserted into a second B-tree index, which recreates the very performance problem UUIDv7 is supposedly solving.
In other words, you can only use UUIDv7 for rows that never need to be looked up by any data coming from the user. And maybe that exists sometimes for certain data in JOINs... but it seems like it might be more the exception than the rule, and you never know when an internal ID might need to become an external one in the future.
tracker1|4 months ago
matthew16550|4 months ago
A UUIDv7 primary key seems to reduce / eliminate those problems.
If there is also an indexed UUIDv4 column for external id, I suspect it would not be used as often as the primary key index so would not cancel out the performance improvements of UUIDv7.
[1] https://www.cybertec-postgresql.com/en/unexpected-downsides-...
oconnore|4 months ago
There is no need to put the privacy preserving ID in a database index when you can calculate the mapping on the fly
jongjong|4 months ago
IMO, a major problem solved by UUIDs is the ability to create IDs on the client-side, hence, they are inherently user-facing. A major reason why this is an important use case for UUIDs is because it allows clients to avoid accidental duplication of records when an insertion fails due to network issues. It provides insertion idempotence.
For example, when the user clicks on a button on a form to insert a record into a database, the client can generate the UUID on the client-side, then attach it to a JSON object, then send the object to the server for insertion; in the meantime, if there is a network issue and it's unclear whether or not the record was inserted, the code can automatically retry (or user can manually retry) and there is no risk of duplication of data if you use the same UUID.
This is impossible to do with auto-incrementing IDs because those are generated by the database in a centralized way so the user cannot know the ID head of time and thus, if there is a network failure while submitting a form, the client cannot automatically know whether or not the record was successfully inserted; if they retry, they may create a duplicate record in the database. There is no way to make the operation idempotent without relying on some kind of fixed ID which has a uniqueness constraint on the database side.
macote|4 months ago
tekne|4 months ago
gigatexal|4 months ago
lukebechtel|4 months ago
sgarland|4 months ago
If it does matter for your application, then don't expose it - use an opaque id with something like AEAD, and expose that.
sverhagen|4 months ago
ownagefool|4 months ago
You probably shouldn't / don't need to use v7 for your Users table because the age of your User probably has limted to no bearing on the look up patterns. For example, our Steam and Amazon accounts are pretty old, but we likely still use them.
However, your Orders table is significantly more likely to be looked up based on time, so a v7 makes a lot of sense here.
Now I'd argue the security implications are overblown, but in general tems you might also allow someone to look up a user, i.e. you can view my Steam profile, or maybe my Amazon wishlist. You probably don't need to be looking up another Users Order.
Alternativly, if your building an Enterprise Risk Solution, you could take a view that you don't want people knowing how old the risk is, but most solutions would show you some history and would believe that to be pertinent information.
There will be instances of getting it wrong, but it isn't actually _that_ complicated.
saaspirant|4 months ago
The id would be exposed to users. An integer would expose the number of records in it.
Am I using right guys?
djantje|4 months ago
And then having uuidv7 as primary and foreign keys, can give you a performance gain.
Illniyar|4 months ago
I'm sure there might be a middle ground where most of the performance gains remain but the deanonymizing risk is greatly reduced.
Edit: encrypting the value in transit seems a simpler solution really
tonyhart7|4 months ago
which is pointless
pqdbr|4 months ago
> What can go wrong with using UUIDv7 Using UUIDv7 is generally discouraged for security when the primary key is exposed to end users in external-facing applications or APIs. The main issue is that UUIDv7 incorporates a 48-bit Unix timestamp as its most significant part, meaning the identifier itself leaks the record's creation time.
> This leakage is primarily a privacy concern. Attackers can use the timing data as metadata for de-anonymization or account correlation, potentially revealing activity patterns or growth rates within an organization. While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach. Experts recommend using UUIDv7 only for internal keys and exposing a separate, truly random UUIDv4 as an external identifier.
SahAssar|4 months ago
What experts? For what scenarios specifically? When do they consider time-of-creation to be sensitive?
dgb23|4 months ago
hn_throwaway_99|4 months ago
So then what's the point? How I always did things in the past was use an auto increment big int as the internal primary key, and then use a separate random UUID for the external facing key. I think this recommendation from "experts" is pretty dumb because you get very little benefit using UUIDV7 (beyond some portability improvements) if you're still using a separate internal key.
While I wouldn't use UUIDV7 as a secure token like I would UUIDV4, I don't see anything wrong with using UUIDV7 as externally exposed object keys - you're still going to need permissions checks anyway.
andy_ppp|4 months ago
themafia|4 months ago
I honestly don't see how.
gopalv|4 months ago
The "naturally sortable" is a good thing for postgres and for most people who want to use UUID, because there is no sorted distribution buckets where the last bucket always grows when inserting.
I want to see something like HBase or S3 paths when UUIDv7 gets used.
vlovich123|4 months ago
It's no worse for privacy than other UUID variants if the "privacy" you're worried about leaking is the creation time of the UUID.
As for range partitioning, you can of course choose to partition on the hash of the UUIDv7 at the cost of giving up cheaper rights / faster indices. On the other hand, that of course gives up locality which is a common challenge of partitioning schemes. It depends on the end-to-end design of the system but I wouldn't say that UUIDv7 is inherently good or bad or better/worse than other UUID schemes.
parthdesai|4 months ago
wara23arish|4 months ago
I assume there would be some type of index on the timestamp portion & the uuid portion?
wouldn’t that make it better for partitioning since we’d only need to query partitions that match the timestamp portion
morshu9001|4 months ago
molf|4 months ago
- Serial keys leak information about the total number of records and the rate at which records are added. Users/attackers may be able to guess how many records you have in your system (counting the number of users/customers/invoices/etc). This is a subtle issue that needs consideration on a case by case basis. It can be harmless or disastrous depending on your application.
- Serial keys are required to be created by the database. UUIDs can be created anywhere (including your backend or frontend application), which can sometimes simplify logic.
- Because UUIDs can be generated anywhere, sharding is easier.
The obvious downside to UUIDs is that they are slightly slower than serial keys. UUIDv7 improves insert performance at the cost of leaking creation time.
I've found that the data leaked by serial keys is problematic often enough; whereas UUIDs (v4) are almost always fast enough. And migrating a table to UUIDv7 is relatively straightforward if needed.
edoceo|4 months ago
Deadron|4 months ago
ibejoeb|4 months ago
nextaccountic|4 months ago
bigserial must by generated by the db
simongr3dal|4 months ago
mhuffman|4 months ago
So the common response is sequential ID crawling by bad actors. UUIDs are generally un-guessable and you can throw them into slop DBs like Mongo or storage like S3 as primary identifiers without worrying about permissions or having a clever interested party pwn your whole database. A common case of security through obscurity.
martinky24|4 months ago
caymanjim|4 months ago
stickfigure|4 months ago
In other words, "don't try this with CRDB".
chuckadams|4 months ago
baq|4 months ago
therealdrag0|4 months ago
pmontra|4 months ago
perrygeo|4 months ago
Or are there any random id generators that can compromise, remain sequential-ish without leaking exact timestamps and global ordering?
inopinatus|4 months ago
mjb|4 months ago
AlotOfReading|4 months ago
unknown|4 months ago
[deleted]
mfrye0|4 months ago
For anyone interested:
CREATE FUNCTION uuidv7() RETURNS uuid AS $$ -- Get base random UUID and overlay timestamp select encode( set_bit( set_bit( overlay(uuid_send(gen_random_uuid()) placing substring(int8send((extract(epoch from clock_timestamp())*1000)::bigint) from 3) from 1 for 6), 52, 1), -- Set version bits to 0111 53, 1), 'hex')::uuid; $$ LANGUAGE sql volatile;
Rafert|4 months ago
I would not call this “generally discouraged” when APIs generally surface a created_at timestamp in their responses. A real life example are Stripe IDs which have similar properties (k-sorted) as UUIDv7: https://brandur.org/nanoglyphs/026-ids#ulids
delifue|4 months ago
> While UUIDv7 still contains random data, relying on the primary key for security is considered a flawed approach
The correct way is 1. generate ID on server side, not client side 2. always validate data access permission of all IDs sent from client
Predictable ID is only unsafe if you don't validate data access permission of IDs sent from client. Also, UUIDv7 is much less predictable than auto-increment ID.
But I do agree that having create time in public-facing ID can leak analytical information.
bearjaws|4 months ago
There are wild scenarios you can come up with where you may leak something, but that assumes the information isn't coming over anyway.
"Reveals account creation time" - most APIs return this in API responses by default.
When have you seen just a list of UUIDs and no other more revealing metadata?
Meanwhile what pwns 99% of companies? Phishing.
sverhagen|4 months ago
gnatolf|4 months ago
pmontra|4 months ago
Merad|4 months ago
0: https://www.crockford.com/base32.html
lucasyvas|4 months ago
And the amount of information it leaks is negligible - they might know the oldest and the newest and there’s an infinite gulf in between.
It’s better and more practical than SERIAL or BIGSERIAL in every way - if you need a random/external ID, add a second column. Done.
morshu9001|4 months ago
Biganon|4 months ago
As others have stated, it completely defeats the performance purpose, if you need to lookup using another ID.
pilif|4 months ago
That’s one of the nice properties of v4 uuids: you can make up a primary key of a new entity directly on the client and the database can use it directly. Sure: there is tiny collision risk, but it’s so small, you can get away with mostly ignoring it
With v7 however, such a large chunk of the uuid is based on the time, so I’m not sure whether it’s still safe to ignore collisions in any application, especially when you consider client’s clocks to probably be very inaccurate.
Am I overthinking things here?
PhilippGille|4 months ago
With UUIDv7 it's split into:
- 48 bits: Unix timestamp in milliseconds
- 12 bis: Sub-millisecond timestamp fraction for additional ordering
- 62 bits: Random data for uniqueness
- 6 bits: Version and variant identifiers
So >4,600,000,000,000,000,000 IDs per fraction of a millisecond.
And unprecise time on the client doesn't matter, because some are ahead and some behind, vut that doesn't make them more likely to clash.
qeternity|4 months ago
turrini|4 months ago
[1] https://github.com/stateless-me/uuidv47
[2] https://news.ycombinator.com/item?id=45275973
burnt-resistor|4 months ago
rvitorper|4 months ago
zerd|4 months ago
If you needed to look up say the 100 most recent documents, that would require ~100+ disk seeks at random locations just to look up the index due to the random nature of UUIDv4. If they were sequential or even just semi-sequential that would reduce the number of lookups to just a few, and they would be more likely to be cached since most hits would be to more recent rows. Having it roughly ordered by time would also help with e.g. partitioning. With no partitioning, as the table grows, it'll still have to traverse the B-Tree that has lots of entries from 5 years ago. With partitioning by year or year-month it only has to look at a small subset of that, which could fit easily in memory.
cipehr|4 months ago
Postgres on the other hand doesn’t do clustered indexing on the PK… if I recall correctly.
crazygringo|4 months ago
On the other hand, if you're basically logging to your database so inserts are like 99% of the load, then it's something to consider.
unknown|4 months ago
[deleted]
klysm|4 months ago
MaKey|4 months ago
oskari|4 months ago
The platform we operate today is fundamentally different and far more resilient than it was five years ago. We've scaled significantly (recently passing $100M ARR) because we took those early lessons seriously and continue to prioritize reliability.
qntmfred|4 months ago
nikisweeting|4 months ago
Now someone should make a UUIDv7 -> ULID adapter lib that 1:1 translates UUIDv7 <-> ULID preserving all the timestamp resolution and randomness bits so we can use the db-level UUIDv7 support to store ULIDs.
clintonb|4 months ago
thewisenerd|4 months ago
uuidv7 (-) and nanoid (_-) have special characters which urlencode to themselves.
none are small enough that you want someone reading them over the phone; but from a character legibility, ulid makes more sense.
6r17|4 months ago
unknown|4 months ago
[deleted]
unknown|4 months ago
[deleted]
unknown|4 months ago
[deleted]
unknown|4 months ago
[deleted]
unknown|4 months ago
[deleted]
unknown|4 months ago
[deleted]