UUIDv47: Store UUIDv7 in DB, emit UUIDv4 outside (SipHash-masked timestamp)

[+] aabbdev|5 months ago|reply

Hi, I’m the author of uuidv47. The idea is simple: keep UUIDv7 internally for database indexing and sortability, but emit UUIDv4-looking façades externally so clients don’t see timing patterns.

How it works: the 48-bit timestamp is XOR-masked with a keyed SipHash-2-4 stream derived from the UUID’s random field. The random bits are preserved, the version flips between 7 (inside) and 4 (outside), and the RFC variant is kept. The mapping is injective: (ts, rand) → (encTS, rand). Decode is just encTS ⊕ mask, so round-trip is exact.
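A rough sketch of the masking step described above, in Python rather than the project's C11. Two things here are assumptions, not details confirmed in this thread: the exact layout of the 10-byte PRF input (built from the bits that stay identical in both forms), and keyed BLAKE2b from `hashlib` as a stand-in for SipHash-2-4, which the Python standard library does not expose. The helper names (`make_v7`, `encode_facade`, `decode_facade`) are hypothetical.

```python
import hashlib
import os
import time
import uuid

MASK_48 = (1 << 48) - 1


def _mask48(key: bytes, u: bytes) -> int:
    # ASSUMPTION: 10-byte message taken from the random field only, with the
    # version nibble and variant bits stripped so both forms hash identically.
    msg = bytes([u[6] & 0x0F, u[7], u[8] & 0x3F]) + u[9:16]
    # STAND-IN PRF: keyed BLAKE2b, not SipHash-2-4.
    digest = hashlib.blake2b(msg, key=key, digest_size=8).digest()
    return int.from_bytes(digest, "little") & MASK_48


def _swap(key: bytes, u: bytes, version: int) -> bytes:
    # XOR the 48-bit timestamp with the mask, then set the version nibble.
    ts = int.from_bytes(u[0:6], "big") ^ _mask48(key, u)
    out = bytearray(u)
    out[0:6] = ts.to_bytes(6, "big")
    out[6] = (version << 4) | (out[6] & 0x0F)
    return bytes(out)


def encode_facade(key: bytes, v7: uuid.UUID) -> uuid.UUID:
    """Internal v7 -> external v4-looking facade."""
    return uuid.UUID(bytes=_swap(key, v7.bytes, 4))


def decode_facade(key: bytes, facade: uuid.UUID) -> uuid.UUID:
    """External facade -> internal v7 (same XOR, version flipped back)."""
    return uuid.UUID(bytes=_swap(key, facade.bytes, 7))


def make_v7(ts_ms: int, rand10: bytes) -> uuid.UUID:
    # Hypothetical helper: assemble a v7 from a ms timestamp and 10 random bytes.
    b = bytearray(ts_ms.to_bytes(6, "big") + rand10)
    b[6] = 0x70 | (b[6] & 0x0F)  # version 7
    b[8] = 0x80 | (b[8] & 0x3F)  # RFC variant
    return uuid.UUID(bytes=bytes(b))


key = os.urandom(16)
v7 = make_v7(int(time.time() * 1000), os.urandom(10))
facade = encode_facade(key, v7)
print(v7, "->", facade)
```

Because the mask depends only on bits that are untouched by the transform, decoding recomputes the same mask and the XOR cancels.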

Security: SipHash is a PRF, so observing façades doesn’t leak the key. Wrong key = wrong timestamp. Rotation can be done with a key-ID outside the UUID.

Performance: one SipHash over 10 bytes + a couple of 48-bit loads/stores. Nanosecond overhead, header-only C11, no deps, allocation-free.

Tests: SipHash reference vectors, round-trip encode/decode, and version/variant invariants.

Curious to hear feedback!

[+] JimDabell|5 months ago|reply

I like the idea.

UUIDs are often generated client-side. Am I right in thinking that this isn’t possible with this approach? Even if you let clients give you UUIDs and you gave them back the masked versions, wouldn't you be vulnerable to a client providing two UUIDs with different ts and the same rand? So this is only designed for when you are generating the UUIDv7s yourself?
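To make the concern concrete, here is a small sketch of what happens when two IDs share the same rand field. It assumes the mask is a function of the key and the rand bits only, as the description above suggests, and uses keyed BLAKE2b as a stand-in PRF (not SipHash-2-4). `KEY`, `mask48` and `masked_ts` are illustrative names.

```python
import hashlib

KEY = bytes(range(16))  # demo key only
MASK_48 = (1 << 48) - 1


def mask48(rand10: bytes) -> int:
    # Stand-in PRF (keyed BLAKE2b). The mask depends on the rand bits only.
    d = hashlib.blake2b(rand10, key=KEY, digest_size=8).digest()
    return int.from_bytes(d, "little") & MASK_48


def masked_ts(ts_ms: int, rand10: bytes) -> int:
    return ts_ms ^ mask48(rand10)


rand = bytes(10)  # attacker reuses the same rand field twice
t1, t2 = 1_700_000_000_000, 1_700_000_123_456
e1, e2 = masked_ts(t1, rand), masked_ts(t2, rand)

# Same rand means same mask, so the mask cancels out of the XOR.
assert e1 ^ e2 == t1 ^ t2

# Anyone who knows t1 (e.g. their own submission time) recovers t2 without the key.
recovered_t2 = e2 ^ (e1 ^ t1)
assert recovered_t2 == t2
```

This only matters when rand values can repeat or be chosen by an untrusted party; with server-generated, unpredictable rand bits each ID gets its own mask.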

[+] the_mitsuhiko|5 months ago|reply

Two pieces of feedback here:

1. You implicitly take away someone else's hypothetical benefit of leveraging UUID v7, which is disappointing for any consumer of your API.

2. By storing the UUIDs differently on your API service from internally, you're going to make your life just a tiny bit harder because now you have to go through this indirection of conversion, and I'm not sure if this is worth it.

[+] inopinatus|5 months ago|reply

My biggest concern is the entropic quality of the random bits, since the design of UUIDv7 is fundamentally more concerned with collisions than predictability; consequently, although the standard says SHOULD for their nonguessability it isn't a MUST, and leaves room for implementations that use a weak PRNG, or that increment a counter, or even place additional clock data in the apparently random bits (ref. RFC9562 s6.2 & s6.9).

So there are definitely some gotchas with relying on rand_a and rand_b in UUIDv7 for seeding a PRF, and when ingesting data from devices outside of your trust boundary (as may be the case with high-volume telemetry), even if you wrote the code they basically can't be trusted for this purpose, and if those bits are undisturbed in the output it's certainly a problem if the idea was to obfuscate serialisation, timing, or correlation.

Even generations we might assume are safe may not be completely safe; for example, the new uuidv7() in PostgreSQL 18 fills rand_a entirely from the high precision part of the timestamp, and this is RFC compliant. So if an import routine generates a big batch of such UUIDs, this v7-to-v4 scheme discloses output bits that can be used to relate individual records as part of the same group. That might be fine for data points pertaining to a vehicle engine. It might not be fine for identifiers that relate to people.
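A short sketch of that leak, assuming (per the comment above) that the generator puts a 12-bit sub-millisecond fraction in rand_a and that the façade leaves rand_a untouched. Both UUID values below are made up for illustration, and the function names are hypothetical.

```python
import uuid


def rand_a(u: uuid.UUID) -> int:
    # The 12 bits directly after the version nibble.
    return (u.int >> 64) & 0xFFF


def sub_ms_fraction(u: uuid.UUID) -> float:
    # ASSUMPTION: rand_a holds the fraction of a millisecond in 1/4096 steps.
    return rand_a(u) / 4096


internal = uuid.UUID("017f22e2-79b0-7800-98c4-dc0c0c07398f")  # hypothetical v7
facade = uuid.UUID("5d1c0a44-9e3b-4800-98c4-dc0c0c07398f")    # made-up masked timestamp

# The timestamp differs, but rand_a is readable from the public form.
assert rand_a(internal) == rand_a(facade) == 0x800
print(sub_ms_fraction(facade))
```

Records from one batch insert would show closely spaced or increasing rand_a values, which is the grouping signal described above.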

So, since not all UUIDv7 is created alike, I'd add a strong caveat: unless generating the rand_a and rand_b bits entirely oneself with a high degree of confidence in their nonguessability, then this scheme may still leak information regarding timing, sequence, or correlation of records, and you will have to read the source code of your UUIDv7 implementation to know for sure.

[+] unknown|5 months ago|reply

[deleted]

[+] sergeyprokhoren|5 months ago|reply

Bad idea. In PostgreSQL 18 the optional parameter shift will shift the computed timestamp by the given interval

https://www.postgresql.org/docs/18/functions-uuid.html

[+] chrismorgan|5 months ago|reply

A few years ago I made a scheme whereby you could use sequential numeric IDs in your database, but expose them as short random strings (length 4–20 step 2, depending on numeric value and sparsity configuration). It used some custom instances of the Speck cipher family, and I think it’s robust and rather neat.

Although I finished it, I never quite published it properly for some reason, probably partly because I shelved the projects where I had been going to use it (I might unshelve one of them next year).

Well, I might as well share it, because it’s quite relevant here and interesting:

https://temp.chrismorgan.info/2025-09-17-tesid/

My notes on its construction, pros and cons are fairly detailed.

Maybe I’ll go back and publish it properly next year.

[+] austinjp|5 months ago|reply

Nice. See also sqids (previously known as hashids)

https://sqids.org/

[+] inopinatus|5 months ago|reply

I was interested in something similar with Speck for obfuscating bigserial PKIDs but the shortage of cross-platform implementations - especially in pgcrypto - led to choosing base58(AES_K1(id{8} || HMAC_K2(id{8})[0..7])) instead, which we could implement in almost anything and is performant enough, albeit longer output (typically 22 characters)

[+] unknown|5 months ago|reply

[deleted]

[+] chuckadams|5 months ago|reply

I remember doing something similar, but I just used two columns, a public uuid, and a bigint primary key that wasn't exposed to the api (this was long before uuidv7). Lacked a lot of the conveniences of using uuid everywhere, but it still handled the use case of merging different DB dumps as long as PKs were stripped out first.

And maybe I misunderstand how the hashing works, but it seems if you're looking things up by the hashed uuid, you're still going to want two columns anyway.

[+] connicpu|5 months ago|reply

The conversion is reversible using the secret cryptographic key so you can turn the uuidv4s from requests into your db uuidv7s.

[+] miningape|5 months ago|reply

This is interesting, but is almost something I'd rather have the DB handle for me - i.e. I can cast a UUIDv7 to "UUIDv4" (and vice versa) and I could use both in queries (with explicit syntax to annotate which kind is being used / expected)

[+] tracker1|5 months ago|reply

Interesting project... just out of curiosity, could you give something resembling a couple practical examples of the risk of exposing the time portion of a v7 UUID?

[+] NortySpock|5 months ago|reply

Suppose it's something where the user may be accused of doing something nefarious if a sequence or pattern of behavior is exposed.

- "Ex-spouse: I looked you up on a dating website, and your userID indicates it was created while you were at Tom's party where you swear nothing happened."

- "You say you are in XYZ timezone, but all your imageIDs (that are unique to the image upon creation) are timestamped at what would be 3am in your timezone."

Granted, for individual messages that are near-real-time, or for transactions that need to be timestamped anyway, it's probably fine, but for user-account-creation or "evergreen" asset-creation, it could leak the time to a sufficiently curious individual (or an organized group that is doing data-trawling and cross-correlation)

[+] bangaladore|5 months ago|reply

I've done CTFs in the past where a UUID is used to brute force an AES key. The key was derived partially from the time source, so by knowing the system time close to when the data was encrypted you could pretty easily brute force the key.

A simpler example is a URL for, say, a file / photo share service. You allow users to upload images, and you return them back website.com/GUID. That's it. You don't provide a way to see when that photo / file was uploaded, but because you use a UUIDv7 you just did.
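For reference, this is roughly how little work it takes to read the creation time out of a plain UUIDv7. The sample value is the UUIDv7 test vector from RFC 9562; the function name is illustrative.

```python
import uuid
from datetime import datetime, timezone


def v7_timestamp(u: uuid.UUID) -> datetime:
    """Return the creation time embedded in a UUIDv7."""
    if u.version != 7:
        raise ValueError("not a UUIDv7")
    ms = u.int >> 80  # top 48 bits: Unix time in milliseconds
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


leaked = v7_timestamp(uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f"))
print(leaked.isoformat())
```

Anyone holding the URL can run the same few lines, which is the unintended disclosure being described.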

Is this a security risk? Maybe or maybe not? But it's an unintended disclosure of information.

[+] thunderfork|5 months ago|reply

Let's say you've got a system that collects medical data - like "store the results of the MRI right after it happens".

For analysis reasons, you want to share this dataset (e.g. for diagnostics on the machine) but first must strip it of potentially identifying information.

The uuidv7 timestamp could be used to re-identify the data through correlation - "I know this person got an MRI on this day, there's only one record with a matching datestamp, thus I know it's their MRI."

[+] sgarland|5 months ago|reply

This is cool, but the entire “OMG you can’t leak timestamps” has always reeked of security theater to me, as has the argument that if you expose sequential IDs, you’re opening vectors of attack, exposing business information, etc.

Add some random large value to your ints periodically - they’ll still be monotonic, but you’ll throw off the dastardly spies stealing your super valuable business intelligence.

[+] danhau|5 months ago|reply

You’re not exposing business information, you’re exposing client information. The information a system leaks might not be intrinsically valuable, but it can be used to deduce other data, especially over larger sets or time.

For example, by only scraping the date and author of an online newspaper’s articles over a period of time, you can deduce when every author is typically on vacation. Compare that against every other author and you can find patterns indicating, say, workplace affairs.

Source: a talk by David Kriesel called SpiegelMining (in German), or at least what I remember.

[+] bismark|5 months ago|reply

My biggest issue w/ UUIDv7 is how challenging they are to visually diff when looking at a list. Having some sort of visual translation layer in psql that would render them with the random bits first while maintaining the time sorting underneath would be a major UX boost...

[+] phs2501|5 months ago|reply

I just taught myself to look at the end of the UUID, rather than the beginning.

[+] unknown|5 months ago|reply

[deleted]

[+] nine_k|5 months ago|reply

Write a function that does that, use it in your queries. E.g. simple hex representation + string reversal should help. Or a reversed base64 representation for shorter output.
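A client-side version of that suggestion, as a minimal Python sketch. The comment is about doing this inside queries, so treat this as an equivalent for display code, not the in-database function itself. `tail_first` is a hypothetical name.

```python
import uuid


def tail_first(u: uuid.UUID) -> str:
    # Reverse the 32 hex digits so the random tail comes first on screen.
    # For display only; keep sorting and indexing on the original value.
    return u.hex[::-1]


print(tail_first(uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")))
```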

[+] funcimp|5 months ago|reply

This is super cool. I decided to code up a Go implementation with the help of dchest's excellent siphash library.

https://github.com/n2p5/uuid47

refs: https://github.com/dchest/siphash

[+] g-mork|5 months ago|reply

Vaguely related technique with similar goals (but I love the one posted here) http://blog.notdot.net/2007/9/Damn-Cool-Algorithms-Part-2-Se...

[+] timando|5 months ago|reply

Why does it use version 4 instead of version 8? Version 4 implies that it's random bits, but it's actually not random. Version 8 doesn't imply anything about what the bits mean.

[+] flowerthoughts|5 months ago|reply

I can't answer that, but as long as it's a high entropy algorithm, this seems fair game. You could see it as a seeded PRNG. The whole point of the exercise is to make it look random to the outside. Perhaps v8 stands out too much.

[+] devnull3|5 months ago|reply

Why not use a different encryption key per session and stamp encrypted IDs (or whatever info) to the outside world?

This way the DBs can use simple sequence numbers instead of timestamp based IDs.

[+] conradludgate|5 months ago|reply

You have to know what key to use to decrypt the timestamp bits of the token. If you change keys regularly you have the problem of keeping lots of keys, as well as somehow determining the right key
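One way to handle the "which key?" problem is the key-ID-outside-the-UUID idea mentioned at the top of the thread. The sketch below is only an illustration of that idea: the `kid.uuid` token format, the `KEYRING` contents and all function names are invented here, and keyed BLAKE2b stands in for SipHash-2-4.

```python
import hashlib
import uuid

# Hypothetical keyring: old keys stay available for decoding old tokens.
KEYRING = {"k1": bytes(range(16)), "k2": bytes(range(16, 32))}
CURRENT_KID = "k2"
MASK_48 = (1 << 48) - 1


def _mask48(key: bytes, u: uuid.UUID) -> int:
    b = u.bytes
    # ASSUMPTION: PRF input is the random field with version/variant bits stripped.
    msg = bytes([b[6] & 0x0F, b[7], b[8] & 0x3F]) + b[9:]
    d = hashlib.blake2b(msg, key=key, digest_size=8).digest()  # stand-in PRF
    return int.from_bytes(d, "little") & MASK_48


def _convert(key: bytes, u: uuid.UUID, version: int) -> uuid.UUID:
    i = u.int ^ (_mask48(key, u) << 80)          # mask the top 48 bits
    i = (i & ~(0xF << 76)) | (version << 76)     # set the version nibble
    return uuid.UUID(int=i)


def to_external(v7: uuid.UUID, kid: str = CURRENT_KID) -> str:
    return f"{kid}.{_convert(KEYRING[kid], v7, 4)}"


def from_external(token: str) -> uuid.UUID:
    kid, _, text = token.partition(".")
    return _convert(KEYRING[kid], uuid.UUID(text), 7)  # KeyError if kid is unknown


v7 = uuid.UUID("017f22e2-79b0-7cc3-98c4-dc0c0c07398f")
print(to_external(v7))
```

The trade-off is that the key ID is visible and every retired key has to be kept for as long as tokens issued under it must resolve.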

[+] taminka|5 months ago|reply

i'm curious, if you're doing single header, why not also do the stb-style IMPL block + definitions block such that you avoid the issues from accidentally including the header multiple times?

[+] LeicaLatte|5 months ago|reply

Mobile apps often sort by creation time in the UI (chat messages, activity feeds). Since clients only see the masked version, there might be a need to expose a separate timestamp field.

[+] londons_explore|5 months ago|reply

Before using this....

Consider what you'll do if someone ever gets root in your web server and leaks the key.

Suddenly all your UUIDs need to be replaced. That tends to be impossible since they're probably part of published URLs etc.

Big companies have made similar mistakes - that's probably why, for example, all private YouTube videos and Google Docs had their links invalidated a few years back, when the security of a decade-old key couldn't be guaranteed and the key wasn't rotatable.

TL;DR: Never use anything where you cannot rotate a key, including this.

[+] gwbas1c|5 months ago|reply

I started encrypting database IDs and deriving GUIDs from that.

[+] salterdavid032|5 months ago|reply

[deleted]

[+] optimize_prime|5 months ago|reply

[deleted]

[+] themafia|5 months ago|reply

Why not just use UUIDv8? The format allows you to use the upper bits for a timestamp and the lower bits for any value you like, including just a random value.

[+] michelpp|5 months ago|reply

Because then you leak the timestamp. The idea is, present what looks like v4 random uuids externally, but they are stored internally with v7 which greatly improves locality and index usability. The conversion back and forth happens with a secret key.

[+] pluto_modadic|5 months ago|reply

this is solved by reading the repo's README: hiding timing information.

[+] jppope|5 months ago|reply

Sounds like it's trying to achieve something similar to what ULID is going for: https://github.com/ulid/spec

timestamp + readability

[+] mcdonje|5 months ago|reply

Except the timestamp is in the ULID for anyone to read. UUIDv47 hides that from external parties.

88 comments