For my current project I decided to try out 120-bit/15-byte identifiers. 120 is evenly divisible by 4 (hex), 5 (base32), 6 (base64), and 8. Neither base32 nor base64 encodings need padding. You can usually leave padding out regardless, but with 120 bits it's not even a choice you have to worry about, and there are no wasted bits.
Having it be evenly divisible by all those bases is neat!
But I wouldn't say "no wasted bits" unless 120 bits is also exactly the right length for a particular use case. For example, if my use case would be served better by 64 bits, I'd prefer losing some bits to padding than extending it to 120 bits.
If you're going to change the order defined in RFC 4648, you might as well change the character set, too. Since one of the goals is to minimize the use of characters that can be confused for another, I would add back 8 and 9 and remove two of i, l and o.
This removes '1' but not 'I'. But if you want to avoid ambiguity, you need to remove _both_ characters, not just one. Users don't know your IDs never contain a '1', so when they see 'I', they'll still think, "hmm, is that a 1?"
Crockford's version of base32 [0] (sometimes called crockford32) treats 0, o, and O the same when decoding, likewise for 1, i, I, l and L. It also doesn't use the letter U, to reduce the chances of including English obscenities.
So, you don't need to remove ambiguities if you can collapse the ambiguities into equivalence classes.
If you're going to use some flavor of base32, crockford32 is hard to argue against. I could bikeshed some alternatives (I would use the letter U and include Q in the 0 equivalence class, and not worry too much about obscenities), but of the somewhat common base32 standards, crockford32 is the best, IMNSHO.
Another option is to generate a ULID and just take the timestamp part (6 bytes, I forget how many characters). Uses Crockford Base32 already so no need for the custom alphabet.
That is definitely taking the long way around.... Why not just grab a timestamp and base32 it yourself? ULIDs also use millisecond timestamps, and use implementation magic to sort IDs within a millisecond. If you are just using the timestamp portion, you should use micro or nanoseconds.
I've never been sure of the use case of hashids. They're not really a security measure since they can be easily defeated; you might as well just expose the sequential ID at that point. If you do need something with better properties, you should probably be using UUIDv6.
I use shortuuid[0] for that stuff, which also omits the capital letter I, and has some other niceties (I wrote the library). It works really well, and I like how small the IDs are.
Now that you made me think about it, I think both this approach and the original math.big one will cause more base32 digits to be added early than needed, breaking sorting earlier than needed, since slugs with different number of chars won't sort properly (think 1, 10, 11, ..., 2, 20...).
Right now the time is converted to 4 bytes which then uses 7 base32 characters to encode. When the time rolls over to 33 bits in 2106, it will increase to 5 bytes encoded as 8 base32 chars, even though 7 base32 chars have enough bits (35) to represent the number.
So perhaps a better way would be something like:
func atomSlug(publishedAt time.Time) string {
b := make([]byte, 0, 8)
b = binary.BigEndian.AppendUint64(b, uint64(publishedAt.Unix()))
s = lexicographicBase32Encoding.EncodeToString(b)
return strings.TrimLeft(s, '2'); // '2' is all zero bits
}
This will stick with 7 base32 chars until you exceed 35 bits in 3058.
This is clever and thoughtful, but it encodes a 32-bit timestamp in seven base32 characters. It would have been simpler and just one more character to use eight hex digits. Alternatively, maybe go to 4 second resolution and use 30 bit timestamps, so they fit in six base32 chars instead of seven.
Those approaches would both rollover at 2106, breaking sorting, and you'd potentially start getting collisions around 2566. With his approach he is starting with a 63-bit timestamp (Go uses signed int64s), then math.big.Bytes is dropping leading 0 bytes, meaning it will add more characters as needed.
[+] [-] wahern|3 years ago|reply
[+] [-] cakoose|3 years ago|reply
But I wouldn't say "no wasted bits" unless 120 bits is also exactly the right length for a particular use case. For example, if my use case would be served better by 64 bits, I'd prefer losing some bits to padding than extending it to 120 bits.
[+] [-] kijin|3 years ago|reply
[+] [-] junon|3 years ago|reply
[+] [-] jamesfisher|3 years ago|reply
[+] [-] KMag|3 years ago|reply
So, you don't need to remove ambiguities if you can collapse the ambiguities into equivalence classes.
If you're going to use some flavor of base32, crockford32 is hard to argue against. I could bikeshed some alternatives (I would use the letter U and include Q in the 0 equivalence class, and not worry too much about obscenities), but of the somewhat common base32 standards, crockford32 is the best, IMNSHO.
[0] https://www.crockford.com/base32.html
[+] [-] dahfizz|3 years ago|reply
But, needing users to read / type a base32 ID is always bad UX anyway.
[+] [-] iainmerrick|3 years ago|reply
[+] [-] kortex|3 years ago|reply
[+] [-] dahfizz|3 years ago|reply
[+] [-] zlib|3 years ago|reply
[+] [-] vore|3 years ago|reply
[+] [-] hoherd|3 years ago|reply
> Hashids is a small open-source library that generates short, unique, non-sequential ids from numbers.
vs the summary in the article:
> So now when I list atoms from my filesystem or in S3, they come back in the same order that I wrote them.
[+] [-] tokumei|3 years ago|reply
[+] [-] stavros|3 years ago|reply
[0] https://github.com/skorokithakis/shortuuid
[+] [-] bartwe|3 years ago|reply
[+] [-] jminter|3 years ago|reply
To me, "math/big" feels like a big hammer for working on an int64; a nice alternative is "encoding/binary":
[+] [-] pavon|3 years ago|reply
Right now the time is converted to 4 bytes which then uses 7 base32 characters to encode. When the time rolls over to 33 bits in 2106, it will increase to 5 bytes encoded as 8 base32 chars, even though 7 base32 chars have enough bits (35) to represent the number.
So perhaps a better way would be something like:
This will stick with 7 base32 chars until you exceed 35 bits in 3058.[+] [-] throwaway81523|3 years ago|reply
[+] [-] pavon|3 years ago|reply
[+] [-] foota|3 years ago|reply
[+] [-] oittaa|3 years ago|reply
[0]: https://github.com/ietf-wg-uuidrev/rfc4122bis