Nano ID: A tiny, secure URL-friendly unique string ID generator for JavaScript

bmn__|8 years ago

For years I've been using UUID v4 (generated from a good random source), base64url-encoded, which gives a 22-character identifier.

The nanoid implementation doesn't really bring anything new to the table except fewer SLOC (which I personally don't find important). It is also incompatible with the base64url alphabet in https://tools.ietf.org/html/rfc4648#section-5 for no good reason. It is noticeable again and again that devs publishing to npm are not up to snuff. I wish I knew why, as this does not happen with other dynamic languages and their code repos.
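A minimal Node.js sketch of the UUID-then-base64url approach described above, assuming a recent Node.js (16 or later, for `crypto.randomUUID()` and the `'base64url'` Buffer encoding). `uuidToBase64url` and `base64urlToUuid` are hypothetical helper names:

```javascript
const crypto = require('crypto');

// Hypothetical helper: pack a canonical UUID string into 16 bytes,
// then encode those bytes with the RFC 4648 section 5 (base64url) alphabet.
function uuidToBase64url(uuid) {
  const hex = uuid.replace(/-/g, ''); // 32 hex digits = 16 bytes
  return Buffer.from(hex, 'hex').toString('base64url'); // 22 chars, no padding
}

// Inverse: recover the canonical 8-4-4-4-12 form.
function base64urlToUuid(id) {
  const hex = Buffer.from(id, 'base64url').toString('hex');
  return [hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
          hex.slice(16, 20), hex.slice(20)].join('-');
}

const uuid = crypto.randomUUID(); // version 4, from the CSPRNG
const short = uuidToBase64url(uuid);
console.log(uuid, '->', short);
```

The round trip is lossless, so the short form can be expanded back to the standard UUID text when a database or API needs it.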

feedjoelpie|8 years ago

> It is noticeable again and again that devs publishing to npm are not up to snuff, I wish I knew why, as this does not happen with other dynamic languages/their code repos.

It absolutely does happen with other languages. It just happens in Node at greater volume because of the greater number of programmers, the low barrier to entry for JavaScript, the lack of a rich standard library, the convergence of server-focused coders and frontend coders, and the lack of consensus around best practices and frameworks for applications.

It's a bit like PHP: it has been possible to write decent code in PHP since version 5, yet the overall ecosystem is still mostly bad.

Lots of need to fashion very basic tools + a high volume of inexperience + democracy = a package repo full of nonsense where it's difficult to find the diamonds in the rough.

discreteevent|8 years ago

Agreed. It's usually better to use a standard if there is one. To be specific, it's at:

https://tools.ietf.org/html/rfc4648#page-7

Note that it excludes ~ because that character has special meaning in some file systems, yet Nano ID includes ~. It's a good demonstration of why a standards committee is sometimes better than a lone developer: a group of authors and reviewers is more likely to think of everything.
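For reference, the section 5 alphabet differs from standard base64 only in its 62nd and 63rd characters (`-` and `_` instead of `+` and `/`), and the `=` padding may be dropped. A small sketch of the conversion; `toBase64url` is a hypothetical name:

```javascript
// Convert standard base64 (RFC 4648 section 4) to base64url (section 5):
// '+' -> '-', '/' -> '_', and drop the optional '=' padding.
function toBase64url(b64) {
  return b64.replace(/\+/g, '-').replace(/\//g, '_').replace(/=+$/, '');
}

// 0xfb 0xff 0xfe splits into the 6-bit values 62, 63, 63, 62.
const bytes = Buffer.from([0xfb, 0xff, 0xfe]);
console.log(bytes.toString('base64'));              // "+//+"
console.log(toBase64url(bytes.toString('base64'))); // "-__-"
```

Neither alphabet contains `~`, so an ID using it will not decode with a standard base64url decoder.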

dchest|8 years ago

Wait, why would you base64-encode a UUID? You can encode 16 random bytes directly and not have a few bits spoiled by the UUID version and variant fields.
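A sketch of that suggestion in Node.js, assuming a version with the `'base64url'` Buffer encoding (16 or later):

```javascript
const crypto = require('crypto');

// 16 bytes from the CSPRNG = 128 random bits; no bits are spent on the
// UUID version/variant fields (a UUID v4 keeps only 122 random bits).
const id = crypto.randomBytes(16).toString('base64url'); // 22 characters
console.log(id);
```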

But then, there are people who'd like to use a 62-character alphabet, avoiding any special characters. This package allows them to generate such strings.

> devs publishing to npm are not up to snuff

Indeed, there are tons of packages that use Math.random or take modulo without accounting for bias. This is why I'm glad there are new packages like nanoid appearing, created by people who know what they are doing, and this is why your negativity is unfounded.

watty|8 years ago

Good points but this comment reeks of arrogance. It absolutely does occur in other languages, why would you assume it doesn't?

sigi45|8 years ago

I would still use UUID v4. Many databases support it as a native data type and store it compactly (16 bytes rather than a 36-character string). You can also compact it if it is too long for your use case or if you want to save traffic.

zepolen|8 years ago

But that goes against the modern principle of reimplement everything.

falsedan|8 years ago

> `random % alphabet` is a popular mistake to make when coding an ID generator. The spread will not be even; there will be a lower chance for some symbols to appear compared to others—so it will reduce the number of tries when brute-forcing.

I don't understand: is the default implementation of random in node.js flawed? Surely it should produce a uniform distribution…

Edit: never mind, I found the link to https://gist.github.com/joepie91/7105003c3b26e65efcea63f3db8... just above that paragraph.

jimktrains2|8 years ago

Take the inputs 0, 1, 2, 3, 4 modulo 3:

  0 => 0
  1 => 1
  2 => 2
  3 => 0
  4 => 1

So a uniform input over 0..4 produces 0 and 1 twice as often as 2.
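The same effect with a realistic alphabet: mapping a uniformly random byte onto 62 symbols with `%` makes the first 8 symbols 25% more likely than the rest (5/256 versus 4/256). A short JavaScript check:

```javascript
// Count how often each symbol index is produced when every possible
// byte value (0..255, uniformly distributed) is reduced with `% 62`.
const ALPHABET_SIZE = 62;
const counts = new Array(ALPHABET_SIZE).fill(0);
for (let byte = 0; byte < 256; byte++) {
  counts[byte % ALPHABET_SIZE]++;
}
// 256 = 4 * 62 + 8, so indexes 0..7 get one extra byte value each.
console.log(counts[0], counts[7], counts[8], counts[61]); // 5 5 4 4
```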

pfooti|8 years ago

I really like the ulid library. It does what I want: a compact representation, and ULIDs sort lexicographically in time order, which is great. The character subset is URL-friendly, so it's my go-to when I'm making nonces and I'm not worried about super-double-secure PRNGs.

https://github.com/alizain/ulid
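A rough sketch of the ULID layout from its spec: a 48-bit millisecond timestamp in 10 Crockford base32 characters, then 80 random bits in 16 characters. The function name is hypothetical, this is not the `ulid` package's code, and it skips the spec's monotonic mode for IDs created within the same millisecond:

```javascript
const crypto = require('crypto');

// Crockford's base32 alphabet, as used by the ULID spec (no I, L, O, U).
const CROCKFORD = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Hypothetical sketch: timestamp first, most significant digit first,
// then 16 random symbols.
function ulidSketch(timeMs = Date.now()) {
  let time = '';
  let t = timeMs;
  for (let i = 0; i < 10; i++) {
    time = CROCKFORD[t % 32] + time;
    t = Math.floor(t / 32);
  }
  // 32 divides 256, so `byte & 31` is uniform (no modulo bias).
  const rand = Array.from(crypto.randomBytes(16), b => CROCKFORD[b & 31]).join('');
  return time + rand;
}

console.log(ulidSketch());
```

Because the alphabet is in ASCII order and the timestamp part is fixed-width, plain string comparison matches creation-time order.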

jmull|8 years ago

I like that this generates the ID directly by choosing random symbols.

But I don't like that it's equivalent to UUID v4 but different. The difference doesn't buy anything significant over the standard.

With a pretty small change this could directly generate a base64-encoded UUID v4, without the two-step process of generating a UUID and then encoding it. Most characters could be generated randomly from the base64 alphabet; the characters at certain indexes would need to come from a restricted subset of it so that certain bits take the values UUID v4 specifies.
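A sketch of that idea in JavaScript. Bits are numbered from the most significant bit of byte 0, so symbol k covers bits 6k..6k+5; the function name is hypothetical. Decoding the result with a base64url decoder gives 16 bytes carrying the version 4 nibble and the RFC 4122 variant bits:

```javascript
const crypto = require('crypto');

const B64URL = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';

// Hypothetical sketch: pick 22 base64url symbols directly, constraining the
// few positions that overlap the UUID v4 version/variant bits.
function base64urlUuidV4() {
  const values = Array.from(crypto.randomBytes(22), b => b & 63); // uniform 0..63
  // Symbol 8 covers bits 48..53; bits 48..51 are the version nibble 0100.
  values[8] = 0b010000 | (values[8] & 0b000011);
  // Symbol 10 covers bits 60..65; bits 64..65 are the variant bits 10.
  values[10] = (values[10] & 0b111100) | 0b000010;
  // Symbol 21 covers bits 126..131; only 126..127 exist, the rest must be 0.
  values[21] &= 0b110000;
  return values.map(v => B64URL[v]).join('');
}

console.log(base64urlUuidV4());
```

In other words, symbol 8 is always one of Q, R, S, T and symbol 21 is one of A, Q, g, w.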

Confiks|8 years ago

For a toy project, I'm currently using a 64-bit integer (Postgres' bigint) in which the first bit is the sign bit, the next 32 bits are a UNIX epoch timestamp in seconds, and the remaining 31 bits are a random number. This way the ID is very easily sortable by time, although it's a bit of a gimmick.

The scheme allows for 2^31 (about 2.1 billion) possible records per second, although once you account for birthday collisions the practical number is a lot lower (but I could retry the insert on duplicates). Later I could also use some of the random bits to encode extra data in the ID.
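To put a number on "a lot lower": the standard birthday approximation p = 1 - exp(-n(n-1) / 2N) says that roughly 54,000 IDs created within the same second already give about even odds of a collision among 2^31 random values. A quick check in JavaScript (the function name is illustrative):

```javascript
// Birthday approximation: probability of at least one collision among
// n IDs drawn uniformly from N possible values.
function collisionProbability(n, N) {
  return 1 - Math.exp(-(n * (n - 1)) / (2 * N));
}

const perSecond = 2 ** 31; // random values available within one second
console.log(collisionProbability(1000, perSecond));  // ~0.00023
console.log(collisionProbability(54562, perSecond)); // ~0.5
```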

The bits from the two halves are then interspersed so that the first half looks random, and the result is transformed into an 11-character ID (for example, matching /^[0-9A-Za-z]{11}$/). I can reject some IDs beforehand if they match a list of bad strings. The result is an ID such as "Vxe61yU5Cci" or "WgOUBEO88gb".
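A sketch of the bit layout described above using BigInt. It leaves out the bit interspersing and the bad-string filter, and the names are hypothetical. Since 62^11 > 2^63, 11 base62 characters are always enough:

```javascript
const crypto = require('crypto');

const BASE62 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Hypothetical layout: bit 63 = sign (always 0), bits 62..31 = UNIX seconds,
// bits 30..0 = random. `>>> 1` turns a random uint32 into a 31-bit value.
function makeId(unixSeconds, random31 = crypto.randomBytes(4).readUInt32BE(0) >>> 1) {
  return (BigInt(unixSeconds) << 31n) | BigInt(random31);
}

// Fixed-width base62 encoding, most significant digit first.
function toBase62(id) {
  let out = '';
  for (let i = 0; i < 11; i++) {
    out = BASE62[Number(id % 62n)] + out;
    id /= 62n;
  }
  return out;
}

const id = makeId(Math.floor(Date.now() / 1000));
console.log(id, toBase62(id), Number(id >> 31n)); // id, 11-char string, timestamp
```

Without the interspersing step, the fixed-width base62 strings also sort in time order; interspersing trades that away for less predictable-looking prefixes.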

mort96|8 years ago

What happens in 2038? I hope you're never representing those 32 bits of UNIX timestamp as a signed integer anywhere. If you're making sure to always keep it unsigned, what happens in year 2106?

EDIT: I'm not criticizing you, and a valid counter argument would be that you make absolutely sure to always store the timestamp unsigned, that it doesn't need to work until 2106 because it's a toy project, or that an overflow wouldn't really matter because you make sure to keep it unique even after an overflow, but I get scared any time storing a UNIX timestamp in 32 bits is mentioned, and am wondering if you've considered it.
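For concreteness, the two cut-off dates fall out of the largest values a 32-bit seconds counter can hold:

```javascript
// Largest values of a 32-bit seconds counter, read as UNIX timestamps.
const signedMax = 2 ** 31 - 1;   // int32
const unsignedMax = 2 ** 32 - 1; // uint32
console.log(new Date(signedMax * 1000).toISOString());   // 2038-01-19T03:14:07.000Z
console.log(new Date(unsignedMax * 1000).toISOString()); // 2106-02-07T06:28:15.000Z
```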

maxpert|8 years ago

Wait, so `%` is bad but `& 63` is fine? Ain't you doing a mod as well??? It's just that you have a power of two (64 in this case) that you are trying to exploit here... anyone who knows the basics of computer science can tell you it's the same! If it's about reducing code size, I can demonstrate smaller code:

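  const crypto = require('crypto');
  // Assumed 64-symbol URL alphabet (A-Za-z0-9_~), so `r & 63` can index every symbol
  const url = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_~';
  const random = bytes => crypto.randomBytes(bytes);

  module.exports = function (size) {
    // Buffer#map would return another Buffer, so map through Array.from instead
    return Array.from(random(size || 22), r => url[r & 63]).join('');
  };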

I am always disappointed to see these kinds of packages. As mentioned somewhere in the comments above:

> It is noticeable again and again that devs publishing to npm are not up to snuff, I wish I knew why, as this does not happen with other dynamic languages/their code repos.

And I totally agree!

dchest|8 years ago

Check the code below:

      var masks = [15, 31, 63, 127, 255]
      ...
      var mask = masks.find(function (i) {
          return i >= alphabet.length - 1
      })
      ...

      var byte = bytes[i] & mask
      if (alphabet[byte]) {
        id += alphabet[byte]

Masking yields a uniformly random number in the range [0, 2^k), where 2^k is the smallest power of two in the masks list that is at least the alphabet length; indexes greater than or equal to the alphabet length are then rejected. This is a common way of avoiding modulo bias.
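A self-contained sketch of the same mask-and-reject technique. It is not Nano ID's exact code; the function name is hypothetical, and it assumes an alphabet of at most 256 symbols:

```javascript
const crypto = require('crypto');

// Mask-and-reject sampling (hypothetical name, not Nano ID's exact code).
function generateId(alphabet, size) {
  // Smallest all-ones mask (2^k - 1) that can reach the last alphabet index.
  let mask = 1;
  while (mask < alphabet.length - 1) mask = (mask << 1) | 1;

  let id = '';
  while (id.length < size) {
    for (const byte of crypto.randomBytes(size)) {
      const index = byte & mask;     // uniform over 0..mask
      if (index < alphabet.length) { // reject out-of-range indexes
        id += alphabet[index];
        if (id.length === size) break;
      }
    }
  }
  return id;
}

console.log(generateId('0123456789abcdef', 8));
```

Rejected bytes are simply discarded, so every accepted index is equally likely; the cost is occasionally drawing a few extra bytes.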

> I am always disappointed to see these kind of packages, as mentioned somewhere in comments above

Because you didn't understand how it worked and assumed that it's bad?

dchest|8 years ago

My version: https://gist.github.com/dchest/751fd00ee417c947c252

dchest|8 years ago

(replying to myself as I can no longer edit:) My favorite feature in it is .entropy(n) method: when you want some fixed amount of unpredictable bits, but don't want to calculate how long the random string should be in your encoding to contain that many bits.
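A sketch of the arithmetic such a method saves you from: each symbol from an alphabet of size k carries log2(k) bits, so n bits need ceil(n / log2(k)) symbols, assuming symbols are chosen uniformly (e.g. via rejection sampling). The helper name is hypothetical, not the gist's API:

```javascript
// Hypothetical helper: how many symbols from an alphabet of `alphabetSize`
// characters are needed to carry at least `bits` bits of entropy.
function lengthForEntropy(bits, alphabetSize) {
  return Math.ceil(bits / Math.log2(alphabetSize));
}

console.log(lengthForEntropy(128, 64)); // 22 (6 bits per symbol)
console.log(lengthForEntropy(128, 62)); // 22 (~5.95 bits per symbol)
console.log(lengthForEntropy(128, 36)); // 25 (~5.17 bits per symbol)
```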

danielbankhead|8 years ago

I designed something similar (in terms of URL-friendliness), but for distributed systems:

https://github.com/AltusAero/bronze

It's designed for use cases where collision-resistance (both time-based [UUID1] and random [UUID4]) is valued more than the size of the id. Can be used as a module (Node.js & browser) and via CLI.

bernadus_edwin|8 years ago

Silly question: why doesn't JS provide a method to generate a UUID without a library? I mean, this is 2017. One method with no input parameters won't hurt anybody.

dchest|8 years ago

UUID is a misguided standard that should have been killed long ago if not for compatibility reasons, so I'm glad it's not in JS.

StavrosK|8 years ago

I wrote a Python version of this six years ago, it's been working beautifully and I use it for IDs in all my APIs:

https://github.com/skorokithakis/shortuuid

That way, elements are neither enumerable nor guessable.

duke360|8 years ago

Tilde (~) doesn't look like a wise choice to me, as it may have special meaning in certain contexts; fortunately the alphabet is redefinable. I would also prefer if at least some part of the Nano ID were machine-dependent. Anyway, it works :)

bluefox|8 years ago

If you need a function that returns integers in some range [0..n) equiprobably, just write that. Why conflate a bunch of concepts and have weird limitations?
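That separation is easy to sketch with newer Node.js versions, which ship `crypto.randomInt(max)`, returning an unbiased integer in [0, max) from the CSPRNG; the ID generator then reduces to indexing into the alphabet. `idFromAlphabet` is a hypothetical name:

```javascript
const crypto = require('crypto');

// crypto.randomInt(max) returns a uniform integer in [0, max), so each
// symbol is picked equiprobably without any masking or modulo logic here.
function idFromAlphabet(alphabet, size) {
  let id = '';
  for (let i = 0; i < size; i++) {
    id += alphabet[crypto.randomInt(alphabet.length)];
  }
  return id;
}

console.log(idFromAlphabet('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789', 22));
```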

Oh, and by the way, fuck HN for deleting comments.

dfreire|8 years ago

You can also check https://github.com/dylang/shortid

arnioxux|8 years ago

This is a perfect example of what NOT to do. They reimplemented a seeded linear congruential generator for their random number generator:

https://github.com/dylang/shortid/blob/master/lib/random/ran...

But since LCGs are easy to solve, after seeing a few values you can solve for the seed and generate all past and future values.

But you don't even need to do anything fancy for that. Since their implementation of LCG only has a state size of 233280 different values, you can just brute force it. (also means that their rng could only ever generate 233280 different numbers to begin with)

Why the fuck do they have 2.5k stars!?

EDIT: They already have an open issue for it but project seems unmaintained: https://github.com/dylang/shortid/issues/70. Stay far far away.
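To make the state-recovery point concrete, here is a sketch assuming the widely copied "seeded random" constants 9301, 49297 and 233280. The modulus matches the 233,280 states mentioned above; the multiplier and increment are an assumption, not copied from shortid's file:

```javascript
// Assumed constants for the classic seeded LCG.
const A = 9301, C = 49297, M = 233280;

function makeLcg(seed) {
  return () => {
    seed = (seed * A + C) % M;
    return seed / M; // output in [0, 1)
  };
}

// The output *is* the state divided by M, so one observed value reveals
// the whole internal state and therefore every future value.
function predictFrom(output, count) {
  let state = Math.round(output * M);
  const next = [];
  for (let i = 0; i < count; i++) {
    state = (state * A + C) % M;
    next.push(state / M);
  }
  return next;
}

const rng = makeLcg(12345);  // "secret" seed
const observed = rng();      // attacker sees one value
const predicted = predictFrom(observed, 5);
const actual = [rng(), rng(), rng(), rng(), rng()];
console.log(predicted.every((v, i) => v === actual[i])); // true
```

Even without that shortcut, trying all 233,280 possible states takes a fraction of a second.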

loop0|8 years ago

How does it compare to hashids (http://hashids.org/)?

qilo|8 years ago

Hashids are reversible mappings to/from integers. Nano IDs are random strings.

magnat|8 years ago

A string generated with the default parameters has a ~0.1% chance of containing a four-letter English profanity.
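A back-of-the-envelope way to check a figure like this: a 22-symbol ID has 19 positions where a 4-character run can start, and each specific run has probability 1/64^4. The size of the block list is an input you have to supply (counting each upper/lower-case spelling separately), and the helper name is hypothetical:

```javascript
// Expected number of occurrences of any blocked 4-character string in a
// random ID, ignoring overlaps. `blockedCount` counts every case variant
// separately, e.g. one 4-letter word has 2^4 = 16 case spellings.
function expectedHits(idLength, blockedCount, alphabetSize = 64) {
  const windows = idLength - 3; // positions where a 4-char run can start
  return windows * blockedCount / alphabetSize ** 4;
}

console.log(expectedHits(22, 16)); // one word, all case variants: ~1.8e-5
```

For small values the expected count is a good approximation of the probability of at least one hit.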

ludicast|8 years ago

I'd consider that a feature not a bug :).

ijustdontcare|8 years ago

What about UUID, a well-researched and established standard?

veeti|8 years ago

> Compact. It uses more symbols than UUID (A-Za-z0-9_~) and has the same number of unique options in just 22 symbols instead of 36.

gvx|8 years ago

This is basically UUIDv4, encoded with a url-safe variant of base64 (except UUIDv4 reserves a few bits for saying "this is a UUID, specifically version 4").
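The bit counts behind that comparison, assuming a 64-symbol alphabet and the 22-symbol default length; the reserved UUID v4 bits are the 4-bit version and the 2-bit variant:

```latex
\text{Nano ID: } 22 \cdot \log_2 64 = 22 \cdot 6 = 132 \text{ random bits}
\qquad
\text{UUID v4: } 128 - \underbrace{4}_{\text{version}} - \underbrace{2}_{\text{variant}} = 122 \text{ random bits}
```

So the 22-symbol string carries slightly more randomness than a UUID v4, not merely the same amount.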

namelost|8 years ago

UUIDs are not secure unless they are version 4 (random) and generated from a secure randomness source.

Kiro|8 years ago

Why wouldn't this be a one-liner?

dchest|8 years ago

It would be a pretty long line, containing a function wrapping crypto.randomBytes, window.crypto.getRandomValues, window.msCrypto.getRandomValues selected at runtime, and then a few lines to select characters avoiding modulo bias. Try?
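A sketch of just the first part, the runtime selection of a random source. The function name is hypothetical; `msCrypto` is the prefixed Web Crypto object in Internet Explorer 11:

```javascript
// Pick a CSPRNG at runtime (hypothetical helper).
function getRandomBytes(size) {
  // In browsers use `window`; elsewhere fall back to the standard `globalThis`.
  const root = typeof window !== 'undefined' ? window : globalThis;
  const webCrypto = root.crypto || root.msCrypto; // msCrypto: prefixed IE 11 object
  if (webCrypto && typeof webCrypto.getRandomValues === 'function') {
    return webCrypto.getRandomValues(new Uint8Array(size));
  }
  // Node.js without a global Web Crypto object.
  return require('crypto').randomBytes(size);
}

console.log(getRandomBytes(8));
```

Turning those bytes into characters still needs the mask-and-reject step to avoid modulo bias.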

78 comments