top | item 25596687

jfrisby | 5 years ago

I think you're crossing the streams a bit here.

Twitter designed _Snow_flake. _Sony_flake is a reimplementation that changes the allocation of bits, and its author acknowledges that its maximum ID-generation throughput is lower than Snowflake's.

Snowflake uses a millisecond-precision timestamp and a 12-bit sequence number, so 4,096,000 IDs per node-second. At that rate, the bottleneck won't be the format of the IDs but the performance of the code and the IPC mechanism. The IPC cost is non-trivial in this case, since Snowflake uses a socket-based approach to communication. Lower-overhead, native IPC mechanisms are certainly possible via JNI, but would probably take some doing to implement. For Sonyflake, I don't imagine the socket overhead is much of an issue, given the low throughput it's capable of with its bit allocations.

Were I to design something like this again[1], I might start with something like Sonyflake (the self-assignment of host ID without needing to coordinate via ZooKeeper is nice), shave a couple of bits from the top of the timestamp, maybe a couple from the top of the host ID, and pack the remainder at the top, leaving a few zero bits at the bottom. The value returned would then be the start of a _range_, and anything needing to generate IDs in large quantities could keep its own in-process counter — only one API call per N IDs generated by a given thread/process. And, of course, a unix-domain socket or other lower-overhead approach to communication for when the API calls are needed.

[1] - A decade prior to Snowflake's release, I wound up taking a very similar approach at one of my first startups, albeit a much cruder and less elegant one, without nice properties like k-ordering.
