top | item 37733565

Yes, but if the machine holding the sequential data receives 100x the traffic of the other machines, that can be worse than splitting the traffic evenly across all available machines.

kijin|2 years ago

If your database simply shards keys sequentially, it's going to get hotspots in a lot of use cases, like plain old integer keys and timestamps, not just UUIDv7. In that case it would be fair to say that your database is doing it wrong.

Fortunately, there's no rule that says you should shard your keys using the sequential part up front.

One of the rules for generating randomness from environmental sources is to throw away the high bits and only use the low bits. Distributed databases should do the same if they want a good distribution.
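To make the hotspot concrete, here is a minimal sketch, not taken from any particular database. The shard count, key width, and both function names are hypothetical, chosen only for illustration. It places a burst of sequential keys once by their high bits and once by their low bits:

```python
from collections import Counter

NUM_SHARDS = 8   # hypothetical cluster size (power of two for the bit mask)
SHARD_BITS = 3   # log2(NUM_SHARDS)
KEY_BITS = 16    # hypothetical key width, kept small for illustration


def shard_by_high_bits(key: int) -> int:
    """Range-style placement: the most significant bits pick the shard."""
    return key >> (KEY_BITS - SHARD_BITS)


def shard_by_low_bits(key: int) -> int:
    """Placement on the least significant bits only."""
    return key & (NUM_SHARDS - 1)


# A burst of 100 sequential inserts, e.g. auto-increment ids or timestamps.
recent_keys = range(1000, 1100)

high = Counter(shard_by_high_bits(k) for k in recent_keys)
low = Counter(shard_by_low_bits(k) for k in recent_keys)

print("high bits:", dict(high))  # every key lands on a single shard
print("low bits: ", dict(low))   # keys spread across all eight shards
```

With UUIDv7 the low bits are random rather than a counter, so the spread comes from randomness instead of round-robin, but the effect on write distribution is the same.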

johncolanduoni|2 years ago

What distributed databases shard on the low bits? How do they do something like a range query?

The closest I’ve ever heard of is sharding based on a hash (e.g. CockroachDB can do this on request[1]) but most distributed databases with strong consistency (Spanner descendants in particular) default to “doing it wrong”.

[1]: https://www.cockroachlabs.com/docs/stable/hash-sharded-index...
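The range-query cost can be sketched like this. It is a toy in-memory model, not how CockroachDB or any other database implements hash sharding; `shard_for`, `put`, and `range_query` are hypothetical names. Once placement is by hash, adjacent keys no longer live together, so a range query has to ask every shard and merge the results:

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size

# Each shard is modelled as a plain dict from key to value.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_for(key: int) -> int:
    """Pick a shard from a hash of the key, ignoring key order entirely."""
    digest = hashlib.sha256(key.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(key: int, value: str) -> None:
    shards[shard_for(key)][key] = value


def range_query(lo: int, hi: int) -> list:
    """Scatter-gather: scan every shard for lo <= key < hi, then sort."""
    hits = []
    for shard in shards:
        hits.extend((k, v) for k, v in shard.items() if lo <= k < hi)
    return sorted(hits)


for k in range(100):
    put(k, f"row-{k}")

print(range_query(10, 15))
```

With range sharding the same query would usually touch one shard, which is the trade-off being discussed here: hashing spreads writes evenly but turns a short range scan into a fan-out across the cluster.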

stepanhruda|2 years ago

As I understood it, a big part of the premise of the post was that they see sequential storage (either in the db or the cache layer) as desirable.

paulddraper|2 years ago

It depends on whether you have a single request that covers a lot of sequential data, or a lot of requests for sequential data.

stepanhruda|2 years ago

Correct, it improves latency in the best-case scenario and falls over in the worst-case scenario. Randomly sharded keys give more consistent performance.