dragonfax | 4 years ago

KKV databases (Cassandra and DynamoDB are good examples) have a common problem with hotspots, or "hot partitions". The most common mistake is to use a timestamp of any kind in the range (clustering) column. Then whatever partition represents "today" or "this hour" ends up being the hot partition.

The article mentions hot partitions becoming a problem because of max partition size, but they're also a problem for scalability. Say you're writing a very high throughput of logs into the table (a contrived example): then you're bottlenecked by the rate at which you can write to one partition.

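Adding a bucket id (say, the current day or hour) to the partition key is a common solution. It solves the max partition size issue, but not the scalability issue of hot partitions, because every write arriving during the current bucket still lands on the same partition.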
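To make that concrete, here is a minimal sketch in Python. It only simulates key assignment; it does not talk to Cassandra or DynamoDB, and the names `hour_bucket`, `partition_key`, and the `"app-logs"` source are hypothetical. It assumes a partition key of (source, hour bucket) and counts how many writes land on each partition.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Tuple


def hour_bucket(ts: datetime) -> str:
    """Bucket id for a timestamp: the hour it falls in (hypothetical scheme)."""
    return ts.strftime("%Y-%m-%dT%H")


def partition_key(source: str, ts: datetime) -> Tuple[str, str]:
    """Partition key made of the log source plus the hourly bucket id."""
    return (source, hour_bucket(ts))


def simulate_writes(n: int, start: datetime) -> Counter:
    """Count writes per partition for n log lines arriving 1 ms apart."""
    counts: Counter = Counter()
    for i in range(n):
        ts = start + timedelta(milliseconds=i)
        counts[partition_key("app-logs", ts)] += 1
    return counts


start = datetime(2021, 8, 24, 12, 0, 0, tzinfo=timezone.utc)
write_counts = simulate_writes(10_000, start)

# All 10,000 writes fall inside the same hour, so they share one partition.
for key, count in write_counts.items():
    print(key, count)
```

The bucket caps how large any one partition can grow, since a new partition starts every hour, but at any given moment there is still exactly one partition taking all the writes.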


geenat | 4 years ago

CockroachDB recently addressed hotspots on sequence/timestamp workloads with hash-sharded indexes: https://www.cockroachlabs.com/blog/hash-sharded-indexes-unlo...

Does what it says on the tin for the primary key.

That said, hotspots are 100% the reason why Cockroach encourages UUID primary keys. The disadvantage of UUIDs is that if you want sequential data, you then need a secondary index, which you'll have to bucket anyway.
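A rough sketch of the general idea behind hash sharding, in Python. This is an illustration under stated assumptions, not CockroachDB's implementation: `SHARD_COUNT`, `shard_for`, `write_all`, and `scan_in_order` are hypothetical names, and CRC32 is used only because it is a stable hash available in the standard library. Sequential timestamps are spread over several shards on write, and reading them back in order means merging the per-shard sorted runs.

```python
import heapq
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List

SHARD_COUNT = 8  # hypothetical number of shards


def shard_for(ts_ms: int) -> int:
    """Pick a shard from a stable hash of the key, not from its order."""
    return zlib.crc32(str(ts_ms).encode()) % SHARD_COUNT


def write_all(timestamps: Iterable[int]) -> Dict[int, List[int]]:
    """Route each key to its shard; each shard keeps its own keys sorted."""
    shards: Dict[int, List[int]] = defaultdict(list)
    for ts in timestamps:
        shards[shard_for(ts)].append(ts)
    for rows in shards.values():
        rows.sort()
    return shards


def scan_in_order(shards: Dict[int, List[int]]) -> List[int]:
    """Ordered read: a k-way merge across every shard's sorted run."""
    return list(heapq.merge(*shards.values()))


timestamps = list(range(1_000))  # stand-in for sequential timestamps
shards = write_all(timestamps)
ordered = scan_in_order(shards)

for shard_id in sorted(shards):
    print(shard_id, len(shards[shard_id]))
```

Writes no longer pile onto a single "latest" range, at the cost that an ordered scan has to touch every shard.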