top | item 43929006

(no title)

owyn | 9 months ago

Yeah, this was a google interview question for me too. I didn't know the algorithm and floundered around trying to solve the problem. I came up with the 1/n and k/n selection strategy but still didn't get the job lol. I think the guy who interviewed me was just killing time until lunch.

I like the visualizations in this article, really good explanation.

discuss

dekhn|9 months ago

I didn't know about the algorithm until after I got hired there. It's actually really useful in a number of contexts, but my favorite was using it to find optimal split points for sharding lexicographically sorted string keys for mapping. Often you will have a sorted table, but the underlying distribution of keys isn't known, so uniform sharding will often cause imbalances where some mappers end up doing far more work than others. I don't know if there is a convenient open source class to do this.

wood_spirit|9 months ago

Interesting idea, hadn’t that about that way to apply it.

I knew it from before my interview from a turbo pascal program I had seen that sampled dat tape backups of patient records from a hospital system. These samples were used for studies. That was a textbook example of it’s utility.