top | item 46227501

(no title)

Syzygies | 2 months ago

When I was a senior at Swarthmore College, Herb Wilf came over from U Penn to teach a course in combinatorial algorithms. I was encouraged to attend.

He claimed that choosing a subset of k integers at random from {1..n} should have a log in its complexity, because one needs to sort to detect duplicates. I realized that if one divided [1..n] into k bins, one could detect duplicates within each bin, for a linear algorithm. I chose bubble sort because the average occupancy was 1, so bubble sort gave the best constant.

I described this algorithm to him around 5pm, end of his office hours as he was facing horrendous traffic home. I looked like George Harrison post-Beatles, and probably smelled of pot smoke. Understandably, he didn't recognize a future mathematician.

Around 10pm the dorm hall phone rang, one of my professors relaying an apology for brushing me off. He got it, and credited me with many ideas in the next edition of his book.

Of course, I eventually found all of this and more in Knuth's books. I was disillusioned, imagining that adults read everything. Later I came to understand that this was unrealistic.

discuss

refibrillator|2 months ago

Love anecdotes like this! But admittedly I feel a bit lost, so please forgive my ignorance when I ask: why does choosing a subset of k integers at random require deduplication? My naive intuition is that sampling without replacement can be done in linear time (hash table to track chosen elements?). I’m probably not understanding the problem formulation here.

willvarfar|2 months ago

your random number function might return the same number multiple times? So to choose k random but unique numbers you may have to call the random number function more than k times?

Of course my intuition would be that you can do a random shuffle and then take the first k, which is O(N). So I might be misunderstanding.

itemize123|2 months ago

interesting, when is this? because that seems obvious today - basically hash-set, right?

Syzygies|2 months ago

1977. And I didn't know what a hash table was, though I can't explain now why they didn't think of using a hash table. I was effectively using a dumb hash function.

Their 1978 second edition works in exactly the memory needed to store the answer, by simulating my algorithm in a first pass but only saving the occupancy counts.

Oh, and thanks (I guess). I really didn't expect to ever be reading FORTRAN code again. One learned to program at Swarthmore that year by punching cards, crashing our IBM 1130, and bringing the printout to my supervisor shift. I'd find the square brackets and explain how you'd overwritten your array. I even helped an economics professor Frederic Pryor (the grad student in the "Bridge of Spies" cold war spy swap) this way, when I made an ill-advised visit to the computer center on a Saturday night. Apparently I could still find square brackets.