top | item 38827068

Ask HN: How many times will this loop run on average?

1 points| aalhour | 2 years ago

A friend of mine shared the following code snippet with me and asked me to guess how many times the loop will run on average:

  for(int i = 0; i < Random(1, 100); i++);
I tried to guess the answer analytically and gussed ~50 but the empirical test was surprising to me. Can someone explain why the average is around 12?

Runnable code: https://replit.com/@aalhour/RandomLoop#main.py

EDIT: Formatting.

11 comments

order
[+] Someone|2 years ago|reply
In C (and your Python conversion), the i < Random(1, 100) part is evaluated each time through the loop, not once at start of the loop to determine the limit.

So, it’s 1% that the loop ends at i = 1, if it takes that hurdle 2% that it ends at i = 2, if it takes that hurdle 3% that it ends at i = 3, etc.

The calculation is easier if you phrase that this way:

It’s 99% that the loop continues at i = 1, if it takes that hurdle 98% of the rest that it continues at i = 2, if it takes that hurdle 97% that it continues at i = 3, etc.

So, the probability to make it past i = n is

  0,99 × 0,98 × 0,97 × … × (1 - n/100)
Note that this isn’t necessarily the case in all languages. Pascal, for example, has a real for loop where the limit is evaluated once and the index variable cannot be changed inside the loop, so that the compiler can determine the number of iterations before starting the first iteration.
[+] aalhour|2 years ago|reply
I was aware that the Random function call gets evaluated in Python (and C) every time the loop iterates, I just couldn't imagine the probability distribution myself, I had assumed that all numbers are uniformally distributed but didn't cater for the sums. You're a legend! Now I see it. I wrote the following code to check out your argument and it checks out indeed :thumbs-up:

  def p(n):
    total = 1
    for i in range(1, n):
      total *= (100-i)/100
    return total


  for i in range(1, 100):
    print(f"p({i}) = {p(i)}")

The first 13 results:

  p(1) = 1
  p(2) = 0.99
  p(3) = 0.9702
  p(4) = 0.9410939999999999
  p(5) = 0.9034502399999998
  p(6) = 0.8582777279999998
  p(7) = 0.8067810643199997
  p(8) = 0.7503063898175998
  p(9) = 0.6902818786321918
  p(10) = 0.6281565095552946
  p(11) = 0.5653408585997651
  p(12) = 0.503153364153791
  p(13) = 0.44277496045533604
p(12) sits at 50%, of course it will be the mean! :D
[+] hardlianotion|2 years ago|reply
Expected index given by

E [X] = \sum_{i \in [2,100)} p(X < i)\prod_{j < i}p (X \ge j)i.

Edit: Sorry rest of reply was wrong. Had to account for not hitting until i^th loop.

[+] aalhour|2 years ago|reply
Not sure that adds up, I translated the right side of the equality into a Python statement and it returns a number I am not sure how to interpret:

  1/99 * sum([(i-1)*i for i in range(2, 100)])   # 3266.666666666667
Both the median and mean are around 12 for 10K runs of the loop.
[+] orbz|2 years ago|reply
You’re calculating a random number every time the loop runs. If you want an average of 50 you should call random outside the loop and save the value to be compared each time.
[+] aalhour|2 years ago|reply
Yes, that's true. I don't want an average of 50. My initial guess of 50 is wrong. I am trying to understand why the average turned out to be approximately 12.
[+] haltist|2 years ago|reply
The only random variable in that code is "Random(1, 100)" and its expected value is 50.