top | item 22338755

Benford's Law

145 points | kjhughes | 6 years ago | en.wikipedia.org

93 comments

[+] BenoitEssiambre|6 years ago|reply
Benford's law, Zipf's law and other power laws tend to fascinate, but they are a manifestation of something really simple: it's how things behave when they are evenly distributed over multiplicative operations instead of additive ones. By this I mean that things are as likely to be x% bigger as they are x% smaller, instead of the more common definition of evenly distributed (things x units larger are as likely as things x units smaller).

Being evenly distributed over multiplication is natural for many measures. I think it is considered the most even, "maximum entropy" distribution for things that can't be negative.

The most even, maximum entropy distribution for an unbounded (-infinity to +infinity) prior is an unbounded uniform distribution but you can't apply it, for example, in the case of the size of objects since it would imply that negative sizes are as likely as positive sizes (and objects don't have negative sizes).

In general, position parameters tend to be uniformly distributed over addition/subtraction, whereas scale parameters tend to be evenly distributed over multiplication (uniform over log(x)). It's the reason some graphs use log axes. For Bayesians, it's a fairly straightforward concept. See also improper priors (https://en.wikipedia.org/wiki/Prior_probability#Improper_pri...)
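The claim above is easy to check empirically (a hypothetical sketch, not from the comment): sample numbers whose log10 is uniform, i.e. "evenly distributed over multiplication", and tally their leading digits.

```python
import math
import random

# Sample numbers uniformly in log space ("evenly distributed over
# multiplication") and tally their leading digits.
random.seed(0)
samples = 100_000
counts = [0] * 10
for _ in range(samples):
    x = 10 ** random.uniform(0, 6)  # log10(x) is uniform on [0, 6)
    counts[int(str(x)[0])] += 1

# Compare observed frequencies with Benford's prediction log10(1 + 1/d).
for d in range(1, 10):
    print(d, round(counts[d] / samples, 3), round(math.log10(1 + 1 / d), 3))
```

The observed frequencies track log10(1 + 1/d) closely, with digit 1 near 30%.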

[+] hyperbovine|6 years ago|reply
The max entropy distribution over (-oo, +oo) is usually taken to be either Gaussian or Laplace (two-sided exponential), depending on moment conditions. There is no such thing as a uniform distribution over the real line.
[+] lonelappde|6 years ago|reply
As the article explains, Benford's law applies to distributions over values that evolve over time with multiplicative fluctuations, not multiplicative distributions themselves. A simple uniform distribution over [0, N] obeys Benford's law when averaged over all choices of N.
[+] ronenlh|6 years ago|reply
Quoted from The New York Times, Tuesday, August 4, 1998 http://www.rexswain.com/benford.html (linked in Wikipedia):

To illustrate Benford's Law, Dr. Mark J. Nigrini offered this example: "If we think of the Dow Jones stock average as 1,000, our first digit would be 1.

"To get to a Dow Jones average with a first digit of 2, the average must increase to 2,000, and getting from 1,000 to 2,000 is a 100 percent increase.

"Let's say that the Dow goes up at a rate of about 20 percent a year. That means that it would take five years to get from 1 to 2 as a first digit.

"But suppose we start with a first digit 5. It only requires a 20 percent increase to get from 5,000 to 6,000, and that is achieved in one year.

"When the Dow reaches 9,000, it takes only an 11 percent increase and just seven months to reach the 10,000 mark, which starts with the number 1. At that point you start over with the first digit a 1, once again. Once again, you must double the number -- 10,000 -- to 20,000 before reaching 2 as the first digit.

"As you can see, the number 1 predominates at every step of the progression, as it does in logarithmic sequences."
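The quote's arithmetic can be checked with compound growth (a sketch; note the "five years" figure in the quote is a rougher simple-interest estimate, while compounding gives about 3.8 years):

```python
import math

# Time for the Dow to grow from level a to level b at 20% per year,
# compounded: solve a * 1.2**t = b for t.
def years(a, b, rate=0.20):
    return math.log(b / a) / math.log(1 + rate)

print(round(years(1000, 2000), 2))        # 1 -> 2 as first digit: ~3.8 years
print(round(years(5000, 6000), 2))        # 5 -> 6: exactly 1 year
print(round(years(9000, 10000) * 12, 1))  # 9 -> 1: ~6.9 months
```

The fraction of time spent with leading digit d is proportional to log10((d + 1)/d), which is exactly Benford's distribution.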

[+] Symmetry|6 years ago|reply
I used this, well, the idea that numbers often tend to be distributed evenly in exponential space, in my MEng thesis. The idea was that, because so often in computers our additions or other arithmetic operations involve at least one small number, most of the switching activity in an adder should be concentrated in the lower-order bits.

So the plan was to make the low order bits of the adder slower and more power efficient. Make the high order bits faster and less power efficient. Meet the same overall clock rate target given by the worst case but use less power in the average case.

Sadly memory operations, which look a lot like pure entropy, were sufficiently common that the gains just weren't worth the design effort.
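The effect described above can be sketched in a toy simulation (illustrative only, not the thesis design): accumulate small, log-uniformly distributed increments and count how often each bit of the running sum toggles. The activity concentrates in the low-order bits.

```python
import random

# Add small log-uniform increments to an accumulator and count, per bit
# position, how often that bit of the sum toggles.
random.seed(1)
WIDTH = 32
flips = [0] * WIDTH
acc = 0
for _ in range(100_000):
    inc = int(10 ** random.uniform(0, 3))  # small operand, 1..999
    new = (acc + inc) & ((1 << WIDTH) - 1)
    changed = acc ^ new
    for bit in range(WIDTH):
        if (changed >> bit) & 1:
            flips[bit] += 1
    acc = new

print(flips[:4], flips[20:24])  # low bits toggle far more often than high bits
```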

[+] tbrock|6 years ago|reply
This law actually led to Bernie Madoff being caught. I worked at a hedge fund at the time and we applied this to his returns and immediately knew it was a scam.
[+] skunkworker|6 years ago|reply
I’ve heard of this before, but part of me also wonders: how many people are getting away with fraud because they are utilizing Benford's law to make their data appear more normal? Or is there another attribute that can’t be accounted for when aiming for that?
[+] duxup|6 years ago|reply
Not to refute you, but my understanding was that many people simply assumed something was wrong, if only because the consistency of the returns he claimed was pretty much impossible.

IIRC at one point his numbers showed on a month to month basis losses only 4% of the time.

[+] m3kw9|6 years ago|reply
Can’t people use this law to defraud? I bet the more sophisticated ones already are.
[+] dang|6 years ago|reply
[+] cjohnson318|6 years ago|reply
If anyone's interested, this is the breakdown of the leading digits of the number of comments for these threads:

{'1': 0.690, '2': 0.138, '3': 0.069, '4': 0.069, '5': 0.000, '6': 0.000, '7': 0.000, '8': 0.034, '9': 0.000}

It's not exactly Benford's distribution. It doesn't span more than two orders of magnitude, but it's somewhat Benfordian, and it is what you might expect from a distribution of comments.
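A breakdown like this takes only a few lines (the comment counts below are hypothetical stand-ins; the real figures came from the front page at the time):

```python
from collections import Counter

# Hypothetical comment counts standing in for the front-page data.
comment_counts = [93, 145, 12, 7, 180, 24, 11, 16, 3, 102]

# Tally the leading digit of each count and normalize to frequencies.
leading = Counter(str(n)[0] for n in comment_counts)
dist = {d: leading.get(d, 0) / len(comment_counts) for d in '123456789'}
print(dist)
```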

[+] Baeocystin|6 years ago|reply
Thanks for doing these on topics that cyclically come up. It's nice being able to see what has been previously said, in addition to seeing shifts in how topics are approached over time.
[+] knzhou|6 years ago|reply
Interestingly enough, it's also a decent fit for fundamental physical constants, as one can read off from a table of them. [0]

0: https://physics.nist.gov/cuu/pdf/all.pdf

[+] polyphonicist|6 years ago|reply
That's interesting indeed. Any idea why physics constants obey Benford's law? Did the universe pick the physical constants from an exponential distribution?
[+] eqmvii|6 years ago|reply
Spent some time in college studying the use of this to detect election fraud. A really intriguing concept, but then the question becomes: what next? Is the statistical data enough to trigger a deeper investigation?
[+] patrickthebold|6 years ago|reply
You are phrasing your question like it might be controversial. Why wouldn't 'statistical data' be enough to trigger a deeper investigation?
[+] lonelappde|6 years ago|reply
Where does benfords law apply in elections? Not in vote counts (unless the set of precinct sizes is largely fabricated) or insignificant digits of vote counts.
[+] logicallee|6 years ago|reply
I thought I could try this really easily in Python.

Here's what I did:

Open IDLE and type:

   from random import seed
   from random import random
   seed(1)
You can then try typing random() and get values:

   >>> random()
   0.13436424411240122

   >>> random()
   0.8474337369372327

   >>> random()
   0.763774618976614
I actually forgot that random() doesn't just return an integer, like in C where rand() gives you a random integer. But these numbers looked great to me. I could easily imagine the first one, 0.13436424411240122, just being 13436424411240122, and so on. It seemed like a great way to get the first digit of random numbers of unlimited size, so with this result I thought it was a good way to go ahead.

So with the above basis, I want to look at the distribution of the first nonzero digit after the decimal point.

So next, put in the following function. (Basically it just gets a random number as above, but keeps multiplying it by 10 as long as the first character of its string conversion is '0'. This yields the leftmost non-zero digit.)

    def getone():
        floaty = random()
        while str(floaty)[0] == '0':
            floaty *= 10
        return int(str(floaty)[0])
This now gets us the leftmost digit, check it out:

   >>> getone()
   3
   >>> getone()
   2
   >>> getone()
   4

   >>> getone()
   2
All right. Now let's just do this a few million times and keep track of buckets. Make some buckets:

   buckets = [0,0,0,0,0,0,0,0,0,0] # buckets for: 0,1,2,3,4,5,6,7,8,9
This next line will run for up to a few minutes:

   for tries in range(10000000):
      buckets[getone()] += 1
Now print the results:

   for result in range(10):
      print (result, ":", buckets[result])
This gets me....

   0 : 0
   1 : 1112927
   2 : 1110821
   3 : 1109741
   4 : 1110793
   5 : 1111604
   6 : 1112178
   7 : 1111272
   8 : 1110402
   9 : 1110262
Clearly Benford's law does NOT apply here.

Why not?

[+] susam|6 years ago|reply
Your experiments show that when we pick numbers from a uniform distribution, each digit from 1 to 9 is equally likely to be the leading significant digit. This is perfectly fine. Numbers picked from a uniform distribution indeed do not obey Benford's law.
[+] ggm|6 years ago|reply
If we counted in hex, how would it vary? It appears to me to be "not much", because it says most things scale to express as instances of the first non-zero value in the number system plus an arbitrary number of powers of the base. So... it's Fermi arithmetic: 0, 1, 10, 100 ...
[+] susam|6 years ago|reply
In base b, the probability that the leading significant digit is d (where 0 < d < b) is log_b(1 + 1/d).

That's why in base 10 the probability that the leading significant digit is 1 is log_10(1 + 1/1) ≈ 0.301. In base 2, the probability that the leading significant digit is 1 is, quite trivially, log_2(1 + 1/1) = 1.0. Of course, all numbers in binary must begin with the digit 1.
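The formula can be checked numerically in any base (a quick sketch):

```python
import math

# Benford probability that the leading digit is d in base b: log_b(1 + 1/d).
def benford(d, b):
    return math.log(1 + 1 / d, b)

print(round(benford(1, 10), 3))  # base 10, digit 1: 0.301
print(round(benford(1, 16), 3))  # base 16, digit 1: log_16(2) = 0.25
# The probabilities over all valid digits sum to 1 in any base,
# since the product (2/1)(3/2)...(b/(b-1)) telescopes to b.
print(round(sum(benford(d, 16) for d in range(1, 16)), 6))
```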

[+] chess93|6 years ago|reply
Benford's law works in any base.
[+] m3kw9|6 years ago|reply
The law definitely has a human psychology aspect to it. When a stock price is close to, say, 1000, it will likely get pushed over to the psychological level.
[+] pugworthy|6 years ago|reply
There are a number of fun random generators online - worth googling.
[+] susam|6 years ago|reply
Experimental evidence for uniform distribution:

  # uniform.py

  import random

  n = 10**4
  samples = 10**6
  counts = [0] * 10

  for i in range(samples):
      random_number = random.randint(1, n)
      first_digit = int(str(random_number)[0])
      counts[first_digit] += 1

  for i, count in enumerate(counts):
      print('{} => {:.3f}'.format(i, count / samples))
Output:

  $ python3 uniform.py
  0 => 0.000
  1 => 0.111
  2 => 0.111
  3 => 0.111
  4 => 0.111
  5 => 0.111
  6 => 0.111
  7 => 0.111
  8 => 0.111
  9 => 0.111
Experimental evidence for exponential distribution:

  # exponential.py

  import random

  n = 10**4
  samples = 10**6
  counts = [0] * 10

  for i in range(samples):
      random_number = 2 ** random.randint(1, n)
      first_digit = int(str(random_number)[0])
      counts[first_digit] += 1

  for i, count in enumerate(counts):
      print('{} => {:.3f}'.format(i, count / samples))
Output:

  $ python3 exponential.py
  0 => 0.000
  1 => 0.301
  2 => 0.176
  3 => 0.125
  4 => 0.097
  5 => 0.079
  6 => 0.067
  7 => 0.058
  8 => 0.051
  9 => 0.046
[+] dmurray|6 years ago|reply
You chose n in such a way as to make your hypothesis true.
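The objection is easy to see with a bound that is not a power of ten (a sketch with a hypothetical n; the script above used n = 10**4, which spans whole powers of ten and so comes out exactly flat):

```python
import random

# With n = 2000, digit 1 leads for 1, 10-19, 100-199 and 1000-2000:
# 1112 of the 2000 values, about 55.6%, nowhere near flat.
random.seed(0)
n = 2000
samples = 100_000
counts = [0] * 10
for _ in range(samples):
    counts[int(str(random.randint(1, n))[0])] += 1
print(round(counts[1] / samples, 3))
```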