By Jochen Voss, on
I am slightly embarrassed to admit that I was caught out by the birthday paradox yesterday. The encounter went as follows:
While testing a new random number generator by analysing a list of 1000 generated standard normally distributed random numbers, I discovered that the list contained one of the numbers twice!!! This is suspicious, because for a continuous distribution this event has probability 0 in theory. After an (unsuccessful) hunt for bugs in my program, I finally found the following explanation.
The program prints the numbers using a C statement like
printf("%f\n", normal(0, 1));
By default, the %f format specifier prints numbers with six digits after the decimal point:
-0.641062 1.116142 1.417036 0.337435 -0.310383 ...
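To see how two different random numbers can end up looking the same, here is a minimal sketch in C. The helper name prints_identically is made up for this illustration and is not part of the original program; it simply formats two doubles with "%f" and compares the resulting text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, for illustration only: returns 1 if a and b
 * produce exactly the same text when printed with "%f", else 0. */
int prints_identically(double a, double b)
{
    /* 400 bytes is enough for any finite double printed with "%f". */
    char buf_a[400], buf_b[400];
    int len_a = snprintf(buf_a, sizeof buf_a, "%f", a);
    int len_b = snprintf(buf_b, sizeof buf_b, "%f", b);

    /* Make sure neither string was truncated or failed to format. */
    assert(len_a >= 0 && (size_t)len_a < sizeof buf_a);
    assert(len_b >= 0 && (size_t)len_b < sizeof buf_b);

    return strcmp(buf_a, buf_b) == 0;
}
```

For example, 0.1234561 and 0.1234564 are different numbers, but both are printed as 0.123456, so in the output they are indistinguishable.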
Most of the numbers will lie between -2 and 2, i.e. after printing they are concentrated in a set of about 4 million possible values. A quick check reveals that 1000 independent uniform draws from a set of this size contain some number twice with a probability of more than 10%, and for the normal distribution the probability will be even higher because the numbers are more concentrated around 0. Thus, seeing a number twice is something which will actually happen from time to time and is no indication that the program is malfunctioning!
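The quick check can be reproduced with a few lines of C. This is only a sketch under the simplifying assumption made above (uniform draws from equally likely values); the function name collision_probability is chosen for this example. It multiplies up the probability that all draws are distinct and returns the complement.

```c
#include <assert.h>

/* Probability that n independent uniform draws from d equally likely
 * values contain at least one value twice (the birthday problem).
 * Illustrative helper; the name and signature are assumptions. */
double collision_probability(int n, double d)
{
    double p_distinct = 1.0;   /* probability that all draws so far differ */

    assert(n >= 0 && d >= 1.0);
    for (int i = 1; i < n; i++) {
        if (i >= d) {
            return 1.0;        /* more draws than values: a repeat is certain */
        }
        /* The next draw must avoid the i distinct values already seen. */
        p_distinct *= 1.0 - i / d;
    }
    return 1.0 - p_distinct;
}
```

Taking the printed values from -2.000000 to 2.000000 in steps of 0.000001 as the set, collision_probability(1000, 4000001.0) comes out at roughly 0.117, in line with the "more than 10%" above. The same function gives the classical result of slightly more than 0.5 for 23 people and 365 birthdays.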
This is an excerpt from Jochen's blog.
Copyright © 2010 Jochen Voss. All content on this website (including text, pictures, and any other original works), unless otherwise noted, is licensed under the CC BY-SA 4.0 license.