birthday paradox

By Jochen Voss, on 2010-09-15

I am slightly embarrassed by the fact that I've been caught out by the birthday paradox yesterday. The encounter went as follows:

While testing a new random number generator by analysing a list of 1000 generated standard normal distributed random numbers, I discovered that the list contained one of the numbers twice!!! This is suspicious, because this event has probability 0 in theory. After an (unsuccessful) hunt for bugs in my program, I finally found the following explanation.

The program prints the numbers using a C command like

printf("%f\n", normal(0, 1));

By default, the %f format string outputs numbers with a precision of six significant digits:

Most of the numbers will lay between -2 and 2, i.e. they are concentrated in a set of about 4 million possible values. A quick check reveals that 1000 independent uniform draws out of a set of this size contains a number twice with a probability of more than 10%, and for the normal distribution the probability will be even higher because the numbers are more concentrated around 0. Thus, seeing a number twice is something which will actually happen from time to time and is no indication that the program is malfunctioning!