Collecting a Complete Set: From Baseball Cards to Disease and Denial of Service

The math behind probability for different outcomes.

ByABC News
September 29, 2011, 6:53 PM

Oct. 2, 2011— -- Rolling a die until all six numbers turn up, having children until you have at least one of each sex, and collecting the complete set of baseball cards are three examples of a certain sort of multi-part task. To complete such a task, it's necessary to perform several different sub-tasks. If the probabilities of performing these various subtasks are the same, some relatively simple math helps figure out how long or extensive the complete task will be.

To illustrate, let's start with a die. How many times will you have to roll it on average until each of the six numbers appears at least once?

The Math of Collecting the Complete Set

A couple of ideas are needed to answer this question, the first quite plausible.

Assume that the probability of some event or outcome of interest is a number p and that repeated tries to obtain the outcome have no effect on the probability of later tries.

Then it can be proved that on average we'll need 1/p tries to obtain the outcome.

This may sound complicated, so let's return to the die example. Since rolling a die randomly generates the numbers 1 through 6, the probability of obtaining a particular number, say 2, is 1/6, so on average we'll need 1/(1/6) or 6 tries to generate a 2. On average we'll also need 6 tries to generate a 5, and so on.

By contrast, the probability of rolling an odd number - either 1, 3, or 5 - is 3/6, so, on average, we'll need 1/(3/6) or 2 rolls of the die to roll an odd number.

The second idea builds on the first. The first roll of a die necessarily gives us a number not rolled before. After we've rolled this first number (whatever it is), the probability we roll a different number is 5/6 since we want to roll one of the 5 numbers we haven't yet rolled.

The average number of additional rolls until a second number turns up is thus 1/(5/6) or 6/5. After rolling two different numbers, the probability we roll a third number different from the first two is 4/6, and so the average number of additional rolls until it turns up is 1/(4/6) or 6/4. Continuing in this way until all 6 numbers are rolled and adding the results gives us (1+6/5+6/4+6/3+6/2+6/1) or 14.7 rolls on average to obtain all six numbers. Try it a few times.

Other Applications

The same basic calculation answers the other questions mentioned.

A couple plans to continue having children until they have a child of each sex and wonders how many children they'll likely have.

Or a baseball card collector wonders how many pieces of gum he'll probably have to buy to obtain the complete set of 400 cards.

The answer for the average number of children is 3. The first child will necessarily be of a sex not obtained before.

After the first child, the probability of a child of the opposite sex is 1/2, so 1/(1/2) or 2 additional children will be needed on average to obtain a complete set. (3= 1+1/(1/2) = 1+2).

Likewise, the average number of baseball cards needed to obtain the complete set is 1+1/(399/400) + 1/(398/400) + 1/(397/400) + ... + 1/(2/400) + 1(1/400), which equals more than 2,000.

The last term of 1/(1/400) or 400 additional cards reflects the difficulty of getting that last missing card, probably an obscure utility infielder, to complete your collection.

Further Afield

If we relax the assumption of equal probability for different outcomes, we get more practical applications. One odd example arises in so-called denial of service attacks in a computer network.