Collecting a Complete Set: From Baseball Cards to Disease and Denial of Service

The math behind probability for different outcomes.

Oct. 2, 2011— -- Rolling a die until all six numbers turn up, having children until you have at least one of each sex, and collecting the complete set of baseball cards are three examples of a certain sort of multi-part task. To complete such a task, it's necessary to perform several different sub-tasks. If the probabilities of performing these various subtasks are the same, some relatively simple math helps figure out how long or extensive the complete task will be.

To illustrate, let's start with a die. How many times will you have to roll it on average until each of the six numbers appears at least once?

The Math of Collecting the Complete Set

A couple of ideas are needed to answer this question, the first quite plausible.

Assume that the probability of some event or outcome of interest is a number p and that repeated tries to obtain the outcome have no effect on the probability of later tries.

Then it can be proved that on average we'll need 1/p tries to obtain the outcome.

This may sound complicated, so let's return to the die example. Since rolling a die randomly generates the numbers 1 through 6, the probability of obtaining a particular number, say 2, is 1/6, so on average we'll need 1/(1/6) or 6 tries to generate a 2. On average we'll also need 6 tries to generate a 5, and so on.

By contrast, the probability of rolling an odd number - either 1, 3, or 5 - is 3/6, so, on average, we'll need 1/(3/6) or 2 rolls of the die to roll an odd number.

The second idea builds on the first. The first roll of a die necessarily gives us a number not rolled before. After we've rolled this first number (whatever it is), the probability we roll a different number is 5/6 since we want to roll one of the 5 numbers we haven't yet rolled.

The average number of additional rolls until a second number turns up is thus 1/(5/6) or 6/5. After rolling two different numbers, the probability we roll a third number different from the first two is 4/6, and so the average number of additional rolls until it turns up is 1/(4/6) or 6/4. Continuing in this way until all 6 numbers are rolled and adding the results gives us (1+6/5+6/4+6/3+6/2+6/1) or 14.7 rolls on average to obtain all six numbers. Try it a few times.

Other Applications

The same basic calculation answers the other questions mentioned.

A couple plans to continue having children until they have a child of each sex and wonders how many children they'll likely have.

Or a baseball card collector wonders how many pieces of gum he'll probably have to buy to obtain the complete set of 400 cards.

The answer for the average number of children is 3. The first child will necessarily be of a sex not obtained before.

After the first child, the probability of a child of the opposite sex is 1/2, so 1/(1/2) or 2 additional children will be needed on average to obtain a complete set. (3= 1+1/(1/2) = 1+2).

Likewise, the average number of baseball cards needed to obtain the complete set is 1+1/(399/400) + 1/(398/400) + 1/(397/400) + ... + 1/(2/400) + 1(1/400), which equals more than 2,000.

The last term of 1/(1/400) or 400 additional cards reflects the difficulty of getting that last missing card, probably an obscure utility infielder, to complete your collection.

Further Afield

If we relax the assumption of equal probability for different outcomes, we get more practical applications. One odd example arises in so-called denial of service attacks in a computer network.

In these an attacker repeatedly sends packets of partial information (parts of a file, say) to a site in order to flood it and make it unavailable.

The site reassembles these randomly arriving packets into a complete file when at least one of each of the packets is received.

How long it takes a complete file (or many complete files) to travel from the attacker to the destination site can be computed using roughly the same mathematics as that used in the examples above.

The math can also help the victim locate the attacker.

If each of the routers along the path the information packets take marks them electronically, then how long it takes the destination site to receive the complete set of marked packets can be calculated and the attacker's location sometimes inferred from the marked path taken.

Another possible example is that of a disease that requires at least one of a number of different independent, continually occurring assaults on the body. Once these have all been collected, as it were, the disease develops.

Dice, children, baseball cards, denial of service attacks, disease together testify to the imperialist nature of mathematics. It sends out colonies to almost disciplines and endeavors.

John Allen Paulos, a professor of mathematics at Temple University in Philadelphia, is the author of several best-selling books, including "Innumeracy," "A Mathematician Reads the Newspaper" and "A Mathematician Plays the Stock Market." He's on Twitter and his "Who's Counting?" column on ABCNews.com appears regularly here.