If you have a million coin flips, it's almost certain that somewhere in those coin flips there will be 20 heads in a row. My blog post took the Times to task for printing what seemed to be an obviously bogus statement. Since a run of 20 heads is roughly a one-in-a-million occurrence, a basic feel for probability should tell you that trying to do this a million times is not going to be a certainty. It is pretty far from it.

Naturally, I thought to back up my argument with some hard facts, and I came up with a calculation that showed that the chances of this happening were actually about 60%. Unfortunately, I made a fairly basic error in calculating the probability, as was demonstrated to me by correspondent Andy Langowitz. My intuition about the likelihood of success was on the money, but my calculation was a bit high.

The story of how I corrected my error in order to get the correct number, around 37.9%, is a good lesson in careful examination of probability rules, with a small detour into the land of the bignum.

#### Independent Probabilities

When trying to develop a formula like this one, it is often easier to work from the inverse. Instead of calculating how likely it is for 20 heads to occur at any point in the sequence of a million tosses, I thought to calculate the probability that it wouldn't happen. We can then take that number and subtract it from one to get the desired result.

The probability of *n* heads in a row occurring is 1/2^*n*, so the inverse probability is (2^*n* - 1)/2^*n*. If we multiply that probability once for each of the 999,981 possible occurrences of a streak of 20 heads, it seemed to me that I would be in business. Doing this is a simple enough calculation, and the result was the 60% figure. That figure felt like it was in the right range to me, and I left it at that.
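As a minimal sketch, the naive calculation can be reproduced in a few lines of Java. The class and method names are illustrative and not from the original post, and the formula is deliberately the flawed one that treats every starting position as an independent trial.

```java
// Sketch of the naive (incorrect) estimate described above.
// Names are hypothetical; this is not the author's code.
final class NaiveEstimate {
    // Returns 1 - ((2^k - 1) / 2^k)^(n - k + 1), where n - k + 1 is the
    // number of positions at which a streak of k heads could start.
    static double naive(int k, int n) {
        double missOnce = 1.0 - Math.pow(2.0, -k); // (2^k - 1) / 2^k
        int positions = n - k + 1;                 // 999,981 for k = 20, n = 1,000,000
        return 1.0 - Math.pow(missOnce, positions);
    }

    public static void main(String[] args) {
        // Roughly 0.61, which is the "about 60%" figure.
        System.out.println(naive(20, 1_000_000));
    }
}
```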

Mr. Langowitz, however, was smart enough to actually test the theory on a smaller set of numbers. Let's apply this theory to find out how likely we are to throw two heads in a row in four tries.

The chance of two heads in a row is 1/4, so my formula would give a result of 1 - (3/4)^3, or 37/64, a little better than a 50% chance of it occurring. But probability is nothing but the art of enumerating and counting, and we can do just that to check the result. The 16 equally likely outcomes of four tosses are:

```
HHHH HHHT HHTH HHTT
HTHH HTHT HTTH HTTT
THHH THHT THTH THTT
TTHH TTHT TTTH TTTT
```

Look through those outcomes and see how many have two consecutive heads. It turns out to be exactly 8, meaning that my calculation of 37/64 is just flat wrong.
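The same count can be checked mechanically. The sketch below is an illustration with made-up names: it encodes each toss sequence as a bit pattern (1 for heads, 0 for tails) and counts the sequences that contain a run of *k* heads.

```java
// Brute-force check: enumerate all 2^n toss sequences as bit patterns
// (1 = heads, 0 = tails) and count those containing k heads in a row.
final class BruteForce {
    // True if the first n tosses encoded in seq contain a run of k heads.
    static boolean hasRun(int seq, int n, int k) {
        int run = 0;
        for (int toss = 0; toss < n; toss++) {
            run = ((seq >> toss) & 1) == 1 ? run + 1 : 0;
            if (run >= k) return true;
        }
        return false;
    }

    // Number of length-n sequences that contain at least one run of k heads.
    static int countWithRun(int k, int n) {
        int count = 0;
        for (int seq = 0; seq < (1 << n); seq++) {
            if (hasRun(seq, n, k)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWithRun(2, 4) + " of " + (1 << 4)); // prints "8 of 16"
    }
}
```

Enumeration like this only works for small *n*, since the number of sequences doubles with every toss.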

#### Locating the Mistake

My error is a fairly common one among probability neophytes. I assumed that the probability at each step in the sequence was identical, because the sequences were completely independent. It's easy enough to think that if you don't examine the problem carefully.

What actually happens is that when we examine the possibility of an unsuccessful run of heads at toss *i*, we slightly bias the outcome at toss *i+1*. A demonstration will show exactly how this happens.

For the sample problem of a sequence of two heads out of four tosses, we can first examine the chance of a negative result starting at toss 1. There are just four possible outcomes that have two tosses starting at position 1:

```
HH HT TH TT
```

And just one of these outcomes yielded two heads in a row, so the probability of not seeing two heads after two tosses is 3/4.

But now when we look at the sequence of tosses starting at position two, we have to throw out the outcomes where we had two heads starting at toss one. We've already seen two heads, so we can't continue flipping coins in those outcomes. So our universe of possible outcomes is now a bit different:

```
HTH HTT
THH THT
TTH TTT
```

Instead of eight outcomes, we have six. And if we look at the toss seen in position two, instead of having an even distribution of heads and tails, you can see that the sample is biased: only two have a head in position two, while four have tails. So the chance of not seeing two heads starting at position two increases to 5/6. Note that this change in probability occurs because we have selected only those outcomes without a streak of two heads at position one.

Similarly, when we look at the possible outcomes for streaks starting at position three, we get a different probability again. Because we have to throw out one sequence in the previous test, the population of possible outcomes is now limited to:

```
HTHH HTHT
HTTH HTTT
THTH THTT
TTHH TTHT
TTTH TTTT
```

So now we have just ten possible outcomes, and two of those will produce the desired result, meaning the probability of not seeing a streak here has changed to 4/5.

So what is the probability of all three possible positions not containing a streak? That would be (3/4)(5/6)(4/5), which reduces nicely to 1/2, the correct answer.
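The hand count above can be sketched in code. This is an illustration under my own naming, not the author's program: `outcomes` counts the sequences still in play at toss *i* (no streak completed before that toss), `successes` counts those that complete their first streak exactly at toss *i*, and `noStreak` multiplies the per-toss failure ratios.

```java
// Sketch: conditional failure probabilities by enumeration.
// Bit j of seq is toss j+1; 1 = heads, 0 = tails. Names are illustrative.
final class ConditionalCheck {
    // True if the first len tosses of seq contain k heads in a row.
    static boolean hasRun(int seq, int len, int k) {
        int run = 0;
        for (int toss = 0; toss < len; toss++) {
            run = ((seq >> toss) & 1) == 1 ? run + 1 : 0;
            if (run >= k) return true;
        }
        return false;
    }

    // Sequences of length i with no streak completed before toss i.
    static int outcomes(int k, int i) {
        int count = 0;
        for (int seq = 0; seq < (1 << i); seq++) {
            if (!hasRun(seq, i - 1, k)) count++;
        }
        return count;
    }

    // Sequences of length i whose first streak is completed exactly at toss i.
    static int successes(int k, int i) {
        int count = 0;
        for (int seq = 0; seq < (1 << i); seq++) {
            if (!hasRun(seq, i - 1, k) && hasRun(seq, i, k)) count++;
        }
        return count;
    }

    // Product of the per-toss failure probabilities for tosses 1..n.
    static double noStreak(int k, int n) {
        double p = 1.0;
        for (int i = 1; i <= n; i++) {
            int total = outcomes(k, i);
            p *= (double) (total - successes(k, i)) / total;
        }
        return p;
    }

    public static void main(String[] args) {
        // For k = 2, n = 4 the factors are 2/2, 3/4, 5/6 and 8/10: about 0.5.
        System.out.println(noStreak(2, 4));
    }
}
```

The factor for toss 1 is always 1 when *k* is at least 2, so the product matches the three-position calculation in the text.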

#### Finding the General Solution

So let's generalize the question at hand: what is the probability of seeing *k* consecutive heads when a fair coin is tossed *n* times? The previous section showed that we can work it out by hand for small numbers of tosses, but it should be clear that if we are going to toss a coin a million times, the population of potential outcomes is going to get unwieldy. We need a general description of the problem in order to solve it for any values of *k* and *n*.

For many probability problems, finding a solution is simply a matter of figuring out how to count things, and coin tosses certainly appear to be just such a problem. Let's try to see if we can count the number of times a sequence of *k* heads will appear at a given toss.

To start to work out the solution to the problem, I will set *k* to a value of three. In other words, we will be trying to see what the probability of seeing three straight heads is at toss *i*, given that there have not been three heads at an earlier toss. To calculate the probability, we need to know two things. First, we need to know all the possible outcomes in our universe of samples at toss *i*. In the previous section, with a value of *k*=2, we saw that the number of outcomes for tosses 1, 2, 3, and 4 was 2, 4, 6, and 10.

After determining the number of outcomes, we then need to determine how many of those outcomes were successes. If we define a success as an outcome in which *k* heads in a row first appear at toss *i*, the success counts from the previous section would have been 0, 1, 1, 2.

#### Counting the Successes

I'll start with the more difficult problem: counting the number of times *k* heads appear at toss *i*. To start with, we have the degenerate cases where *i* is less than *k*. In all of those tosses, we know that the number of successful outcomes is going to be zero, because there have not been enough tosses to achieve success yet.

If we work our way forwards with the example of *k*=3, our first four tosses end up giving us these sets of outcomes:

```
H T

HH HT
TH TT

HHH HHT
HTH HTT
THH THT
TTH TTT

HHTH HHTT
HTHH HTHT
HTTH HTTT
THHH THHT
THTH THTT
TTHH TTHT
TTTH TTTT
```

Note that when we get to toss 3, there is just one successful outcome. Likewise, in toss 4, there is just one successful outcome.

It may not be immediately obvious, but we can in fact always tell how many successes we will achieve at toss *i+k* after we have enumerated all the possible outcomes at toss *i*. The number is defined as the number of outcomes at toss *i* that end in a tail.

The logic behind this is straightforward: in order to have a success at position *i+k*, we need to generate a sequence of *k* heads, starting at toss *i+1*. If we have an outcome that currently ends in a tail, it will generate 2^*k* outcomes in the next *k* tosses, and exactly one of these will have *k* consecutive heads. None of these outcomes will result in a sequence of *k* heads before toss *i+k*, because they currently end in a tail, so all of the outcomes generated from that outcome at toss *i* will be included in the outcomes seen at toss *i+k*.

Similarly, none of the outcomes at position *i* that currently end in a head are going to be able to contribute to a success at toss *i+k*. Any streak of *k* heads that follows a tossed head at toss *i* will result in a run of *k* heads *before* we reach toss *i+k*.

Looking at our outcomes for *k*=3, we can see that at toss 1 we have one outcome ending in a tail, so at toss 4 we will have one success. At toss 2 we have two outcomes ending in a tail, so at toss 5 we will have two successes. And we have the special case of toss 0: we have one sequence starting at toss 0 that generates a sequence of *k* heads at toss *k*. Although there were no tails tossed at position 0, any sequence that starts there doesn't have any preceding heads tosses either, so it is as if there was a single outcome at toss 0 with a value of tails.

So each outcome that ends in a tail acts as the root of a successful outcome at a future position. This is good information, but in order to turn this into a formula we need to be able to compute the number of outcomes ending in tails at toss *i*. We don't want to have to enumerate all the outcomes in order to get there. I'll refer to these special outcomes as *anchors*, as they form the anchor of a future success.
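The link between tail-ending outcomes and later successes can be checked by enumeration for small cases. The sketch below uses hypothetical names and the same bit encoding as before (1 for heads); toss 0 is treated as a single virtual tail, following the special case described above.

```java
// Sketch: verify by enumeration that the number of successes at toss i+k
// equals the number of streak-free outcomes ending in a tail at toss i.
final class AnchorCheck {
    // True if the first len tosses of seq (bit j = toss j+1) contain k heads in a row.
    static boolean hasRun(int seq, int len, int k) {
        int run = 0;
        for (int toss = 0; toss < len; toss++) {
            run = ((seq >> toss) & 1) == 1 ? run + 1 : 0;
            if (run >= k) return true;
        }
        return false;
    }

    // Streak-free sequences of length i that end in a tail (toss 0 counts as one).
    static int anchors(int k, int i) {
        if (i == 0) return 1;
        int count = 0;
        for (int seq = 0; seq < (1 << i); seq++) {
            boolean endsInTail = ((seq >> (i - 1)) & 1) == 0;
            if (endsInTail && !hasRun(seq, i, k)) count++;
        }
        return count;
    }

    // Sequences of length i whose first streak is completed exactly at toss i.
    static int successes(int k, int i) {
        int count = 0;
        for (int seq = 0; seq < (1 << i); seq++) {
            if (!hasRun(seq, i - 1, k) && hasRun(seq, i, k)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int k = 3;
        for (int i = 0; i <= 5; i++) {
            System.out.println("toss " + i + ": " + anchors(k, i)
                    + " tail-ending outcomes, toss " + (i + k) + ": "
                    + successes(k, i + k) + " successes");
        }
    }
}
```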

#### Counting the Anchors

The number of anchor outcomes at each position starts out as a nice number while *i* is less than or equal to *k*: 2^(*i*-1). But after toss *k*, successful outcomes start being removed from the sample set and the formula no longer holds. For *k*=3, the anchor count starting at toss 1 is: 1, 2, 4, 7, 13, 24.

It turns out that the anchor at position *i* does more than just generate a success at toss *i+k*. It is also responsible for generating new anchor outcomes at tosses *i+1*, *i+2*, …, *i+k*.

Looking at an example for *k*=3 should clarify this. Our only anchor outcome at toss 1 is the sequence `T`. We know that this anchor will create a new successful outcome at toss 4: `THHH`. But it also creates new anchors at tosses 2, 3, and 4: `TT`, `THT`, and `THHT`.

This observation holds true for the generation of all new anchors, and with a little work we can turn this into a functional recurrence. If each anchor at toss *i* is going to create a new anchor at tosses *i+1* through *i+k*, we can calculate the number of anchors at toss *i* using this formula:

```
anchors(i) = anchors(i-1) + anchors(i-2) + ... + anchors(i-k)
```

Readers may recognize this recurrence: an **n-step Fibonacci** sequence adds the previous *n* values in order to get the current value. (For the rest of this article, the n-step Fibonacci number will be referred to as fib_n(*i*).) In the standard Fibonacci recurrence definition, we define a base value of fib(1) = 1, and fib(x) = 0 for all x less than 1. Our anchor count is skewed by one, since our base value at toss 0 is 1. As a result, the anchor count at toss *i* is equal to fib_k(*i*+1). And from our observation of the link between anchors and successful outcomes, we can then observe that the number of successful outcomes at toss *i* is equal to fib_k(*i*+1-*k*).
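A direct, unoptimized sketch of the n-step Fibonacci numbers follows, using the base values given above (fib(1) = 1 and fib(x) = 0 for x below 1). The class and method names are my own, and `long` arithmetic limits it to small arguments.

```java
// Sketch of the k-step Fibonacci numbers: each value is the sum of the
// previous k values, with fib(1) = 1 and fib(x) = 0 for x < 1.
final class NStepFibonacci {
    static long fib(int k, int i) {
        if (i < 1) return 0;
        long[] f = new long[i + 1];
        f[1] = 1;
        for (int j = 2; j <= i; j++) {
            // Add up to k previous terms; terms before index 1 are zero.
            for (int back = 1; back <= k && j - back >= 1; back++) {
                f[j] += f[j - back];
            }
        }
        return f[i];
    }

    public static void main(String[] args) {
        // Anchor counts for k = 3 at tosses 1..6 are fib(3, i + 1).
        StringBuilder row = new StringBuilder();
        for (int i = 1; i <= 6; i++) {
            row.append(fib(3, i + 1)).append(' ');
        }
        System.out.println(row.toString().trim()); // prints "1 2 4 7 13 24"
    }
}
```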

#### Counting the Outcomes

Knowing the number of successful outcomes at toss *i* only gets us halfway to knowing the actual probability of seeing a sequence of *k* heads at that point. To get the full probability, we need to know the number of outcomes as well.

Fortunately, this calculation is nearly trivial. In the previous section we saw that the count of anchors at toss *i* is equal to fib_k(*i*+1). Each anchor at toss 1 and greater is simply a sequence of tosses that ends in a tail. And for every sequence ending in a tail, there is a corresponding sequence that is identical except for one key change: it ends in a head instead of a tail. So the number of outcomes at toss *i* is twice the number of anchors, or 2 * fib_k(*i*+1).

Now that we have those numbers, we can finally crank out the probabilities of a streak of *k* heads appearing at all possible tosses without having to painstakingly enumerate all those sequences of heads and tails. The figure below shows some of the probabilities for tosses starting at 1 for streaks of 2, 3, and 4 heads. It is an excellent exercise to walk through the enumeration process in order to double check the figures. I encourage you to give it a try.
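The per-toss values can also be regenerated from the two counts. The following is a sketch with illustrative names that prints each probability as a successes/outcomes fraction for streaks of 2, 3, and 4 heads; the choice of eight tosses per row is arbitrary.

```java
// Sketch: per-toss streak probabilities as successes/outcomes, where
// successes(i) = fib_k(i + 1 - k) and outcomes(i) = 2 * fib_k(i + 1).
final class TossProbability {
    // k-step Fibonacci with fib(1) = 1 and fib(x) = 0 for x < 1.
    static long fib(int k, int i) {
        if (i < 1) return 0;
        long[] f = new long[i + 1];
        f[1] = 1;
        for (int j = 2; j <= i; j++) {
            for (int back = 1; back <= k && j - back >= 1; back++) {
                f[j] += f[j - back];
            }
        }
        return f[i];
    }

    static long successes(int k, int i) {
        return fib(k, i + 1 - k);
    }

    static long outcomes(int k, int i) {
        return 2 * fib(k, i + 1);
    }

    public static void main(String[] args) {
        for (int k = 2; k <= 4; k++) {
            StringBuilder row = new StringBuilder("k=" + k + ":");
            for (int i = 1; i <= 8; i++) {
                row.append(' ').append(successes(k, i)).append('/').append(outcomes(k, i));
            }
            // First row: k=2: 0/2 1/4 1/6 2/10 3/16 5/26 8/42 13/68
            System.out.println(row);
        }
    }
}
```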

#### The Absence of a Streak

Now that we can compute the probability of seeing a streak of *k* heads at toss *i*, we need to do just a bit more work to see what the odds are of seeing that streak at any time in a sequence of *n* tosses. To get there, we need one more piece of data: the probability of *not* seeing a streak of *k* heads at toss *i*, in other words, the chances of a negative outcome.

It's pretty easy to calculate that number: simply subtract the number of successful outcomes from the total number of outcomes. The sample sequences for values of *k* corresponding to 2, 3, and 4 are shown here:
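A short sketch, again with hypothetical names, regenerates these failure counts from the two formulas. It also shows a property that follows from the recurrence: the failure count at toss *i* is itself an n-step Fibonacci number, fib_k(*i*+2).

```java
// Sketch: failure counts per toss, failures(i) = outcomes(i) - successes(i)
// = 2 * fib_k(i + 1) - fib_k(i + 1 - k), which works out to fib_k(i + 2).
final class FailureCounts {
    // k-step Fibonacci with fib(1) = 1 and fib(x) = 0 for x < 1.
    static long fib(int k, int i) {
        if (i < 1) return 0;
        long[] f = new long[i + 1];
        f[1] = 1;
        for (int j = 2; j <= i; j++) {
            for (int back = 1; back <= k && j - back >= 1; back++) {
                f[j] += f[j - back];
            }
        }
        return f[i];
    }

    static long failures(int k, int i) {
        return 2 * fib(k, i + 1) - fib(k, i + 1 - k);
    }

    public static void main(String[] args) {
        for (int k = 2; k <= 4; k++) {
            StringBuilder row = new StringBuilder("k=" + k + ":");
            for (int i = 1; i <= 8; i++) {
                row.append(' ').append(failures(k, i));
            }
            // First row: k=2: 2 3 5 8 13 21 34 55
            System.out.println(row);
        }
    }
}
```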

#### Put it All Together

With all these formulas in hand, we have the tools to determine the final probability we've been working towards: the probability that a sequence of *k* heads will appear in a sequence of *n* tosses. To do this, we calculate the cumulative probability that the event does not occur, and subtract that value from 1. The result will be the answer to the question.

The general approach is to multiply together the probability of failure at each toss. Once we plug in the actual formula for the failure count at each point, and the formula for the total number of outcomes, we see that most of the terms cancel out. We are left with fib_k(*n*+2) on top, and 2^*n* on the bottom. And that is the final formula that provides an answer to the question.

Returning to the original conjecture in the New York Times, all we need to do is calculate fib_20(1,000,002) and then divide it by 2^1,000,000.

Unfortunately, my calculator is not really up to this. Even a desktop calculator that could handle arbitrary precision math won't normally have a fib_k button.

If I had a copy of Mathematica, and knew how to use it, I think I could solve this with just a few lines of input. But I don't, so I coded up a short solution in Java.

Java has two classes that enable me to solve this problem in relatively easy fashion: java.math.BigInteger and java.math.BigDecimal. BigInteger performs simple integer calculations with arbitrary precision, and BigDecimal does the same for decimal math.

My simple app, contained in Flipper.java, uses BigInteger to calculate fib_k(*n*+2) and 2^*n*, then uses BigDecimal to divide the two numbers and subtract the result from 1. Despite the fact that the two intermediate results are over 300,000 digits each, the program runs in a very reasonable amount of time, less than an hour. (Optimization of this program would be a very interesting exercise.) The output of the program is shown here:

```
fib(20,1000002) = 614579313398524367786474463596 (301000 digits elided)
2^1000000 = 99006562292958982506979236163 (301000 digits elided)
Div = 0.379253961388950068663971868 (999980 digits elided)
```

So at last, we know the right result. If you flip a coin a million times, you have a 38% chance of seeing 20 heads in a row. A long way from the certainty claimed by the New York Times, and a bit off from my initial 60% value.
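For readers who want to experiment, here is a minimal sketch of the final formula, 1 - fib_k(*n*+2)/2^*n*, using the two classes mentioned above. It is not the Flipper.java from this article, which is not reproduced here. The names, the running-sum approach to the recurrence, and the `digits` precision parameter are all my own assumptions.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;
import java.util.Arrays;

// Sketch of 1 - fib_k(n + 2) / 2^n with arbitrary-precision arithmetic.
final class StreakProbability {
    // k-step Fibonacci with fib(1) = 1 and fib(x) = 0 for x < 1, computed
    // with a circular window of the last k values and their running sum.
    static BigInteger fib(int k, int n) {
        if (n < 1) return BigInteger.ZERO;
        BigInteger[] window = new BigInteger[k];
        Arrays.fill(window, BigInteger.ZERO);
        window[0] = BigInteger.ONE;          // fib(j) lives in slot (j - 1) % k
        BigInteger sum = BigInteger.ONE;     // sum of the last k values
        BigInteger current = BigInteger.ONE; // fib(1)
        for (int i = 2; i <= n; i++) {
            current = sum;                   // fib(i) = sum of the previous k values
            int slot = (i - 1) % k;          // slot currently holding fib(i - k)
            sum = sum.add(current).subtract(window[slot]);
            window[slot] = current;
        }
        return current;
    }

    // Probability of at least one run of k heads in n tosses,
    // with the division rounded to the given number of significant digits.
    static BigDecimal probability(int k, int n, int digits) {
        BigDecimal noStreak = new BigDecimal(fib(k, n + 2))
                .divide(new BigDecimal(BigInteger.ONE.shiftLeft(n)), new MathContext(digits));
        return BigDecimal.ONE.subtract(noStreak);
    }

    public static void main(String[] args) {
        System.out.println(probability(2, 4, 20)); // two heads in four tosses: 0.5
        if (args.length > 0) {
            // The million-toss case works on numbers with about 300,000 digits,
            // so it is only run when any command-line argument is supplied.
            System.out.println(probability(20, 1_000_000, 30));
        }
    }
}
```

Keeping a running sum means each step costs one addition and one subtraction regardless of *k*, rather than re-adding all *k* previous values.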

#### Postscript

Working out the details of this problem was a very enjoyable piece of math. When I first started in on the problem, I hoped I would be able to find a reference that just told me how to calculate the number, but I had no luck. As I worked through the problem, I ran into the existence of the n-step Fibonacci numbers, which I had never heard of. Once I found the reference to them on the Wolfram Alpha page (linked above), I saw that the page had a terse note describing this problem, but with no details. With luck the next person trying to understand this problem will be able to make sense of it by reading this page.
