## Thursday, 28 February 2013

### What chance the next roll of the die is a 3?

In response to my posting yesterday a colleague posed the following question:
> The die has rolled 3 3 3 3 3 3 3 in the past. What are the chances of a 1, 2, 4, 5 or 6 being rolled next? The mathematician will say: P(k) = 1/6 for each number; forget that short-term evidence. What will the probability expert say? And the statistician? And the philosopher?
I have provided a detailed solution to this problem here.

In summary, it is based on a Bayesian network in which (except for the 'statistician') it all comes down to what priors they are assuming for each P(k).
• The mathematician's prior is that each P(k) is exactly 1/6.
• One type of probability expert (including certain types of Bayesians) will argue that, in the absence of any prior knowledge of the die, the probability distribution for each P(k) is uniform over the interval 0-1 (meaning any value is just as likely as any other).
• Another probability expert (including most Bayesians) will argue that the prior should be based on dice they have previously seen. They believe most dice are essentially 'fair' but there could be biases due to either imperfections or deliberate tampering. Such an expert might therefore specify the prior distribution for P(k) to be a narrow bell curve centred on 1/6.
• A philosopher might consider any of the above but might also reject the notion that 1, 2, 3, 4, 5, 6 are the only possible outcomes.
Anyway, when we enter the evidence of seven 3's in 7 rolls, the Bayesian calculations (performed using AgenaRisk) result in an updated posterior distribution for each of the P(k)s.

The mathematician's posterior for each P(k) is unchanged: i.e. each P(k) is still 1/6. So there is still just a probability of 1/6 that the next roll will be a 3.

For the probability expert with the uniform priors, the posterior for P(3) is now a distribution with mean 0.618. The other probabilities are all reduced accordingly to distributions with mean about 0.079. So in this case the probability of rolling a 3 next time is about 0.618, whereas each of the other numbers has a probability of about 0.079.

For the probability expert with the bell curve priors, the posterior for P(3) is now a distribution with mean 0.33. The other probabilities are all reduced accordingly to distributions with mean about 0.13. So in this case the probability of rolling a 3 next time is about 0.33, whereas each of the other numbers has a probability of about 0.13.
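The uniform-prior case can be checked in closed form: a uniform prior over the six P(k)s (constrained to sum to 1) is a Dirichlet(1,...,1) distribution, and seven observed 3s update it to Dirichlet(1,1,8,1,1,1). A minimal Python sketch follows; this is not the AgenaRisk model itself, whose numerical discretisation presumably accounts for the small difference between its 0.618 and the exact 8/13:

```python
# Posterior predictive for the next roll under a Dirichlet(1,...,1)
# prior (the uniform-prior case). AgenaRisk discretises the
# distributions, which is presumably why it reports ~0.618 and ~0.079
# rather than the exact 8/13 and 1/13.
alphas = [1] * 6                 # uniform Dirichlet prior over P(1)..P(6)
counts = [0, 0, 7, 0, 0, 0]      # seven 3s observed in 7 rolls
posterior = [a + c for a, c in zip(alphas, counts)]
total = sum(posterior)
pred = [a / total for a in posterior]  # posterior mean of each P(k)
print(pred[2])   # P(next roll is 3) = 8/13 ≈ 0.615
print(pred[0])   # each other number: 1/13 ≈ 0.077
```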

And what about the statistician? Well, a classical statistician cannot give any prior distributions, so the above approach does not work for him. What he might do is propose a 'null' hypothesis that the die is 'fair' and use the observed data to accept or reject this hypothesis at some arbitrary 'p-value' (he would reject the null hypothesis in this case at the standard p=0.01 level). But that does not provide much help in answering the question. He could try a straight frequency approach, in which case the probability of a three is 1 (since we observed 7 threes out of 7 rolls) and the probability of any other number is 0.
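The classical calculation can be sketched in two lines. This is an illustration only: the p-value here is for the event "some one face appears on all 7 rolls" under the fair-die null, one reasonable choice of test statistic for this data.

```python
# Classical statistician's view, sketched: under H0 ("fair die") the
# chance that some single face comes up on all 7 rolls is 6*(1/6)**7.
p_value = 6 * (1 / 6) ** 7   # ≈ 2.14e-5, far below 0.01: reject H0
freq_estimate = 7 / 7        # naive frequency estimate of P(3)
print(p_value, freq_estimate)
```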

Anyway, the detailed solution showing the model and results is here. The model itself - which will run in AgenaRisk - is here.

1. Thomas Roelleke, 1 March 2013 at 03:25

Norman

Great explanation, beautiful illustration. Your essay shows that even seven 3's mean that more than school maths and knowledge is required to differentiate the cases and assumptions. Is it then a surprise that combining more complex evidence in the real world is a job with a high risk of error? Or that by changing the assumptions, we can obtain the result we want?

Mathematicians? It makes me smile that you view their world as simple; keep it at 1/6.

Probability experts? Of course, there are two possible outcomes. At least two! P(3)=0.618 or P(3)=0.33. The next step will be to define the priors of the experts (assumptions), and combine the probabilities, to get to the probability of the conclusion to be true: P(P(3)=0.5 | uniform,bell) = ...?
Btw, is there a Bayesian expert telling P(3 | seven occurrences but no occurrence of any other number) < 1/6?

Philosophers? Yes, after a few minutes, they might start a discussion about other events and laws and possibilities, diverting away from the actual issue. Certainly very interesting, but does it help to say what the value of P(3) is?

Statisticians? You say they go for short-term evidence. Pushing it, we might say they go for the evidence that is useful to show a wanted effect. Tell a statistician the result you want, the effect or manipulation you want to achieve, and then he/she will find the statistics and the presentation of it that conveys the message convincingly.

So where are we given the math of Kolmogorov, Bayes, Laplace, Euler, etc? Is there something between Fuzzy and Bayesian that works? It is interesting that millions of people will be perfectly comfortable computing the probability that a die rolls 3, or of drawing a yellow ball from an urn with 4 yellow, 3 green, 2 red, and 1 blue ball. Adding a hint of complexity - computing the probability after evidence - appears to be an exponentially more complex task. The philosopher asks: How is it that our current math goes from simple, P(yellow)=4/10, everybody agrees, to relatively complex as soon as some simple evidence is in the game?

2. Thomas

>Probability experts? Of course, there are two possible outcomes. At least two!

Of course there are an infinite number, corresponding to all the possible different prior assumptions.

>The next step will be to define the priors of the experts (assumptions), and combine the probabilities, to get to the probability of the conclusion to be true: P(P(3)=0.5 | uniform,bell) = ...?

You can do all of that within the same AgenaRisk model. For example, with the uniform priors and the observation of seven 3s rolled, the posterior distribution for P(3) had a mean of 0.618. The probabilities for the other P(i)s were distributions with mean about 0.079. However, we can simply enter P(3)=0.5 as an observation in the model and run it again. This results in revised posteriors for the other P(i)s - they are heavily left-skewed distributions with mean of about 0.101 and variance 0.0069.
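This conditioning also has a simple analytic form under the uniform (Dirichlet(1,...,1)) prior: given the posterior Dirichlet(1,1,8,1,1,1), fixing P(3)=0.5 leaves the other five P(i)s sharing the remaining 0.5 as a Dirichlet(1,1,1,1,1), so each is 0.5 x Beta(1,4), with mean 0.1 and variance 1/150 ≈ 0.0067 - close to the ~0.101 and ~0.0069 from the AgenaRisk run. A Monte Carlo sketch (illustrative only, not the AgenaRisk model):

```python
import random

# With the uniform prior and seven 3s, the joint posterior is
# Dirichlet(1,1,8,1,1,1). Conditioning on P(3) = 0.5, the other five
# P(i)s share the remaining 0.5 as a Dirichlet(1,1,1,1,1), so each
# P(i) = 0.5 * Beta(1,4): mean 0.1, variance 1/150 ≈ 0.0067.
random.seed(0)
samples = []
for _ in range(100_000):
    # draw Dirichlet(1,1,1,1,1) as normalised unit exponentials
    g = [random.expovariate(1.0) for _ in range(5)]
    samples.append(0.5 * g[0] / sum(g))   # one of the other P(i)s
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(mean, 3), round(var, 4))      # ≈ 0.1 and ≈ 0.0067
```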

>Btw, is there a Bayesian expert telling P(3 | seven occurrences but no occurrence of any other number) < 1/6?

Absolutely. If the Bayesian has a very strong prior belief that P(3) is sufficiently low, then of course even the observation of seven 3s only will still leave P(3) < 1/6.

Regarding the last question about the complexity of all of this: yes, as soon as you attempt to model the real world rather than an idealised mathematical representation it gets extremely tricky. Indeed, without a tool like AgenaRisk the Bayesian calculations would not be feasible. Luckily AgenaRisk hides all of the complexity - you just have to put in your priors.

3. Seems your solution implies the dice is fair.
Why? There is no trace of any outcome (1,2,4,5 or 6) except 3's only. I.e. there is a good chance the dice is biased to 3.

Therefore first we have to evaluate a hypothesis H = "fair dice", i.e. considering event E7 (= 7 threes in a row), first we have to estimate P(H|E7).
We do know P(E7|H) = 3.57E-6. Quite low.

P(not H) and, correspondingly, P(E7|not H) are open to suggestion. Therefore the "best" guess for "three next" is that its probability is likely above 1/6.
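The calculation suggested in this comment can be sketched as below. Since P(not H) and P(E7|not H) are left open, the figures used here are purely illustrative assumptions, not values from the post:

```python
# Hedged sketch of P(H|E7) via Bayes' rule. The prior P(H) and the
# likelihood P(E7|not H) are NOT given in the post - the values below
# are illustrative assumptions only.
p_h = 0.9                      # hypothetical prior belief the die is fair
p_e7_given_h = (1 / 6) ** 7    # ≈ 3.57e-6, as stated in the comment
p_e7_given_not_h = 0.5 ** 7    # hypothetical: a die heavily biased to 3
p_h_given_e7 = (p_e7_given_h * p_h) / (
    p_e7_given_h * p_h + p_e7_given_not_h * (1 - p_h))
print(p_h_given_e7)            # tiny: the "fair dice" hypothesis collapses
```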

4. Vladimir you said

> Seems your solution implies the dice is fair.

No. It depends on your prior assumption about the die. The solution gave three results for three very different priors. In the uniform prior case we conclude the die is very likely biased - with about a 62% chance the next roll is a 3.

5. Thanks, Norman, your reply has been noticed.

You asked: "And what about the statistician?"

First, as you said, the "null hypothesis" H0 (the dice is fair) is rejected at the significance level of 0.01.

Then "the statistician" would set a confidence level. Making a slightly dangerous move we can set it at 99% (just to follow your example). And the probability of "not three" shall be in the interval between 0 and 1-(0.01)^(1/7) = 48.2%, i.e. the event "three next" has probability between 51.8% and 100% with 99% confidence.

If we reduce the confidence down to 95%, then probability "three next" would be between 65 and 100%.

Thus "the statistician" would give a more precise answer.

This is not to undermine Bayesian inference - it seems it can do better in this example.
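Vladimir's interval above can be reproduced directly: the largest p3 still rejected at significance alpha satisfies p3^7 = alpha, so with confidence 1 - alpha the probability of "three next" is at least alpha^(1/7). A quick check (illustrative only):

```python
# One-sided lower bound for P(3): values of p3 with p3**7 < alpha are
# rejected, so with confidence 1 - alpha, P(3) >= alpha**(1/7).
for alpha in (0.01, 0.05):
    lower = alpha ** (1 / 7)
    print(f"{1 - alpha:.0%} confidence: P(3) in [{lower:.3f}, 1]")
    # alpha = 0.01 gives lower ≈ 0.518; alpha = 0.05 gives ≈ 0.652
```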

The real question comes with the AgenaRisk implementation, as you said:

"The solution gave three results for three very different priors".

What I could see is that the software didn't change the "mathematician's prior" (P(k)=1/6), producing the same posterior (1/6). Generally speaking, the Bayesian approach should converge to the same posteriors for any suggested priors. I understand that 7 rolls is too few to show this convergence. But the "mathematician's prior" not moving in any direction is sort of scary.

cheers

6. You said (about the "mathematician's prior"):

>I understand that 7 rolls is too small to show this convergence. But not moving in any direction "mathematician's prior" is sort of scaring off.

The whole point of the example was to demonstrate that the mathematician's prior was not a sensible option because NO MATTER WHAT DATA is observed (even a trillion 3's in sequence) nothing can shift the fixed 1/6 probability of a 3.
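One way to see this (an illustration, not the AgenaRisk model): approximate the competing priors by a symmetric Dirichlet(a,...,a) over the six faces. Any finite a is eventually overwhelmed by data, but the mathematician's fixed 1/6 corresponds to the limit of infinite a, which no amount of evidence can shift:

```python
# Posterior mean of P(3) after n threes in n rolls, under a symmetric
# Dirichlet(a,...,a) prior over the six faces.
def p3_posterior_mean(a, n):
    return (a + n) / (6 * a + n)

for a in (1, 5, 1000):
    # even a very confident prior (a = 1000) eventually yields to data,
    # but a point mass at 1/6 (a -> infinity) never does
    print(a, p3_posterior_mean(a, 7), p3_posterior_mean(a, 10**12))
```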

7. Norman,

you said

>the mathematician's prior was not a sensible option because NO MATTER WHAT DATA is observed

I understand "the mathematician's" belief (hypothesis), based on the prior INFORMATION available to him (denote this as H0|I), is "I've checked the dice - it's ABSOLUTELY fair". Then he is right (and so is AgenaRisk).

I.e. now I understand your example better - you simply named persons depending on their prior information, with the extremes:
- the mathematician knows for sure it's a fair dice. And he is right.
- the "probability expert" has no idea about the dice and blindly puts the priors as uniform. And he would be right.

Now let's talk about the probability expert's result:

You said:
>probability of rolling a 3 next time is about 0.618

I did the following:
Hypothesis H: "probability of 3 is p3"
1) put a uniform continuous prior P(H), i.e. P(p3) = 1
2) E7 = the event of getting 7 threes in a row
3) P(E7|H) = P(E7|p3) = p3^7, so the posterior PDF for p3 is 8*p3^7 (the factor 8 is required to normalize the PDF)
4) I arrive at the following results:
- with 99% confidence the "next 3" probability is in the interval 56.2% to 100%
- with 95% confidence the "next 3" probability is in the interval 68.8% to 100%
- the mean probability of "next 3" is 88.889%
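These figures follow from the Beta(8,1) posterior implied by steps 1-4: the density 8*p3^7 has CDF p3^8, so the one-sided lower bound at confidence 1 - alpha is alpha^(1/8), and the mean is 8/9. A quick check (illustrative only):

```python
# Checking the Beta(8,1) posterior: density 8*p3**7, CDF p3**8.
mean = 8 / 9                   # posterior mean ≈ 0.8889 (88.889%)
lower_99 = 0.01 ** (1 / 8)     # ≈ 0.562: 99% one-sided lower bound
lower_95 = 0.05 ** (1 / 8)     # ≈ 0.688: 95% one-sided lower bound
print(mean, lower_99, lower_95)
```

The difference from 0.618 is not an inconsistency: this two-outcome (3 versus not-3) model puts a uniform prior on p3 itself, whereas the six-category Dirichlet model's marginal prior on P(3) is Beta(1,5), which favours small values; hence the posterior means of 8/9 versus 8/13.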

The estimations above are, as expected, tighter compared to the orthodox statistics.

But what is worrying me is the difference from the earlier value (mean probability of "next 3" = 0.618).