Saturday, September 8, 2012

Bayes’ Theorem, A Quick Introduction

We all know that the probability of a hypothesis being true often changes in light of the evidence. Wouldn’t it be cool if math could help us show how it works? Fortunately, math is cool enough to help out here thanks to something called Bayes’ theorem. In this article I’ll introduce Bayes’ theorem and the insights it gives about how evidence works. In my next blog entry I’ll show how Bayes’ theorem can be applied in the service of theism.

One Form of Bayes’ Theorem

Bayes’ theorem is often used to show mathematically how the probability of some hypothesis changes in light of new evidence. It is named after Reverend Thomas Bayes, an ordained Christian minister and mathematician, whose theorem was presented in 1764 in his Essay towards solving a problem in the doctrine of chances, published after his death. Before showing what the theorem is, I’ll recap some basic probability symbolism.

Pr(A) = The probability of A being true; e.g. Pr(A) = 0.5 means “The probability of A being true is 50%.”
Pr(A|B) = The probability of A being true given that B is true. For example:
Pr(I am wet|It is raining) = 0.8
This means “The probability that I am wet given that it is raining is 80%.”
Pr(¬A) = The probability of A being false (¬A is read as “not-A”); e.g. Pr(¬A) = 0.5 means “The probability of A being false is 50%.”
Pr(B ∪ C) = The probability that B or C (or both) are true.
Pr(B ∩ C) = The probability that B and C are both true.
Pr(A|B ∩ C) = The probability of A given that both B and C are true.
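
To make the symbols concrete, here is a small Python sketch. The fair six-sided die and the events A and B are my own illustrative assumptions, not part of the examples above, and the helper names pr and pr_given are made up for this sketch. Events are modeled as sets of outcomes, so ∪, ∩ and ¬ become set union, intersection and complement.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
OUTCOMES = frozenset(range(1, 7))

def pr(event):
    """Probability of an event (a set of outcomes), all outcomes equally likely."""
    return Fraction(len(event & OUTCOMES), len(OUTCOMES))

def pr_given(a, b):
    """Pr(A|B) = Pr(A ∩ B) / Pr(B); only defined when Pr(B) > 0."""
    if pr(b) == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return pr(a & b) / pr(b)

A = frozenset({2, 4, 6})  # illustrative event: "the roll is even"
B = frozenset({4, 5, 6})  # illustrative event: "the roll is greater than 3"

print("Pr(A)     =", pr(A))
print("Pr(¬A)    =", pr(OUTCOMES - A))
print("Pr(A ∪ B) =", pr(A | B))
print("Pr(A ∩ B) =", pr(A & B))
print("Pr(A|B)   =", pr_given(A, B))
```

Using Fraction instead of floating-point numbers keeps every probability exact, which makes it easy to see, for instance, that Pr(A) and Pr(¬A) add up to 1.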


Some alternate forms:

One Version    Alternate Forms
Pr(A)          P(A)
Pr(¬A)         Pr(~A), Pr(−A), Pr(A^C)
Pr(B ∪ C)      Pr(B ∨ C)
Pr(B ∩ C)      Pr(B ∧ C), Pr(B&C)
Pr(A|B)        Pr(A/B)


The alternate forms can be combined, e.g. an alternate form of Pr(H|E) is P(H/E).

Bayes’ theorem comes in a number of varieties, but here’s one of the simpler ones where H is the hypothesis and E is the evidence:

$$\Pr(H \mid E) = \frac{\Pr(H) \times \Pr(E \mid H)}{\Pr(E)}$$


In the situation where hypothesis H explains evidence E, Pr(E|H) basically becomes a measure of the hypothesis’s explanatory power. Pr(H|E) is called the posterior probability of H. Pr(H) is the prior probability of H, and Pr(E) is the prior probability of the evidence (very roughly, the lower Pr(E) is, the more surprising it is that we’d find the evidence). Prior probabilities are probabilities relative to background knowledge, e.g. Pr(E) is the probability that we’d find evidence E relative to our background knowledge. Background knowledge is actually used throughout Bayes’ theorem, however, so we could write the theorem this way, where B is our background knowledge:

$$\Pr(H \mid E \,\&\, B) = \frac{\Pr(H \mid B) \times \Pr(E \mid H \,\&\, B)}{\Pr(E \mid B)}$$


To keep things simple, though, I’ll leave the background knowledge in Bayes’ theorem implicit.
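
For readers who like to see a formula as code, the simple form of the theorem (with background knowledge left implicit) can be sketched in Python roughly like this. The function name bayes_posterior, its parameter names, and the numbers passed to it are all made up for illustration.

```python
def bayes_posterior(prior_h, pr_e_given_h, pr_e):
    """Return Pr(H|E) = Pr(H) * Pr(E|H) / Pr(E).

    prior_h      -- Pr(H), the prior probability of the hypothesis
    pr_e_given_h -- Pr(E|H), the probability of the evidence if H is true
    pr_e         -- Pr(E), the prior probability of the evidence
    """
    for name, p in (("Pr(H)", prior_h), ("Pr(E|H)", pr_e_given_h), ("Pr(E)", pr_e)):
        if not 0 <= p <= 1:
            raise ValueError(f"{name} must be between 0 and 1, got {p}")
    if pr_e == 0:
        raise ValueError("Pr(E) must be greater than 0 to condition on E")
    return prior_h * pr_e_given_h / pr_e

# Made-up numbers, purely to show the call.
posterior = bayes_posterior(prior_h=0.3, pr_e_given_h=0.8, pr_e=0.4)
print(posterior)
```

With these made-up inputs the evidence raises the probability of H from 0.3 to roughly 0.6, because E is twice as likely under H (0.8) as it is overall (0.4).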

An Example

Here’s an example of Bayes’ theorem in action. Suppose we have a lottery and the odds are 1 in 5,461,512 that the following lottery numbers are chosen:
(4) (19) (26) (42) (51)
Let H be the hypothesis that the above lottery numbers were chosen. Let E be a newspaper called The Likely Truth reporting those numbers. The Likely Truth reports the lottery numbers with 99% accuracy (though it never fails to report some series of five lottery numbers of the sort that the lottery can result in, accurate or not), thereby making Pr(E|H) = 0.99. The odds that any particular series of five lottery numbers will be reported are likewise 1 in 5,461,512 (assuming the paper is no more likely to print one mistaken series than another), making Pr(E) = 1 in 5,461,512. With that, we have the following probabilities:

$$\Pr(H) = \frac{1}{5{,}461{,}512} \approx 0.0000002$$

$$\Pr(E) = \frac{1}{5{,}461{,}512} \approx 0.0000002$$

$$\Pr(E \mid H) = 0.99$$
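
The value of Pr(E) can be double-checked with the law of total probability, Pr(E) = Pr(H) × Pr(E|H) + Pr(¬H) × Pr(E|¬H). The sketch below rests on one assumption of mine: when the newspaper gets it wrong, it is equally likely to print any of the other 5,461,511 possible series. The variable names are likewise just for illustration.

```python
from fractions import Fraction

N = 5_461_512                 # number of equally likely lottery results
accuracy = Fraction(99, 100)  # Pr(E|H): the paper reports the true numbers

pr_h = Fraction(1, N)         # Pr(H): this particular series was drawn
pr_not_h = 1 - pr_h           # Pr(¬H): some other series was drawn

# ASSUMPTION: a mistaken report is spread evenly over the other N - 1 series,
# so the chance of this specific series being misreported is 0.01 / (N - 1).
pr_e_given_not_h = (1 - accuracy) / (N - 1)

# Law of total probability.
pr_e = pr_h * accuracy + pr_not_h * pr_e_given_not_h

print(pr_e == Fraction(1, N))  # True
```

Under that assumption the two terms are 0.99/N and 0.01/N, which sum to exactly 1/N.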

Plugging them into this version of Bayes’ theorem:

$$\Pr(H \mid E) = \frac{\Pr(H) \times \Pr(E \mid H)}{\Pr(E)}$$

Gives us this:

$$\Pr(H \mid E) = \frac{\frac{1}{5{,}461{,}512} \times 0.99}{\frac{1}{5{,}461{,}512}} = 0.99$$

What’s interesting about this is that even though the prior probability of the hypothesis (≈0.0000002) is much lower than the probability that the newspaper made a mistake (0.01), the newspaper’s report still makes it rational to believe that the reported lottery numbers are probably accurate.
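
The lottery calculation can be reproduced in exact arithmetic with a few lines of Python. This is only a sketch using the three probabilities given above; the variable names are my own.

```python
from fractions import Fraction

N = 5_461_512
pr_h = Fraction(1, N)             # Pr(H), prior probability of the hypothesis
pr_e = Fraction(1, N)             # Pr(E), prior probability of the report
pr_e_given_h = Fraction(99, 100)  # Pr(E|H), accuracy of the newspaper

# Bayes' theorem: the two 1/N factors cancel, leaving Pr(E|H).
posterior = pr_h * pr_e_given_h / pr_e
print(posterior)  # 99/100

# How many times larger the newspaper's error rate is than the prior:
# roughly 54,615 times.
print(float(Fraction(1, 100) / pr_h))
```

Because Pr(H) and Pr(E) are equal here, the posterior ends up equal to Pr(E|H) no matter how tiny the prior is.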

Another Form of Bayes’ Theorem

Keeping in mind that I’ll leave the background knowledge in Bayes’ theorem implicit, another form of Bayes’ theorem is this:

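$$\Pr(H \mid E) = \frac{\Pr(H) \times \Pr(E \mid H)}{\Pr(H) \times \Pr(E \mid {\sim}H) \cdot 0 + \Pr(H) \times \Pr(E \mid H) + \Pr({\sim}H) \times \Pr(E \mid {\sim}H)}$$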

One insight the above formula gives us is that, ceteris paribus, the more unlikely it is that we’d find the evidence if the hypothesis were false (i.e. a lower Pr(E|~H)), the stronger the evidence becomes for the hypothesis. Another insight is that, ceteris paribus, the more likely it is that we’d find the evidence if the hypothesis were true (i.e. a higher Pr(E|H)), the stronger the evidence is for the hypothesis.
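
Both insights can be seen numerically. The Python sketch below implements this second form of the theorem, whose denominator is just Pr(E) expanded by the law of total probability. The function name, the prior of 0.5, and the other values tried are made-up numbers for illustration only.

```python
def posterior_from_likelihoods(prior_h, pr_e_given_h, pr_e_given_not_h):
    """Pr(H|E) = Pr(H)Pr(E|H) / (Pr(H)Pr(E|H) + Pr(~H)Pr(E|~H))."""
    numerator = prior_h * pr_e_given_h
    denominator = numerator + (1 - prior_h) * pr_e_given_not_h
    if denominator == 0:
        raise ValueError("Pr(E) is 0, so Pr(H|E) is undefined")
    return numerator / denominator

prior = 0.5  # made-up prior, held fixed (the "ceteris paribus" part)

# Insight 1: lowering Pr(E|~H) raises the posterior.
for pr_e_given_not_h in (0.8, 0.4, 0.1, 0.01):
    p = posterior_from_likelihoods(prior, 0.8, pr_e_given_not_h)
    print(f"Pr(E|~H) = {pr_e_given_not_h:<5} ->  Pr(H|E) = {p:.3f}")

# Insight 2: raising Pr(E|H) raises the posterior.
for pr_e_given_h in (0.1, 0.4, 0.8, 0.99):
    p = posterior_from_likelihoods(prior, pr_e_given_h, 0.1)
    print(f"Pr(E|H)  = {pr_e_given_h:<5} ->  Pr(H|E) = {p:.3f}")
```

In the first loop only Pr(E|~H) changes, and in the second only Pr(E|H) changes, so each printed column isolates one of the two effects described above.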

In my next blog entry I’ll show how Bayes’ theorem can be used for theism.