Bayes’ Theorem and the Law of Total Probability

Bayes’ theorem is immensely important, especially for avoiding the classic drug-test faux pas that, according to Harvard, six out of eight medical students and 19 out of 25 attending physicians are guilty of.

I will not explain the intuition, as tens, if not hundreds, of articles have already done so quite well. What I find other sources lack, however, is a simple encyclopedic reference that is less introductory and less verbose: a tool for practitioners like myself who need a quick brush-up. Thus, the goal of this post is to tersely spell out Bayes’ theorem so that both you and I have a handy reference.


Let’s consider two events, A and B, which are subsets of an outcome space we’ll denote by Ω. Bayes’ theorem simply states that the joint probability is equal to the product of the likelihood and the prior

$$P(A \cap B) = P(B \mid A)\,P(A)$$

where the intersection simply means “and” as far as logic is concerned. Flipping the roles of A and B, we also have that

$$P(B \cap A) = P(A \mid B)\,P(B)$$

which, by the symmetry of “and” (i.e., A ∩ B = B ∩ A), means that the product of the likelihood and the prior is equal to the product of the posterior and the normalizing constant (both products equal the joint probability):

$$P(B \mid A)\,P(A) = P(A \mid B)\,P(B) = P(A \cap B)$$
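A quick numerical sanity check of this identity, as a minimal sketch: it assumes a fair six-sided die as the outcome space and computes conditional probabilities from the definition P(X | Y) = P(X ∩ Y) / P(Y), which requires P(Y) > 0. The events A and B and the helpers `prob` and `cond` are illustrative choices, not part of the original post.

```python
from fractions import Fraction

# Outcome space: a fair six-sided die (an illustrative assumption).
OMEGA = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def cond(x, given):
    """P(x | given) = P(x ∩ given) / P(given); requires P(given) > 0."""
    return prob(x & given) / prob(given)


A = frozenset({2, 4, 6})  # "the roll is even"
B = frozenset({4, 5, 6})  # "the roll is greater than 3"

joint = prob(A & B)                            # P(A ∩ B)
likelihood_times_prior = cond(B, A) * prob(A)  # P(B | A) P(A)
posterior_times_norm = cond(A, B) * prob(B)    # P(A | B) P(B)

print(joint, likelihood_times_prior, posterior_times_norm)  # 1/3 1/3 1/3
```

Exact `Fraction` arithmetic is used so the three quantities compare equal without floating-point rounding.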

All of the above are variants of Bayes’ theorem. Typically, the equality is presented as a quotient, but we avoid that form in case the denominator is zero. The Law of Total Probability states that any unconditional probability can be decomposed into a weighted sum of conditional probabilities

$$P(B) = \sum_{A \in \mathcal{A}} P(B \mid A)\,P(A)$$

where 𝒜 (script A) is any collection of events satisfying the following two conditions. Their union must cover the outcome space Ω

$$\bigcup_{A \in \mathcal{A}} A = \Omega$$

and any two distinct members must be disjoint

$$A \cap A' = \emptyset \quad \text{for all } A, A' \in \mathcal{A} \text{ with } A \neq A'$$
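Both conditions, and the Law of Total Probability itself, can be checked mechanically on a finite outcome space. This is a minimal sketch assuming a fair six-sided die with exact `Fraction` arithmetic; `is_partition` and `total_probability` are hypothetical helper names, not from any library.

```python
from fractions import Fraction
from itertools import combinations

# Outcome space: a fair six-sided die (an illustrative assumption).
OMEGA = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a subset of OMEGA) under the uniform measure."""
    return Fraction(len(event & OMEGA), len(OMEGA))


def is_partition(blocks, space):
    """True if the blocks cover `space` exactly and are pairwise disjoint."""
    covers = frozenset().union(*blocks) == space
    disjoint = all(not (a & b) for a, b in combinations(blocks, 2))
    return covers and disjoint


def total_probability(b, partition):
    """P(B) as the weighted sum of P(B | A) P(A) over the blocks A of a partition."""
    total = Fraction(0)
    for a in partition:
        p_a = prob(a)
        if p_a == 0:
            continue  # P(B ∩ A) <= P(A) = 0, so this term contributes nothing
        p_b_given_a = prob(b & a) / p_a  # conditional probability P(B | A)
        total += p_b_given_a * p_a       # weighted by P(A)
    return total


partition = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6})]
B = frozenset({4, 5, 6})

print(is_partition(partition, OMEGA))  # True
print(is_partition([frozenset({1, 2, 3}), frozenset({3, 4, 5, 6})], OMEGA))  # False: 3 is shared
print(total_probability(B, partition), prob(B))  # 1/2 1/2
```

Skipping blocks with P(A) = 0 sidesteps the undefined P(B | A) without changing the sum.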

In other words, 𝒜 (the collection we sum over) is any countable partition of the outcome space. Putting these together yields the following variant of Bayes’ theorem

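$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{\sum_{A' \in \mathcal{A}} P(B \mid A')\,P(A')}$$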

where the first equality follows from Bayes’ theorem (dividing P(A ∩ B) = P(A | B) P(B) by P(B)) and the second from re-applying Bayes’ theorem to the numerator and the Law of Total Probability to the denominator. In words,

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{normalizing constant}}$$

Notice that if P(B) = 0, the posterior is ill-defined, and so is the quotient on the right-hand side, whose denominator equals P(B).
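The partition form translates directly into a few lines of code. This is a minimal sketch assuming a finite partition given as two dictionaries keyed by block; `posterior`, `priors` and `likelihoods` are hypothetical names, not a library API. The drug-test numbers (0.5% prevalence, 99% sensitivity, 1% false-positive rate) are illustrative assumptions, not figures from the Harvard study.

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Posterior P(A | B) for every block A of a finite partition.

    priors:      maps each block A to its prior P(A) (values should sum to 1)
    likelihoods: maps each block A to the likelihood P(B | A)
    Raises ValueError if P(B) = 0, where the posterior is ill-defined.
    """
    # Law of Total Probability: P(B) = sum over A of P(B | A) P(A).
    norm = sum(likelihoods[a] * priors[a] for a in priors)
    if norm == 0:
        raise ValueError("P(B) = 0, so P(A | B) is undefined")
    # Bayes' theorem: P(A | B) = P(B | A) P(A) / P(B).
    return {a: likelihoods[a] * priors[a] / norm for a in priors}


# Hypothetical drug test: 0.5% prevalence, 99% sensitivity, 1% false positives.
priors = {"user": Fraction(1, 200), "non-user": Fraction(199, 200)}
likelihoods = {"user": Fraction(99, 100), "non-user": Fraction(1, 100)}  # P(positive | block)

post = posterior(priors, likelihoods)
print(post["user"])  # 99/298, roughly 0.33
```

Under these assumed rates, a positive result implies only about a one-in-three chance of drug use, even though the test is right 99% of the time for both groups. That gap is the kind of drug-test mistake mentioned at the top.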

Conclusion. If solving for the probability of A conditional on B directly isn’t a flick-o-da-wrist, but deriving the probability of B given each member of the partition is straightforward, then this equation is the magic bullet. I’ll leave you with some pleasant verbosity on Bayes’ theorem, courtesy of the New York Times: “The Mathematics of Changing Your Mind.”
