Bayes’ Theorem and the Law of Total Probability

Bayes’ theorem is immensely important, especially for avoiding the classical drug test faux pas that, according to Harvard, six out of eight medical students and 19 out of 25 attending physicians are guilty of.

I will not explain the intuition, as tens, if not hundreds, of articles have already done so quite well. What I find other sources lack, however, is a simple encyclopedic reference that is less introductory and less verbose: a tool for practitioners like myself who need a quick brush-up. Thus, the goal of this post is to tersely spell out Bayes’ theorem so both you and I have a handy reference.

Let’s consider two events, A and B, which are subsets of an outcome space we’ll denote by Ω. Bayes’ theorem simply states that the joint probability is equal to the product of the likelihood and the prior

$$P(A \cap B) = P(B \mid A)\,P(A)$$

where the intersection simply means “and” as far as logic is concerned. Flipping the roles of A and B, we also have that

$$P(B \cap A) = P(A \mid B)\,P(B)$$

which, by symmetry of “and” (i.e. the intersection), means that the product of the prior and likelihood is equal to the product of the posterior and normalizing constant (both products being equal to the joint probability):

$$P(A)\,P(B \mid A) = P(A \mid B)\,P(B) = P(A \cap B)$$

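As a quick sanity check, the product form can be verified by brute-force counting on a small outcome space. This is only a sketch under assumptions of my own choosing: a fair six-sided die, with the events `A` and `B` and the helpers `prob` and `cond_prob` made up for illustration.

```python
from fractions import Fraction

# Hypothetical example: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # "the roll is even"
B = {4, 5, 6}  # "the roll is at least 4"

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(event, given):
    """P(event | given), obtained by counting inside the conditioning event."""
    if not given:
        raise ValueError("cannot condition on an empty (probability-zero) event")
    return Fraction(len(event & given), len(given))

joint = prob(A & B)                # P(A ∩ B)
via_a = cond_prob(B, A) * prob(A)  # P(B | A) P(A)
via_b = cond_prob(A, B) * prob(B)  # P(A | B) P(B)

print(joint, via_a, via_b)  # all three are 1/3
```

Exact fractions are used so the three quantities can be compared with `==` rather than with a floating-point tolerance.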
All of the above are variants of Bayes’ theorem. Typically, the equality is presented in some form involving a quotient, but we avoid this in case the denominator is zero.

The Law of Total Probability simply states that any unconditional probability can be deconstructed into a weighted sum of conditional probabilities

$$P(B) = \sum_{A_i \in \mathcal{A}} P(B \mid A_i)\,P(A_i)$$

where the scripted A, $\mathcal{A}$, is any countable set of events abiding by the following two conditions. The union of its members must encompass the outcome space Ω

$$\bigcup_{A_i \in \mathcal{A}} A_i = \Omega$$

and any two distinct members must be disjoint

$$A_i \cap A_j = \emptyset \quad \text{for all } A_i \neq A_j \text{ in } \mathcal{A}$$

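In other words, scripted A (the set we sum over) is any partition of the outcome space. We can put these together to get the following variant of Bayes’ theorem

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{\sum_{A_i \in \mathcal{A}} P(B \mid A_i)\,P(A_i)}$$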

where the first equality follows by an invocation of Bayes’ theorem and the second follows by re-applying Bayes’ theorem (numerator) and the Law of Total Probability (denominator). In words,

$$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{normalizing constant}}$$

Notice that if the probability of B is zero, then the posterior doesn’t really make sense and neither does the quotient on the right side of the equation (both are ill-defined).
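For a concrete feel, here is a minimal sketch of the quotient form applied to a screening test. Everything in it is an assumption for illustration: the function name `posterior`, the two-event partition (`"condition"` and `"no condition"`), and the numbers (a prevalence of 1 in 1000, a test that always flags true cases, and a 5% false positive rate).

```python
from fractions import Fraction

def posterior(priors, likelihoods, target):
    """Return P(target | B), given priors P(A_i) over a partition
    and likelihoods P(B | A_i) for each member of that partition."""
    # Exact check; assumes Fraction inputs (floats would need a tolerance).
    if sum(priors.values()) != 1:
        raise ValueError("priors must sum to 1 over the partition")
    # Law of Total Probability: P(B) = sum_i P(B | A_i) P(A_i)
    evidence = sum(likelihoods[a] * priors[a] for a in priors)
    if evidence == 0:
        raise ZeroDivisionError("P(B) = 0, so the posterior is undefined")
    return likelihoods[target] * priors[target] / evidence

# Hypothetical numbers, chosen only for illustration.
priors = {
    "condition": Fraction(1, 1000),
    "no condition": Fraction(999, 1000),
}
likelihoods = {  # P(positive test | A_i)
    "condition": Fraction(1),
    "no condition": Fraction(1, 20),
}

result = posterior(priors, likelihoods, "condition")
print(result, float(result))  # 20/1019, roughly 0.0196
```

Under these assumed numbers a positive result leaves the probability of having the condition at just under 2%, since the rare true positives are swamped by false positives from the much larger unaffected group.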

Conclusion. If solving for the probability of A conditional on B isn’t a flick-o-da-wrist, but deriving the probability of B given anything is pretty straightforward, then this equation is the magic bullet. I’ll leave you with some pleasant verbosity on Bayes’ theorem, courtesy of the New York Times: The Mathematics of Changing Your Mind.
