How And Why To Learn Bayesian Inference

This article was originally published on Towards Data Science.

For many data scientists, the topic of Bayesian Inference is as intimidating as it is intriguing. Some may be familiar with Thomas Bayes’ famous theorem or may even have implemented a Naive Bayes classifier. Even so, the prevailing attitude I have observed is that statisticians find Bayesian techniques too complex to code up, while engineers find them a little too “statsy”. The truth is that recent advances in probabilistic programming have made these methods more accessible than ever before. In this article, I will explain why I decided to learn Bayesian Inference and outline some of the best resources I have come across in this endeavour.

# Why You Should “Go Bayesian”

While the use cases are often very different, Bayesian methods share some of the perceived drawbacks of Deep Learning. They are highly technical (often requiring specialised programming languages), can take years to master and are computationally expensive. To the outside observer, they appear to work by dark magic. Why, then, would anybody bother learning these techniques when we can already train state-of-the-art machine learning models with simple code like the following?

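```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier  # any off-the-shelf model would do

X_train, y_train = make_classification(random_state=0)  # toy data for illustration
model = GradientBoostingClassifier()
model.fit(X_train, y_train)
```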

The primary reason is simply that Bayesian Inference is another tool in your toolbelt. It can be very powerful in situations where traditional Machine Learning models are suboptimal, such as:

- When you only have a small amount of data
- When your data is very noisy
- When you need to quantify confidence
- When you want to incorporate prior beliefs into your model
- When model explainability is important
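To make the first four points concrete, here is a minimal sketch of the simplest Bayesian model. It is my own illustration, not code from any of the resources below. A Beta prior on an unknown rate is updated with Binomial data. The Beta and Binomial are a conjugate pair, so the posterior has a closed form and no sampler is needed. The conversion-rate scenario, the Beta(2, 8) prior and the data are all hypothetical.

```python
import math

def update_beta(alpha, beta, successes, trials):
    """Conjugate update: a Beta(alpha, beta) prior plus Binomial data gives a Beta posterior."""
    return alpha + successes, beta + (trials - successes)

def beta_mean_sd(alpha, beta):
    """Mean and standard deviation of a Beta(alpha, beta) distribution."""
    total = alpha + beta
    mean = alpha / total
    var = alpha * beta / (total ** 2 * (total + 1))
    return mean, math.sqrt(var)

def beta_prob_above(alpha, beta, threshold, steps=100_000):
    """P(rate > threshold) under Beta(alpha, beta), via midpoint-rule integration of the density."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    width = (1.0 - threshold) / steps
    total = 0.0
    for i in range(steps):
        x = threshold + (i + 0.5) * width  # midpoints never touch 0 or 1
        total += math.exp(log_norm + (alpha - 1) * math.log(x) + (beta - 1) * math.log(1 - x))
    return total * width

# Hypothetical: a prior belief of roughly a 20% conversion rate, encoded as Beta(2, 8),
# followed by a tiny experiment with 3 conversions out of 8 visitors.
posterior = update_beta(2, 8, successes=3, trials=8)
mean, sd = beta_mean_sd(*posterior)
print("posterior:", posterior)
print(f"mean {mean:.3f}, sd {sd:.3f}")
print(f"P(rate > 0.5) = {beta_prob_above(*posterior, 0.5):.3f}")
```

With only 8 observations, the raw frequency of 3/8 = 0.375 is a single number with no indication of how far to trust it. The posterior, Beta(5, 13), is pulled towards the prior (mean about 0.278) and has a standard deviation of about 0.10. It assigns only about a 2.5% probability to the rate exceeding 0.5.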

In my experience, these are the exact kinds of practical concerns data scientists face in their everyday work. Bayesian models may not win you Kaggle competitions (although there are many examples where they have) but they are fantastic for solving the real problems of real stakeholders in real businesses. As AutoML tools and off-the-shelf solutions are adopted more widely across the industry, I believe data scientists will increasingly find themselves focused on the kinds of messy and bespoke problems that Bayesian methods are well suited to.

Another personal motivation was that I wanted to be more deliberate with my modelling choices. Off-the-shelf machine learning packages now make it easy to mindlessly train models like XGBoost without ever considering the shape of the underlying data or the process that generated it. In Bayesian Inference, by contrast, defining this generative process is at the heart of modelling.
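As a rough illustration of what “defining the generative process” means, the sketch below writes a data-generating story as plain Python and runs it forwards, from an unknown parameter to observations. The store-traffic scenario and the Gamma(20, 1) prior are hypothetical, chosen only for illustration. Poisson draws use Knuth's multiplication method because the standard library has no Poisson sampler.

```python
import math
import random

def sample_poisson(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's multiplication method (fine for modest lam)."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1

def simulate_store(rng, n_days):
    """Hypothetical generative story for one store's daily customer counts."""
    # 1. An unknown daily rate drawn from a Gamma prior (shape 20, scale 1, so prior mean 20).
    rate = rng.gammavariate(20.0, 1.0)
    # 2. Given that rate, each day's count is Poisson distributed.
    counts = [sample_poisson(rng, rate) for _ in range(n_days)]
    return rate, counts

rng = random.Random(42)
rate, week = simulate_store(rng, n_days=7)
print(f"latent rate {rate:.1f}, simulated week {week}")
```

Inference runs this story in reverse: given observed counts, it asks which rates are plausible. Simulating forwards first (a prior predictive check) is a cheap way to test whether your assumptions produce data that looks sensible.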

# Prerequisites

Pretty much all of the resources below were compiled assuming the reader has a good understanding of probability theory and basic statistics. You should be comfortable with concepts like random variables, probability density functions and conditional probability as well as some key distributions such as the Normal, Binomial and Poisson distributions. While possibly not essential, I would also strongly recommend reading up on the Law of Large Numbers, the Central Limit Theorem and Maximum Likelihood Estimation. If any of the terms I’ve used in this paragraph are unfamiliar, it is probably a good idea to start with basic probability and statistics courses before proceeding with my suggested Bayesian resources.

There are many options for probabilistic programming packages in both Python and R (such as PyMC, Stan, Edward, TensorFlow Probability etc.). Ultimately you need to choose the package and language that works best for you, but to get the most out of the resources below it will help to have some experience with Python. In particular, Probabilistic Programming and Bayesian Methods for Hackers is written in Python with the option of using PyMC or TensorFlow Probability, so for maximum compatibility I recommend using one of these probabilistic programming packages.

# Key Resources

Below I list the three resources I have found most useful for learning Bayesian inference. They all start from the basics and cover similar concepts, but each approaches the topic from a slightly different angle. At first glance, recommending three separate introductory texts might seem like overkill, but the more examples you are exposed to, the better. For me personally, understanding the theory behind Bayesian inference was only half the battle. I had to see many modelling examples before I was comfortable applying these techniques to my own data science projects.

My suggestion is to take it slowly and work through all three resources simultaneously. If this is your first time “going Bayesian” some of the concepts may be confusing, but seeing them explained in different ways should help you familiarise yourself with the key ideas.

# Bayesian Data Analysis — Andrew Gelman, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari & Donald B. Rubin

This book is pretty much the bible for Bayesian analysis. Its 670-page length and heavy focus on theory (think formulas and proofs) may put off less intrepid learners, but it is the most comprehensive guide to the topic that I have found. If you have infinite time and desire, feel free to read the whole thing. For everyone else, I would focus on Chapters 1–5, 10 and 11.

# Bayesian Inference — Ville Hyvönen & Topias Tolonen

I don’t know anything about the authors of this guide and can’t even remember how I came across it. Even so, it is a fantastic resource and one that I have come back to time and time again. It covers many key topics, including Bayes’ Theorem, Conjugate Priors, Markov Chain Monte Carlo and Hierarchical Models, and explains each in simple terms using great examples. If you only read one introductory guide to Bayesian inference this year, make it this one.
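Markov Chain Monte Carlo is usually the topic that feels most like dark magic, so it helps to see how little code the core idea needs. Below is a minimal random-walk Metropolis sampler in plain Python. It is my own sketch, not code from the guide. The target is a hypothetical example: a flat Beta(1, 1) prior on a rate with 7 successes in 10 trials. The exact posterior is Beta(8, 4) with mean 2/3, so the sampler's answer can be checked.

```python
import math
import random

def log_posterior(p, successes=7, trials=10):
    """Unnormalised log posterior of a Binomial rate under a flat Beta(1, 1) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero prior density outside (0, 1)
    return successes * math.log(p) + (trials - successes) * math.log(1.0 - p)

def metropolis(log_target, start, n_samples, step=0.1, seed=0):
    """Random-walk Metropolis: propose a nearby point, accept with probability min(1, ratio)."""
    rng = random.Random(seed)
    current, current_lp = start, log_target(start)
    samples = []
    for _ in range(n_samples):
        proposal = current + rng.gauss(0.0, step)  # symmetric proposal, so no Hastings correction
        proposal_lp = log_target(proposal)
        # Compare in log space; 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)  # a rejected proposal repeats the current value
    return samples

draws = metropolis(log_posterior, start=0.5, n_samples=20_000)[2_000:]  # drop burn-in
print(f"MCMC mean {sum(draws) / len(draws):.3f} vs exact Beta(8, 4) mean {8 / 12:.3f}")
```

The sampler only ever evaluates the posterior up to a constant. That is why MCMC still works when the normalising integral in Bayes’ Theorem is intractable.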

# Probabilistic Programming and Bayesian Methods for Hackers — Cameron Davidson-Pilon

This resource is invaluable for bridging the gap between theory and practical application. It consists of a series of Jupyter Notebooks full of Bayesian modelling examples and will allow you to get your hands dirty with probabilistic programming. As mentioned previously, this resource is intended specifically for Python programmers, however non-Python users will still get a lot out of reading through the examples (even if they don’t execute any code).

# Bonus Resources

Here I have listed a few additional resources that I have come across in my Bayesian journey. In my opinion they aren’t as universally essential as the three texts listed above but they are all of a high quality and provide even more examples to help you get into the Bayesian way of thinking.

# TensorFlow Probability Tutorials

This is a series of TensorFlow Probability tutorials provided on the official website. If you have chosen TFP as your probabilistic programming package, these will be particularly useful. There are some great examples of specific models, including Probabilistic Regression, Structural Time Series, Probabilistic PCA and many more.

# Bayesian Methods for Machine Learning — Coursera

This is a fantastic course from Coursera that will probably appeal most to those with a maths/stats background. It covers some advanced topics such as Latent Dirichlet Allocation, Variational Autoencoders and Gaussian Processes. This course isn’t technically free but you should be able to access all of the lectures by auditing the course.

# Think Bayes — Allen B. Downey

I would put this book in the same category as Probabilistic Programming and Bayesian Methods for Hackers in that it takes a computational approach to Bayesian statistics. If you come from an engineering background this book may appeal to you more than some of the theory-focused resources I have listed above.
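A computational approach largely means replacing algebra with arithmetic. A grid approximation is a simple illustration. It is my own sketch, not code taken from the book. It evaluates prior × likelihood at many candidate values and normalises the result. The 7-out-of-10 data and the uniform prior are hypothetical, chosen so the answer can be checked against the exact Beta(8, 4) posterior, whose mean is 2/3.

```python
def grid_posterior(successes, trials, n_points=101):
    """Approximate a Binomial-rate posterior by evaluating prior x likelihood on a grid."""
    grid = [i / (n_points - 1) for i in range(n_points)]
    prior = [1.0] * n_points  # uniform prior over the grid points
    likelihood = [p ** successes * (1 - p) ** (trials - successes) for p in grid]
    unnormalised = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]  # normalise so probabilities sum to 1

grid, probs = grid_posterior(successes=7, trials=10)
mean = sum(p * w for p, w in zip(grid, probs))
most_likely = grid[probs.index(max(probs))]
print(f"posterior mean {mean:.3f}, most probable grid value {most_likely}")
```

The approach needs no calculus and no sampler, which is why it suits engineers. It becomes impractical once a model has more than a handful of parameters, because the grid grows exponentially.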

# Conclusion

Hopefully, the resources I have outlined in this article will be as useful for you as they have been for me. While Bayesian Inference may take some time to get your head around, my advice is to just try to absorb as many examples as you can and eventually it will click. Good luck and remember, if it was easy everyone would do it!