Virtual environments
Or how to install a conda package properly
We are going to need to install Theano on OSC. This means we need a new conda virtual environment! Let's see how we would do that. I would open a new terminal, because it's a bit slow:
conda create -n theano_env python=3.6 anaconda theano
Name (-n
): theano_env
Python version: 3.6
Package list: All packages from anaconda
, theano
This will also install up-to-date packages to link into the new environment, so it might be a bit slow. On your local machine, it might be faster, since you probably conda update anaconda
occasionally.
You'll next need your kernels available - you can either add nb_conda_kernels
to your base environment if you own it (not on OSC), or install them manually:
source activate theano_env
python -m ipykernel install --user --name theano_env
Note: In the latest Conda release, besides Python 3.7, you can now use
conda activate
instead.
While you are in the environment, go ahead and install pymc3: pip install pymc3
or conda install -c conda-forge pymc3
.
# Numpy keeps throwing a Future warning, but we will ignore it.
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)
import scipy.stats
import scipy.special
import numpy as np
import matplotlib.pyplot as plt
Example: Biased coin
This is a very popular example for understanding Bayesian statistics. Let's assume you have a coin and you think it may be biased. You job is to flip the coin and calculate the bias $p$, where $p$ is the chance of obtaining a head ($p=.5$ for an unbiased coin).
You flip the coin 14 times. It lands heads up 10 times.
Let's illustrate the difference by asking both the Frequentist and the Bayesian a question: Would you bet even money that the next two flips will both land heads up?
n = 14
heads = 10
a = 1
b = 1
p_freq = heads / n
prob_2_heads = p_freq ** 2
print(f"p={p_freq:.3}, and the chance of two heads is {prob_2_heads:.1%}")
Bayesian statistics: Analytical
For us, $p$ is no longer a value, but a distribution. What is the probability of $p$, given data $x$? This is the Bayes formula:
$$ P(p|x) = \frac{P(x|p) P(p)}{P(x)} \tag{1} $$We can compute $P(x|p)$ using a binomial distribution:
$$ P(x|p) \propto p^{10} \left( 1-p \right)^4 $$Bayesian's need to select a prior distribution $P(p)$. We'll pick the Beta distribution:
$$ P(p) \propto p^{a-1} \left( 1-p \right)^{b-1} $$This gives us a combined distribution of:
$$ P(p|x) \propto p^{10+a-1} \left( 1-p \right)^{4+b-1} $$Now, we need to normalize (we've been dropping constants everywhere intentionally). This is again a Beta distribution, so we can just write it that way:
$$ P(p|x) = \mathrm{Beta}(p; a+10, b+4) $$x = np.linspace(0, 1, 200)
orig = scipy.stats.beta.pdf(x, a, b)
new = scipy.stats.beta.pdf(x, a + heads, b + n - heads)
plt.plot(x, orig, label="Prior")
plt.plot(x, new, label="Posterior")
plt.legend()
plt.show()
Working through some math, we could find the probability of two heads in a row:
$$ P(HH|x) = \frac{B(10+a+2, 4+b)}{B(10+a, 4+b)} $$(Here $B$ is the beta function, not beta distribution).
(scipy.special.beta(10 + 1 + 2, 4 + 1) / scipy.special.beta(10 + 1, 4 + 1))
So the Bayesian would vote against two heads in a row.
We were lucky here, due to our careful choice of problem. We had a nice posterior and could do everything analytically. What if we didn't such a perfect setup?
Markov-Chain Monte Carlo
See also https://twiecki.github.io/blog/2015/11/10/mcmc-sampling/
Let's look again at the Bayes formula for a general parameter $\theta$:
$$ P(\theta|x) = \frac{P(x|\theta) P(\theta)}{P(x)} \tag{1} $$- $P(\theta)$: Prior
- $P(x|\theta)$: Likelihood
- $P(x)$: Evidence
We need to compute:
$$ P(x) = \int _\Theta P(x,\theta) d\theta $$For non-simple models, this is not closed form. So how about using MC and sampling the posterior? Now you need to solve and invert the Bayes formula! We didn't fix anything.
How about we go crazy and decide to build a markov chain that has an equilibrium state that matches our unknown posterior distribution? While this may seem yet one step harder still, it turns out that we can do it rather easily!
Markov Chains
Let's change gears just for a second, and talk about Markov chains. A Markov chain is a system where the next state of the system depends only on the current state of the system, not on any prior states. Chutes and Ladders, the children's board game, is the canonical example: you can set the game up, then the next possible state is a collection of probabilities based on dice rolls and the current marker locations, but it does not have any "memory" as to how you got to the current state.
For us, it means that we start at some point, then try to "jump" to a next point based on a probability. If we like the new point better, we keep it. This would be a simple search, and would "stop" at the peak -- but we want a distribution, not a value -- so we make an acceptance probability: $p_\mathrm{accept} = p_\mathrm{proposal} / p_\mathrm{current}$.
n = 100
heads = 61
a = 10
b = 10
niter = 2_000
x = np.linspace(0, 1, 200)
orig = scipy.stats.beta.pdf(x, a, b)
new = scipy.stats.beta.pdf(x, a + heads, b + n - heads)
plt.plot(x, orig, label="Prior")
plt.plot(x, new, label="Posterior")
plt.legend()
plt.show()
import pymc3 as pm
with pm.Model() as coin_context:
p = pm.Beta("p", alpha=2, beta=2)
y = pm.Binomial("y", n=n, p=p, observed=heads)
trace = pm.sample(niter)
plt.plot(trace["p"])
plt.show()
p = trace.get_values("p", burn=niter // 2, combine=True, chains=[0, 2])
plt.plot(p)
plt.show()
Gaussian example
Taken verbatim from https://twiecki.github.io/blog/2015/11/10/mcmc-sampling/. It uses a Gaussian instead of the coin flip. We are estimating the mean from 20 samples drawn from a data sample centered around 0.
import numpy as np
import scipy as sp
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import norm
sns.set_style("white")
sns.set_context("talk")
np.random.seed(123)
data = np.random.randn(20)
ax = plt.subplot()
sns.distplot(data, kde=False, ax=ax)
_ = ax.set(title="Histogram of observed data", xlabel="x", ylabel="# observations")
def calc_posterior_analytical(data, x, mu_0, sigma_0):
sigma = 1.0
n = len(data)
mu_post = (mu_0 / sigma_0 ** 2 + data.sum() / sigma ** 2) / (
1.0 / sigma_0 ** 2 + n / sigma ** 2
)
sigma_post = (1.0 / sigma_0 ** 2 + n / sigma ** 2) ** -1
return norm(mu_post, np.sqrt(sigma_post)).pdf(x)
ax = plt.subplot()
x = np.linspace(-1, 1, 500)
posterior_analytical = calc_posterior_analytical(data, x, 0.0, 1.0)
ax.plot(x, posterior_analytical)
ax.set(xlabel="mu", ylabel="belief", title="Analytical posterior")
sns.despine()
def sampler(
data,
samples=4,
mu_init=0.5,
proposal_width=0.5,
plot=False,
mu_prior_mu=0,
mu_prior_sd=1.0,
):
mu_current = mu_init
posterior = [mu_current]
for i in range(samples):
# suggest new position
mu_proposal = norm(mu_current, proposal_width).rvs()
# Compute likelihood by multiplying probabilities of each data point
likelihood_current = norm(mu_current, 1).pdf(data).prod()
likelihood_proposal = norm(mu_proposal, 1).pdf(data).prod()
# Compute prior probability of current and proposed mu
prior_current = norm(mu_prior_mu, mu_prior_sd).pdf(mu_current)
prior_proposal = norm(mu_prior_mu, mu_prior_sd).pdf(mu_proposal)
p_current = likelihood_current * prior_current
p_proposal = likelihood_proposal * prior_proposal
# Accept proposal?
p_accept = p_proposal / p_current
# Usually would include prior probability, which we neglect here for simplicity
accept = np.random.rand() < p_accept
if plot:
plot_proposal(
mu_current,
mu_proposal,
mu_prior_mu,
mu_prior_sd,
data,
accept,
posterior,
i,
)
if accept:
# Update position
mu_current = mu_proposal
posterior.append(mu_current)
return posterior
# Function to display
def plot_proposal(
mu_current, mu_proposal, mu_prior_mu, mu_prior_sd, data, accepted, trace, i
):
from copy import copy
trace = copy(trace)
fig, (ax1, ax2, ax3, ax4) = plt.subplots(ncols=4, figsize=(16, 4))
fig.suptitle("Iteration %i" % (i + 1))
x = np.linspace(-3, 3, 5000)
color = "g" if accepted else "r"
# Plot prior
prior_current = norm(mu_prior_mu, mu_prior_sd).pdf(mu_current)
prior_proposal = norm(mu_prior_mu, mu_prior_sd).pdf(mu_proposal)
prior = norm(mu_prior_mu, mu_prior_sd).pdf(x)
ax1.plot(x, prior)
ax1.plot([mu_current] * 2, [0, prior_current], marker="o", color="b")
ax1.plot([mu_proposal] * 2, [0, prior_proposal], marker="o", color=color)
ax1.annotate(
"",
xy=(mu_proposal, 0.2),
xytext=(mu_current, 0.2),
arrowprops=dict(arrowstyle="->", lw=2.0),
)
ax1.set(
ylabel="Probability Density",
title="current: prior(mu=%.2f) = %.2f\nproposal: prior(mu=%.2f) = %.2f"
% (mu_current, prior_current, mu_proposal, prior_proposal),
)
# Likelihood
likelihood_current = norm(mu_current, 1).pdf(data).prod()
likelihood_proposal = norm(mu_proposal, 1).pdf(data).prod()
y = norm(loc=mu_proposal, scale=1).pdf(x)
sns.distplot(data, kde=False, norm_hist=True, ax=ax2)
ax2.plot(x, y, color=color)
ax2.axvline(mu_current, color="b", linestyle="--", label="mu_current")
ax2.axvline(mu_proposal, color=color, linestyle="--", label="mu_proposal")
# ax2.title('Proposal {}'.format('accepted' if accepted else 'rejected'))
ax2.annotate(
"",
xy=(mu_proposal, 0.2),
xytext=(mu_current, 0.2),
arrowprops=dict(arrowstyle="->", lw=2.0),
)
ax2.set(
title="likelihood(mu=%.2f) = %.2f\nlikelihood(mu=%.2f) = %.2f"
% (
mu_current,
1e14 * likelihood_current,
mu_proposal,
1e14 * likelihood_proposal,
)
)
# Posterior
posterior_analytical = calc_posterior_analytical(data, x, mu_prior_mu, mu_prior_sd)
ax3.plot(x, posterior_analytical)
posterior_current = calc_posterior_analytical(
data, mu_current, mu_prior_mu, mu_prior_sd
)
posterior_proposal = calc_posterior_analytical(
data, mu_proposal, mu_prior_mu, mu_prior_sd
)
ax3.plot([mu_current] * 2, [0, posterior_current], marker="o", color="b")
ax3.plot([mu_proposal] * 2, [0, posterior_proposal], marker="o", color=color)
ax3.annotate(
"",
xy=(mu_proposal, 0.2),
xytext=(mu_current, 0.2),
arrowprops=dict(arrowstyle="->", lw=2.0),
)
# x3.set(title=r'prior x likelihood $\propto$ posterior')
ax3.set(
title="posterior(mu=%.2f) = %.5f\nposterior(mu=%.2f) = %.5f"
% (mu_current, posterior_current, mu_proposal, posterior_proposal)
)
if accepted:
trace.append(mu_proposal)
else:
trace.append(mu_current)
ax4.plot(trace)
ax4.set(xlabel="iteration", ylabel="mu", title="trace")
plt.tight_layout()
# plt.legend()
np.random.seed(123)
sampler(data, samples=8, mu_init=-1.0, plot=True)
posterior = sampler(data, samples=15000, mu_init=1.0)
fig, ax = plt.subplots()
ax.plot(posterior)
_ = ax.set(xlabel="sample", ylabel="mu")
ax = plt.subplot()
sns.distplot(posterior[500:], ax=ax, label="estimated posterior")
x = np.linspace(-0.5, 0.5, 500)
post = calc_posterior_analytical(data, x, 0, 1)
ax.plot(x, post, "g", label="analytic posterior")
_ = ax.set(xlabel="mu", ylabel="belief")
ax.legend()