
Package initialization

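The initialization cell is not shown in this export; a typical set of packages for the examples below might be loaded as follows (the exact list is an assumption):

```julia
# Hypothetical package initialization (the notebook's own cell is not shown).
using EllipticalSliceSampling       # elliptical slice sampling
using Distributions                 # Gaussian priors and likelihoods
using Turing                        # probabilistic programming examples
using AbstractGPs, KernelFunctions  # Gaussian process examples
using LinearAlgebra, Random
using Plots                         # visualization
```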

EllipticalSliceSampling.jl: MCMC with Gaussian priors

David Widmann (@devmotion)

Uppsala University, Sweden

JuliaCon, July 2021


Elliptical slice sampling

Murray, I., Adams, R. P., & MacKay, D. J. C. (2010). Elliptical slice sampling. Proceedings of Machine Learning Research, 9:541–548.

A Markov chain Monte Carlo (MCMC) method for models with a Gaussian prior.


Example: Point process

  • Gaussian prior

    $p_X(x) := \mathcal{N}(x; 3.5, 1)$

  • Likelihood

    $\mathcal{L}(x) := p_{Y \mid X}(y \mid x) := \mathrm{Poisson}(y; \log(1 + \exp(x)))$

  • Elliptical slice sampling approximates the posterior

    $p_{X \mid Y}(x \mid y) \propto p_{Y \mid X}(y \mid x)\, p_X(x) = \mathcal{L}(x)\, p_X(x)$


Observed count: y = 5

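With EllipticalSliceSampling.jl, sampling from this posterior might look roughly as follows (a minimal sketch; the variable names are illustrative and not taken from the hidden notebook cells):

```julia
# Minimal sketch: elliptical slice sampling for the point-process example.
using EllipticalSliceSampling, Distributions

y = 5                                          # observed count (value shown above)
prior = Normal(3.5, 1)                         # Gaussian prior p_X
loglik(x) = logpdf(Poisson(log1p(exp(x))), y)  # log-likelihood log L(x)

# ESSModel combines the Gaussian prior and the log-likelihood;
# ESS() is the elliptical slice sampler.
samples = sample(ESSModel(prior, loglik), ESS(), 1_000)
```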

Motivation

Assume a Gaussian prior N(0,Σ) with zero mean.

  • A Metropolis–Hastings method with proposal distribution

    $P_\varepsilon(x' \mid x) = \mathcal{N}\left(x'; \sqrt{1 - \varepsilon^2}\, x, \varepsilon^2 \Sigma\right),$

    where $x$ is the current state and $\varepsilon \in [0, 1]$ is a step-size parameter, requires tuning of the step size $\varepsilon$ for efficient mixing

  • Idea: search over the step size and consider a (half-)ellipse of possible proposals for varying $\varepsilon$ (sketched below)
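As a sketch of the resulting proposal set: with an auxiliary draw $\nu \sim \mathcal{N}(0, \Sigma)$, the states $x' = x \cos\theta + \nu \sin\theta$ for $\theta \in [0, 2\pi)$ form an ellipse through the current state, and varying the angle $\theta$ plays the role of varying the step size $\varepsilon$.

```julia
# Sketch: the ellipse of candidate states through the current state x.
# ν is an auxiliary draw from the Gaussian prior N(0, Σ);
# θ = 0 keeps x, θ = π/2 proposes the independent draw ν.
ellipse_proposal(x, ν, θ) = cos(θ) .* x .+ sin(θ) .* ν
```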


EllipticalSliceSampling.jl

Julia implementation of elliptical slice sampling.

Features:

  • Supports arbitrary Gaussian priors, including those with non-zero mean (see the sketch after this list)

  • Based on the AbstractMCMC.jl interface

  • Uses ArrayInterface.jl to reduce allocations if samples are mutable

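For example, a non-zero-mean multivariate Gaussian prior can be passed directly (an illustrative sketch, not code from the notebook):

```julia
# Sketch: a non-zero-mean Gaussian prior with full covariance.
using EllipticalSliceSampling, Distributions

prior = MvNormal([1.0, -2.0], [2.0 0.5; 0.5 1.0])  # non-zero mean, full covariance
loglik(x) = -sum(abs2, x) / 2                      # illustrative log-likelihood
samples = sample(ESSModel(prior, loglik), ESS(), 1_000)
```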

AbstractMCMC.jl

AbstractMCMC.jl defines an interface for MCMC algorithms.

If you want to implement an algorithm, you have to implement

AbstractMCMC.step(rng, model, sampler[, state]; kwargs...)

that defines the sampling step of the algorithm (in the initial step no state is provided).
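Schematically, a custom sampler might implement the interface like this (a sketch; `initial_sample` and `propose` are hypothetical helpers, not part of AbstractMCMC.jl):

```julia
# Schematic sketch of a sampler built on the AbstractMCMC.jl interface.
using AbstractMCMC, Random

struct MySampler <: AbstractMCMC.AbstractSampler end

# Initial step: no state is provided.
function AbstractMCMC.step(rng::Random.AbstractRNG, model::AbstractMCMC.AbstractModel,
                           ::MySampler; kwargs...)
    x = initial_sample(rng, model)   # hypothetical helper
    return x, x                      # return (sample, state)
end

# Subsequent steps: the previous state is passed in.
function AbstractMCMC.step(rng::Random.AbstractRNG, model::AbstractMCMC.AbstractModel,
                           ::MySampler, state; kwargs...)
    x = propose(rng, model, state)   # hypothetical helper
    return x, x
end
```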

Then the default definitions provide you with

  • progress bars,

  • support for user-provided callbacks,

  • support for thinning and discarding initial samples,

  • support for sampling with a custom stopping criterion,

  • support for sampling multiple chains, serially or in parallel with multiple threads or multiple processes,

  • an iterator and a transducer for sampling Markov chains.


See you at JuliaCon!


Additional material


Example: Gaussian likelihood

In this case the posterior is analytically tractable.

Here we choose

$p_X(x) := \mathcal{N}\left(x; \begin{bmatrix} 3.5 \\ 1.5 \end{bmatrix}, \begin{bmatrix} 0.5 & 0 \\ 0 & 1.5 \end{bmatrix}\right)$

and

$\mathcal{L}(x) := p_{Y \mid X}\left([0, 2.5]^\mathsf{T} \mid x\right) := \mathcal{N}\left(\begin{bmatrix} 0 \\ 2.5 \end{bmatrix}; x, \begin{bmatrix} 0.75 & 0 \\ 0 & 0.5 \end{bmatrix}\right).$

We obtain

$p_{X \mid Y}(x) = \mathcal{N}\left(x; \begin{bmatrix} 2.1 \\ 2.25 \end{bmatrix}, \begin{bmatrix} 0.3 & 0 \\ 0 & 0.375 \end{bmatrix}\right)$

gaussian_prior
DiagNormal(
dim: 2
μ: [3.5, 1.5]
Σ: [0.5 0.0; 0.0 1.5]
)
gaussian_loglikelihood (generic function with 1 method)
gaussian_samples
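The cells behind these outputs are hidden; definitions consistent with the printed prior and the equations above might look like this (a sketch):

```julia
# Sketch of the hidden definitions, consistent with the output above.
using EllipticalSliceSampling, Distributions, LinearAlgebra

gaussian_prior = MvNormal([3.5, 1.5], Diagonal([0.5, 1.5]))
gaussian_loglikelihood(x) = logpdf(MvNormal(x, Diagonal([0.75, 0.5])), [0.0, 2.5])
gaussian_samples = sample(ESSModel(gaussian_prior, gaussian_loglikelihood), ESS(), 1_000)
```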

The following plot shows the prior distribution (right), the likelihood (left), and the analytical posterior and the samples obtained with elliptical slice sampling (center).


Example: Gaussian likelihood with Turing

We can also formulate the analytically tractable example with Turing and use elliptical slice sampling for Bayesian inference.

gaussian_model (generic function with 1 method)
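The Turing formulation might look roughly like this (a sketch; the notebook's own cell is not shown):

```julia
# Sketch: the same tractable example as a Turing model, sampled with ESS.
using Turing, LinearAlgebra

@model function gaussian_model(y)
    x ~ MvNormal([3.5, 1.5], Diagonal([0.5, 1.5]))  # Gaussian prior
    y ~ MvNormal(x, Diagonal([0.75, 0.5]))          # Gaussian likelihood
end

# Elliptical slice sampling via Turing's ESS sampler.
gaussian_samples_turing = sample(gaussian_model([0.0, 2.5]), ESS(), 1_000)
```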
gaussian_samples_turing

 iteration  chain      x[1]      x[2]        lp
     Int64  Int64   Float64   Float64   Float64
         1      1   1.12649   3.19509  -2.67659
         2      1  0.783803   2.07695    -1.936
         3      1   1.54247   2.66388  -2.96046
         4      1   1.14741    2.4682  -2.22618
         5      1   2.81702   3.53366   -7.7063
         6      1   2.20533   1.94238  -4.90073
         7      1   1.70427     2.302  -3.32302
         8      1   1.81732    2.4507  -3.55167
         9      1   2.38714   2.63821  -5.16553
        10      1   2.32798   2.76788  -5.03222
         ⋮      ⋮         ⋮         ⋮         ⋮

Again, the following plot shows the prior distribution (right), the likelihood (left), and the analytical posterior and the samples obtained with elliptical slice sampling (center).


Example: Gibbs sampling with Turing

It is also possible to use elliptical slice sampling within a Gibbs sampler. For instance, here we consider a model with prior distributions

$p_{\sigma^2}(v) := \mathrm{InverseGamma}(v; 2, 3), \qquad p_{M \mid \sigma^2}(m \mid v) := \mathcal{N}(m; 0, v),$

and likelihood function

$\mathcal{L}(m, v) := p_{Y \mid M, \sigma^2}\left([1.5, 2]^\mathsf{T} \mid m, v\right) := \mathcal{N}(1.5; m, v)\, \mathcal{N}(2; m, v).$

turing_model (generic function with 1 method)
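A possible formulation of this model and the Gibbs sampler (a sketch: ESS updates m; the sampler used for σ² here, MH, is an assumption, and the Gaussians are parametrized by the standard deviation √v):

```julia
# Sketch: elliptical slice sampling inside a Gibbs sampler with Turing.
using Turing

@model function turing_model(y)
    σ² ~ InverseGamma(2, 3)         # prior on the variance
    m ~ Normal(0, sqrt(σ²))         # Gaussian prior on the mean
    for i in eachindex(y)
        y[i] ~ Normal(m, sqrt(σ²))  # Gaussian likelihood
    end
end

# ESS for the conditionally Gaussian m, MH (an assumption) for σ².
turing_samples = sample(turing_model([1.5, 2.0]), Gibbs(ESS(:m), MH(:σ²)), 1_000)
```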
turing_samples

 iteration  chain        σ²          m        lp
     Int64  Int64   Float64    Float64   Float64
         1      1   3.26046    2.10712  -7.53727
         2      1   2.26858    1.17912  -6.04583
         3      1   1.53555   0.139986  -6.17846
         4      1     1.937     1.9955  -6.17477
         5      1   1.90338   0.768406  -5.72623
         6      1   1.05555    1.90944  -5.45533
         7      1   2.88109  -0.229259  -7.75315
         8      1   1.14605  0.0912135  -6.24981
         9      1   0.63478    1.27422  -4.97445
        10      1   7.00457     1.0926  -9.90325
         ⋮      ⋮         ⋮          ⋮         ⋮

For illustration purposes we chose a model where the posterior is analytically tractable. The following plots visualize the samples obtained with the Gibbs sampler (gray) and their mean (blue). The mean of the posterior distribution is shown in yellow.


Example: Gaussian process regression

In this example, we consider a Gaussian process regression model, similar to the example in the PyMC3 documentation.

We use a squared exponential kernel with length scale 0.1. First, we generate noisy data with the AbstractGPs.jl interface.
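The data-generating cells are hidden; code consistent with the gp_x output below might look like this (a sketch; the number of inputs is an assumption):

```julia
# Sketch of the data generation, consistent with the gp_x output below:
# squared exponential kernel with length scale 0.1 and noise variance 0.1.
using AbstractGPs, KernelFunctions

kernel = SqExponentialKernel() ∘ ScaleTransform(10)  # length scale 1/10
gp = GP(kernel)                                      # zero-mean GP prior
x = sort!(rand(20))                                  # hypothetical inputs in [0, 1]
gp_x = gp(x, 0.1)                                    # finite projection with noise variance 0.1
y = rand(gp_x)                                       # noisy draw from the prior
```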

gp
x
gp_x
AbstractGPs.FiniteGP{AbstractGPs.GP{AbstractGPs.ZeroMean{Float64}, KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Int64}}}, Vector{Float64}, LinearAlgebra.Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}}}(
f: GP{AbstractGPs.ZeroMean{Float64}, TransformedKernel{SqExponentialKernel{Euclidean}, ScaleTransform{Int64}}}(AbstractGPs.ZeroMean{Float64}(), Squared Exponential Kernel (metric = Euclidean(0.0))
	- Scale Transform (s = 10))
x: [0.05664544616214151, 0.06609803322813423, 0.1096774963723961, 0.11258244478647295, 0.12078054506961555, 0.17165450700728413, 0.17957407667101322, 0.2422083248151139, 0.2502869659391691, 0.32847482856998655  …  0.5961723206552116, 0.6699313951612162, 0.6791898821149229, 0.7252498905219804, 0.8151038332483567, 0.8197779704008801, 0.8440074795625907, 0.9233537554310114, 0.9236760031564317, 0.9991722180201814]
Σy: [0.1 0.0 … 0.0 0.0; 0.0 0.1 … 0.0 0.0; … ; 0.0 0.0 … 0.1 0.0; 0.0 0.0 … 0.0 0.1]
)
y

The following plot shows the data and the analytically tractable posterior distribution (mean ± one standard deviation).


We perform elliptical slice sampling of the original data (without noise).

10.8 μs
gp_regression_samples
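One way to set this up (a sketch: the Gaussian prior is the GP evaluated at the inputs, with a small jitter added for numerical stability, and the likelihood is the Gaussian observation model with noise variance 0.1):

```julia
# Sketch: sample the latent (noise-free) function values with ESS.
using EllipticalSliceSampling, Distributions, AbstractGPs, LinearAlgebra

fx = gp(x)                                             # latent GP at the inputs (no noise)
gp_prior = MvNormal(mean(fx), cov(fx) + 1e-6 * I)      # jitter for numerical stability
gp_loglik(f) = sum(logpdf.(Normal.(f, sqrt(0.1)), y))  # Gaussian observation noise

gp_regression_samples = sample(ESSModel(gp_prior, gp_loglik), ESS(), 1_000)
```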

We plot the mean of the posterior distributions based on the samples from elliptical slice sampling.

13.4 μs
11.2 s

Example: Gaussian process classification

In this example, we consider a Gaussian process classification model, similar to the example in the PyMC3 documentation.

Again, we use a squared exponential kernel with length scale 0.1 and the same noisy data as above. However, this time we assign a value of 0 (or false) to all negative values, and a value of 1 (or true) to all non-negative values.
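The binarized targets might simply be constructed as follows (a sketch):

```julia
# Sketch: 0 (false) for negative observations, 1 (true) for non-negative ones.
z = y .>= 0
```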

z

We perform elliptical slice sampling to infer the posterior distribution of the original, non-noisy values of the Gaussian process model.

gp_classification_samples
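A possible setup (a sketch, assuming a Bernoulli observation model with a logistic link; the likelihood actually used in the notebook is not shown):

```julia
# Sketch: ESS for GP classification with a Bernoulli/logistic observation model.
using EllipticalSliceSampling, Distributions, LinearAlgebra

logistic(u) = 1 / (1 + exp(-u))

fx = gp(x)                                               # latent GP at the inputs (no noise)
gp_class_prior = MvNormal(mean(fx), cov(fx) + 1e-6 * I)  # jitter for numerical stability
gp_class_loglik(f) = sum(logpdf.(Bernoulli.(logistic.(f)), z))

gp_classification_samples = sample(ESSModel(gp_class_prior, gp_class_loglik), ESS(), 1_000)
```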

We plot the mean of the posterior distributions of the Gaussian process based on the samples from elliptical slice sampling.
