
Package initialization

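The initialization cell is not shown in this export; a typical set of packages for the examples below might be loaded as follows (the exact list is an assumption):

```julia
# Hypothetical package initialization (the notebook's own cell is not shown).
using EllipticalSliceSampling       # elliptical slice sampling
using Distributions                 # Gaussian priors and likelihoods
using Turing                        # probabilistic programming examples
using AbstractGPs, KernelFunctions  # Gaussian process examples
using LinearAlgebra, Random
using Plots                         # visualization
```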

EllipticalSliceSampling.jl: MCMC with Gaussian priors

David Widmann (@devmotion)

Uppsala University, Sweden

JuliaCon, July 2021


Elliptical slice sampling

Murray, I., Adams, R. P., & MacKay, D. J. C. (2010). Elliptical slice sampling. Proceedings of Machine Learning Research, 9:541–548.

A Markov chain Monte Carlo (MCMC) method for models with a Gaussian prior.


Example: Point process

  • Gaussian prior

    $p_X(x) := \mathcal{N}(x; 3.5, 1)$

  • Likelihood

    $\mathcal{L}(x) := p_{Y \mid X}(y \mid x) := \mathrm{Poisson}(y; \log(1 + \exp(x)))$

  • Elliptical slice sampling approximates the posterior

    $p_{X \mid Y}(x \mid y) \propto p_{Y \mid X}(y \mid x)\, p_X(x) = \mathcal{L}(x)\, p_X(x)$


Observed count: y = 5

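With EllipticalSliceSampling.jl, sampling from this posterior might look roughly as follows (a minimal sketch; the variable names are illustrative and not taken from the hidden notebook cells):

```julia
# Minimal sketch: elliptical slice sampling for the point-process example.
using EllipticalSliceSampling, Distributions

y = 5                                          # observed count (value shown above)
prior = Normal(3.5, 1)                         # Gaussian prior p_X
loglik(x) = logpdf(Poisson(log1p(exp(x))), y)  # log-likelihood log L(x)

# ESSModel combines the Gaussian prior and the log-likelihood;
# ESS() is the elliptical slice sampler.
samples = sample(ESSModel(prior, loglik), ESS(), 1_000)
```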

Motivation

Assume a Gaussian prior N(0,Σ) with zero mean.

  • A Metropolis–Hastings method with proposal distribution

    $P_\varepsilon(x' \mid x) = \mathcal{N}\left(x'; \sqrt{1 - \varepsilon^2}\, x, \varepsilon^2 \Sigma\right),$

    where $x$ is the current state and $\varepsilon \in [0, 1]$ is a step-size parameter, requires tuning of the step size $\varepsilon$ for efficient mixing

  • Idea: search over the step size and consider a (half-)ellipse of possible proposals for varying $\varepsilon$ (sketched below)
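As a sketch of the resulting proposal set: with an auxiliary draw $\nu \sim \mathcal{N}(0, \Sigma)$, the states $x' = x \cos\theta + \nu \sin\theta$ for $\theta \in [0, 2\pi)$ form an ellipse through the current state, and varying the angle $\theta$ plays the role of varying the step size $\varepsilon$.

```julia
# Sketch: the ellipse of candidate states through the current state x.
# ν is an auxiliary draw from the Gaussian prior N(0, Σ);
# θ = 0 keeps x, θ = π/2 proposes the independent draw ν.
ellipse_proposal(x, ν, θ) = cos(θ) .* x .+ sin(θ) .* ν
```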


EllipticalSliceSampling.jl

Julia implementation of elliptical slice sampling.

Features:

  • Supports arbitrary Gaussian priors, including those with non-zero mean (see the sketch after this list)

  • Based on the AbstractMCMC.jl interface

  • Uses ArrayInterface.jl to reduce allocations if samples are mutable

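For example, a non-zero-mean multivariate Gaussian prior can be passed directly (an illustrative sketch, not code from the notebook):

```julia
# Sketch: a non-zero-mean Gaussian prior with full covariance.
using EllipticalSliceSampling, Distributions

prior = MvNormal([1.0, -2.0], [2.0 0.5; 0.5 1.0])  # non-zero mean, full covariance
loglik(x) = -sum(abs2, x) / 2                      # illustrative log-likelihood
samples = sample(ESSModel(prior, loglik), ESS(), 1_000)
```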

AbstractMCMC.jl

AbstractMCMC.jl defines an interface for MCMC algorithms.

If you want to implement an algorithm, you have to implement

AbstractMCMC.step(rng, model, sampler[, state]; kwargs...)

that defines the sampling step of the algorithm (in the initial step no state is provided).
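Schematically, a custom sampler might implement the interface like this (a sketch; `initial_sample` and `propose` are hypothetical helpers, not part of AbstractMCMC.jl):

```julia
# Schematic sketch of a sampler built on the AbstractMCMC.jl interface.
using AbstractMCMC, Random

struct MySampler <: AbstractMCMC.AbstractSampler end

# Initial step: no state is provided.
function AbstractMCMC.step(rng::Random.AbstractRNG, model::AbstractMCMC.AbstractModel,
                           ::MySampler; kwargs...)
    x = initial_sample(rng, model)   # hypothetical helper
    return x, x                      # return (sample, state)
end

# Subsequent steps: the previous state is passed in.
function AbstractMCMC.step(rng::Random.AbstractRNG, model::AbstractMCMC.AbstractModel,
                           ::MySampler, state; kwargs...)
    x = propose(rng, model, state)   # hypothetical helper
    return x, x
end
```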

Then the default definitions provide you with

  • progress bars,

  • support for user-provided callbacks,

  • support for thinning and discarding initial samples,

  • support for sampling with a custom stopping criterion,

  • support for sampling multiple chains, serially or in parallel with multiple threads or multiple processes,

  • an iterator and a transducer for sampling Markov chains.


See you at JuliaCon!


Additional material


Example: Gaussian likelihood

In this case the posterior is analytically tractable.

Here we choose

$p_X(x) := \mathcal{N}\left(x; \begin{bmatrix} 3.5 \\ 1.5 \end{bmatrix}, \begin{bmatrix} 0.5 & 0 \\ 0 & 1.5 \end{bmatrix}\right)$

and

$\mathcal{L}(x) := p_{Y \mid X}\left([0, 2.5]^\mathsf{T} \mid x\right) := \mathcal{N}\left(\begin{bmatrix} 0 \\ 2.5 \end{bmatrix}; x, \begin{bmatrix} 0.75 & 0 \\ 0 & 0.5 \end{bmatrix}\right).$

We obtain

$p_{X \mid Y}(x) = \mathcal{N}\left(x; \begin{bmatrix} 2.1 \\ 2.25 \end{bmatrix}, \begin{bmatrix} 0.3 & 0 \\ 0 & 0.375 \end{bmatrix}\right)$

gaussian_prior
DiagNormal(
dim: 2
μ: [3.5, 1.5]
Σ: [0.5 0.0; 0.0 1.5]
)
gaussian_loglikelihood (generic function with 1 method)
gaussian_samples
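The cells behind these outputs are hidden; definitions consistent with the printed prior and the equations above might look like this (a sketch):

```julia
# Sketch of the hidden definitions, consistent with the output above.
using EllipticalSliceSampling, Distributions, LinearAlgebra

gaussian_prior = MvNormal([3.5, 1.5], Diagonal([0.5, 1.5]))
gaussian_loglikelihood(x) = logpdf(MvNormal(x, Diagonal([0.75, 0.5])), [0.0, 2.5])
gaussian_samples = sample(ESSModel(gaussian_prior, gaussian_loglikelihood), ESS(), 1_000)
```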

The following plot shows the prior distribution (right), the likelihood (left), and the analytical posterior and the samples obtained with elliptical slice sampling (center).


Example: Gaussian likelihood with Turing

We can also formulate the analytically tractable example with Turing and use elliptical slice sampling for Bayesian inference.

gaussian_model (generic function with 1 method)
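The Turing formulation might look roughly like this (a sketch; the notebook's own cell is not shown):

```julia
# Sketch: the same tractable example as a Turing model, sampled with ESS.
using Turing, LinearAlgebra

@model function gaussian_model(y)
    x ~ MvNormal([3.5, 1.5], Diagonal([0.5, 1.5]))  # Gaussian prior
    y ~ MvNormal(x, Diagonal([0.75, 0.5]))          # Gaussian likelihood
end

# Elliptical slice sampling via Turing's ESS sampler.
gaussian_samples_turing = sample(gaussian_model([0.0, 2.5]), ESS(), 1_000)
```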
gaussian_samples_turing

 iteration  chain      x[1]      x[2]        lp
     Int64  Int64   Float64   Float64   Float64
         1      1   1.12649   3.19509  -2.67659
         2      1  0.783803   2.07695    -1.936
         3      1   1.54247   2.66388  -2.96046
         4      1   1.14741    2.4682  -2.22618
         5      1   2.81702   3.53366   -7.7063
         6      1   2.20533   1.94238  -4.90073
         7      1   1.70427     2.302  -3.32302
         8      1   1.81732    2.4507  -3.55167
         9      1   2.38714   2.63821  -5.16553
        10      1   2.32798   2.76788  -5.03222
         ⋮      ⋮         ⋮         ⋮         ⋮

Again, the following plot shows the prior distribution (right), the likelihood (left), and the analytical posterior and the samples obtained with elliptical slice sampling (center).


Example: Gibbs sampling with Turing

It is also possible to use elliptical slice sampling within a Gibbs sampler. For instance, here we consider a model with prior distributions

$p_{\sigma^2}(v) := \mathrm{InverseGamma}(v; 2, 3), \qquad p_{M \mid \sigma^2}(m \mid v) := \mathcal{N}(m; 0, v),$

and likelihood function

$\mathcal{L}(m, v) := p_{Y \mid M, \sigma^2}\left([1.5, 2]^\mathsf{T} \mid m, v\right) := \mathcal{N}(1.5; m, v)\, \mathcal{N}(2; m, v).$

turing_model (generic function with 1 method)
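A possible formulation of this model and the Gibbs sampler (a sketch: ESS updates m; the sampler used for σ² here, MH, is an assumption, and the Gaussians are parametrized by the standard deviation √v):

```julia
# Sketch: elliptical slice sampling inside a Gibbs sampler with Turing.
using Turing

@model function turing_model(y)
    σ² ~ InverseGamma(2, 3)         # prior on the variance
    m ~ Normal(0, sqrt(σ²))         # Gaussian prior on the mean
    for i in eachindex(y)
        y[i] ~ Normal(m, sqrt(σ²))  # Gaussian likelihood
    end
end

# ESS for the conditionally Gaussian m, MH (an assumption) for σ².
turing_samples = sample(turing_model([1.5, 2.0]), Gibbs(ESS(:m), MH(:σ²)), 1_000)
```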
turing_samples

 iteration  chain        σ²          m        lp
     Int64  Int64   Float64    Float64   Float64
         1      1   3.26046    2.10712  -7.53727
         2      1   2.26858    1.17912  -6.04583
         3      1   1.53555   0.139986  -6.17846
         4      1     1.937     1.9955  -6.17477
         5      1   1.90338   0.768406  -5.72623
         6      1   1.05555    1.90944  -5.45533
         7      1   2.88109  -0.229259  -7.75315
         8      1   1.14605  0.0912135  -6.24981
         9      1   0.63478    1.27422  -4.97445
        10      1   7.00457     1.0926  -9.90325
         ⋮      ⋮         ⋮          ⋮         ⋮

For illustration purposes we chose a model where the posterior is analytically tractable. The following plots visualize the samples obtained with the Gibbs sampler (gray) and their mean (blue). The mean of the posterior distribution is shown in yellow.


Example: Gaussian process regression

In this example, we consider a Gaussian process regression model, similar to the example in the PyMC3 documentation.

We use a squared exponential kernel with length scale 0.1. First, we generate noisy data with the AbstractGPs.jl interface.
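The data-generating cells are hidden; code consistent with the gp_x output below might look like this (a sketch; the number of inputs is an assumption):

```julia
# Sketch of the data generation, consistent with the gp_x output below:
# squared exponential kernel with length scale 0.1 and noise variance 0.1.
using AbstractGPs, KernelFunctions

kernel = SqExponentialKernel() ∘ ScaleTransform(10)  # length scale 1/10
gp = GP(kernel)                                      # zero-mean GP prior
x = sort!(rand(20))                                  # hypothetical inputs in [0, 1]
gp_x = gp(x, 0.1)                                    # finite projection with noise variance 0.1
y = rand(gp_x)                                       # noisy draw from the prior
```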

gp
x
gp_x
AbstractGPs.FiniteGP{AbstractGPs.GP{AbstractGPs.ZeroMean{Float64}, KernelFunctions.TransformedKernel{KernelFunctions.SqExponentialKernel{Distances.Euclidean}, KernelFunctions.ScaleTransform{Int64}}}, Vector{Float64}, LinearAlgebra.Diagonal{Float64, FillArrays.Fill{Float64, 1, Tuple{Base.OneTo{Int64}}}}}(
f: GP{AbstractGPs.ZeroMean{Float64}, TransformedKernel{SqExponentialKernel{Euclidean}, ScaleTransform{Int64}}}(AbstractGPs.ZeroMean{Float64}(), Squared Exponential Kernel (metric = Euclidean(0.0))
	- Scale Transform (s = 10))
x: [0.05664544616214151, 0.06609803322813423, 0.1096774963723961, 0.11258244478647295, 0.12078054506961555, 0.17165450700728413, 0.17957407667101322, 0.2422083248151139, 0.2502869659391691, 0.32847482856998655  …  0.5961723206552116, 0.6699313951612162, 0.6791898821149229, 0.7252498905219804, 0.8151038332483567, 0.8197779704008801, 0.8440074795625907, 0.9233537554310114, 0.9236760031564317, 0.9991722180201814]
Σy: [0.1 0.0 … 0.0 0.0; 0.0 0.1 … 0.0 0.0; … ; 0.0 0.0 … 0.1 0.0; 0.0 0.0 … 0.0 0.1]
)
y

The following plot shows the data and the analytically tractable posterior distribution (mean ± one standard deviation).


We perform elliptical slice sampling of the original data (without noise).

10.8 μs
gp_regression_samples
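One way to set this up (a sketch: the Gaussian prior is the GP evaluated at the inputs, with a small jitter added for numerical stability, and the likelihood is the Gaussian observation model with noise variance 0.1):

```julia
# Sketch: sample the latent (noise-free) function values with ESS.
using EllipticalSliceSampling, Distributions, AbstractGPs, LinearAlgebra

fx = gp(x)                                             # latent GP at the inputs (no noise)
gp_prior = MvNormal(mean(fx), cov(fx) + 1e-6 * I)      # jitter for numerical stability
gp_loglik(f) = sum(logpdf.(Normal.(f, sqrt(0.1)), y))  # Gaussian observation noise

gp_regression_samples = sample(ESSModel(gp_prior, gp_loglik), ESS(), 1_000)
```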

We plot the mean of the posterior distributions based on the samples from elliptical slice sampling.

13.4 μs
11.2 s

Example: Gaussian process classification

In this example, we consider a Gaussian process classification model, similar to the example in the PyMC3 documentation.

Again, we use a squared exponential kernel with length scale 0.1 and the same noisy data as above. However, this time we assign a value of 0 (or false) to all negative values, and a value of 1 (or true) to all non-negative values.
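The binarized targets might simply be constructed as follows (a sketch):

```julia
# Sketch: 0 (false) for negative observations, 1 (true) for non-negative ones.
z = y .>= 0
```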

z

We perform elliptical slice sampling to infer the posterior distribution of the original, non-noisy values of the Gaussian process model.

gp_classification_samples
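A possible setup (a sketch, assuming a Bernoulli observation model with a logistic link; the likelihood actually used in the notebook is not shown):

```julia
# Sketch: ESS for GP classification with a Bernoulli/logistic observation model.
using EllipticalSliceSampling, Distributions, LinearAlgebra

logistic(u) = 1 / (1 + exp(-u))

fx = gp(x)                                               # latent GP at the inputs (no noise)
gp_class_prior = MvNormal(mean(fx), cov(fx) + 1e-6 * I)  # jitter for numerical stability
gp_class_loglik(f) = sum(logpdf.(Bernoulli.(logistic.(f)), z))

gp_classification_samples = sample(ESSModel(gp_class_prior, gp_class_loglik), ESS(), 1_000)
```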

We plot the mean of the posterior distributions of the Gaussian process based on the samples from elliptical slice sampling.
