High-Dimensional Behaviour of Piecewise Deterministic Monte Carlo

Efficiency Comparison & Asymptotic Variance Estimation

Hirofumi Shiba

Institute of Statistical Mathematics, Tokyo, Japan

6/16/2026

Today’s Contents

  • Section 1 What is PDMP?

    → A new class of continuous-time Monte Carlo methods

  • Section 2 High-Dimensional Scaling Analysis

    FECMC always performs better than BPS

  • Section 3 Asymptotic Variance / ESS Estimation

    PDMP algorithms allow a more efficient estimator

1 PDMPs: A New Frontier of Monte Carlo

Markov Chain
(1953–)

Langevin Diffusion
(1978–)

PDMP
(2008–)

1.1 PDMP: Mathematical Definition

Langevin Diffusion

Randomized Hamiltonian Monte Carlo

1.2 Markov Chain Monte Carlo

1.3 Classical Methods Turn Out to Be PDMPs in Disguise

Many PDMPs behave similarly in high dimensions.

RHMC (d=2)

discretized by a symplectic integrator

e.g. HMC has O(d^{1.25}) complexity

BPS with Gaussian speed (d=10^3)

simulate piecewise linear trajectory

e.g. BPS with normal velocity has O(d^{1.5}) complexity

1.4 Digression: Killer Applications of PDMP

x: CPU time, y: Estimate.(Bouchard-Côté et al., 2018)
Sparse Markov Random Field with d=10

O(d^1) Local Implementation

Exploiting sparsity, BPS atteins better scaling than HMC (MSE/sec.)

Stochastic Gradient O(n^0)

x: sample size n, y: log ESS
Logistic regression with d=16
Using an appropriate control variate, the efficiency of Zig-Zag is O(1)
in the limit of n\to\infty, it outperforms Langevin.

2 Towards Better Jump Strategy: A Scaling Analysis

We compare two famous PDMP methods under the following condition:

  • ODE: Fixed
  • Jump: Reflection + Refreshment vs. Combination
  • Target: High dimensional standard Gaussian U(x):=-\log\pi(x)=\|x\|^2/2.

2.1 BPS vs. FECMC

Bouncy Particle Sampler (Bouchard-Côté et al., 2018)

BPS with different hyperparameter \rho

Forward Event Chain Monte Carlo (Michel et al., 2020)

2.2 Jump Strategy in BPS vs. FECMC

Reflection

Refreshment \rho

Stochastic Reflection

2.3 Empirical Comparison: BPS vs. FECMC

Effective Sample Size of U(x)=\|x\|^2, the negative log-density of 100 dimensional standard Gaussian, estimated over 1000 runs

2.4 Scaling Analysis = Deriving a Diffusion Limit

\text{Plotting } \textcolor{#0096FF}{Y_t^{(d)}}=\frac{\lvert\textcolor{#0096FF}{X}_{d\textcolor{#0096FF}{t}}^{\textcolor{#0096FF}{(d)}}\rvert^2-d}{\sqrt{d}} \text{ with } d=10^2,10^3,10^4:

2.5 Theorem 1: Diffusion Limits of FECMC & BPS

dY_t^{\textcolor{#0096FF}{\text{B}}}=-\frac{\sigma^2_{\textcolor{#0096FF}{\text{B}}}(\rho)}{4}Y_t^{\textcolor{#0096FF}{\text{B}}}\,dt+\sigma_{\textcolor{#0096FF}{\text{B}}}(\rho)\,dB_t \sigma^2_{\textcolor{#0096FF}{\text{B}}}(\rho)=8\int^\infty_0e^{-\rho s}\operatorname{E}[R_0^{\textcolor{#0096FF}{\text{B}}}R_s^{\textcolor{#0096FF}{\text{B}}}]\,ds

dY_t^{\textcolor{#E95420}{\text{F}}}=-\frac{\sigma^2_{\textcolor{#E95420}{\text{F}}}(\rho)}{4}Y_t^{\textcolor{#E95420}{\text{F}}}\,dt+\sigma_{\textcolor{#E95420}{\text{F}}}(\rho)\,dB_t \sigma^2_{\textcolor{#E95420}{\text{F}}}(\rho)=8\int^\infty_0e^{-\rho s}\operatorname{E}[R_0^{\textcolor{#E95420}{\text{F}}}R_s^{\textcolor{#E95420}{\text{F}}}]\,ds

2.6 Theorem 2: Analytic Expression of \sigma^2’s

While BPS atteins maximum at non-trivial value of \rho, FECMC achieves maximum at \rho=0

2.7 Sketch of Proof: the non-Markovianity of Y^d

Doob-Meyer Decomposition (Doléans-Dade and Meyer, 1970 Theorem 4)

Y^d is non-Markovian, but is a (special) semimartingale, having Y^d_t=Y^d_0+B^d_t+M^d_t, where B^d is the dual predictable projection of Y^d, and M^d is a local L^2-martingale.

Convergence of Semimartingales (Jacod and Shiryaev, 2003 Theorem IX.3.48)

The convergence of B^d and C^d_t:=\langle M^d\rangle_t (in a suitable sense) to B_t=-\int^t_0\frac{\sigma^2}{4}Y_s\,ds,\quad C_t=\sigma^2t, is one of the sufficient conditions of the weak convergence Y^d\Rightarrow Y when dY_t=-\frac{\sigma^2}{4}Y_t\,dt+\sigma\,dB_t.

2.8 Asymptotic Equivalence to a Skeleton Process

Original Potential Process \textcolor{#0096FF}{Y^d}

B^d_t=2\sqrt{d}\int^t_0R^d_{ds-}\,ds,%\quad R_t^d:=\brac{X_t^d,V_t^d}, C^d_t=0.

Difficult to see how C_t=\sigma^2t emerges.

Event-Time Skeleton \textcolor{#0096FF}{\overline{Y}^d}

\overline{B}^d_t=\sum_{n=1}^{N_t}\operatorname{E}[\Delta\overline{Y}^d_n|\overline{\mathcal{F}}^d_{n-1}] {\scriptsize \overline{C}^d_t=\sum_{n=1}^{N_t}\left(\operatorname{E}[(\Delta\overline{Y}^d_n)^2|\overline{\mathcal{F}}^d_{n-1}]-\operatorname{E}[\Delta\overline{Y}^d_n|\overline{\mathcal{F}}^d_{n-1}]^2\right).}

3 Asymptotic Variance / ESS Estimation for PDMP (or continuous-time MCMC more broadly)

There is a fundamental difference between the Monte Carlo estimators produced by

\text{PDMP}\quad\widehat{h}_T^{\textcolor{#E95420}{\text{PDMP}}}=\frac{1}{T}\int^T_0h(\textcolor{#E95420}{X}_{\textcolor{#E95420}{t}}^{\textcolor{#E95420}{(d)}})\,dt,\quad t\in[0,T], \text{classical MCMC}\quad\widehat{h}_N^{\textcolor{#0096FF}{\text{MCMC}}}=\frac{1}{N}\sum_{n=1}^Nh(\textcolor{#0096FF}{X_n^{(d)}}),\quad n=1,\cdots,N.

Exploiting this continuous-time nature of PDMP, we can introduce more efficient estimators for the asymptotic variance of \widehat{h}_T^{\textcolor{#E95420}{\text{PDMP}}}.

3.1 Implications of the Diffusion Limit Results

Corollary (Relationship between \sigma^2 and ESS)

The effective sample size (ESS) for estimating U with a trajectory of length \textcolor{#E95420}{d}T is given by \operatorname{ESS}(U)\fallingdotseq\frac{\sigma^2}{8}T\qquad(d\to\infty).

[Proof] The Monte Carlo estimator (= ergodic average of \textcolor{#E95420}{\{X_t\}}) \widehat{h}_T^d:=\frac{1}{T}\int^T_0U(\textcolor{#E95420}{X_t^{(d)}})\,dt has a computable variance under double asymptotic limit: \lim_{T\to\infty}\lim_{d\to\infty}T\frac{\operatorname{Var}[\widehat{h}_{\textcolor{#E95420}{d}T}^d]}{\operatorname{Var}_\pi[U]}=\frac{8}{\sigma^2}\approx2.50\cdots. ESS equals the inverse of this variance ratio.

3.2 Theory Matches Practice: FECMC vs. BPS

ESS for U estimated over 1000 runs. Every bootstrap CI contains the theoretical limiting value {8}/{\sigma^2}.

3.3 Asymptotic Variance Estimation for MCMC

A practical issue: Does this trajectory suffice for mean estimation? → A solution: asymptotic variance estimation

\text{Markov Chain CLT: }\quad\frac{1}{\sqrt{N}}\sum_{n=1}^NX_n\Rightarrow N(0,\sigma^2_{\text{asym}})\quad(N\to\infty). For mini-batches b=1,\cdots,B with length m:=N/B, \operatorname*{{Batch\; Means\; Estimator:}}_{\text{(for mean estimation)}} \text{ }\quad\widehat{\sigma^2}_\text{asym}:=\frac{m}{B-1}\sum_{b=1}^B\left(\textcolor{#0096FF}{\overline{{X}}^{(b)}}-\textcolor{#0096FF}{\overline{{X}}}\right)^2, where \textcolor{#0096FF}{\overline{{X}}^{(b)}}=\frac{1}{m}\sum_{n=1}^m\textcolor{#0096FF}{X_{n+(b-1)m}} is the mean over the b-th batch.

3.4 Optimal Batch Size for Classical MCMC & PDMP

Due to the continuous-time nature, batch size selection becomes a nontrivial problem for PDMPs.

MSE Minimizing Batch Size for Classical MCMC (Liu et al., 2022)

m_{\text{opt}}=\left(\frac{\gamma_{\text{disc}}}{\sigma_{\text{asym}}^2}\right)^{\frac{2}{3}}N^{\frac{1}{3}} minimizes the MSE asymptotically under m\to\infty,\qquad N/m\to\infty, for a sufficiently mixing Markov chain \{\textcolor{#0096FF}{X_n}\}.

Bias Minimizing Batch Size for PDMP (S. & Kamatani 2026+)

Analogously, with \gamma_{\text{disc}}=2\sum_{k=0}^\infty k\operatorname{Cov}[\textcolor{#0096FF}{X_0},\textcolor{#0096FF}{X_k}] replaced by \gamma_{\text{cont}}=2\int^\infty_0t\operatorname{Cov}[\textcolor{#E95420}{X_0},\textcolor{#E95420}{X_t}]\,dt.

3.5 Diffusivity Estimation for High-dimensional PDMPs

= Quadratic Variation (QV) Estimation under Misspecification

Plot of Y_t^d:=(|\textcolor{#E95420}{X^d}_{d\textcolor{#E95420}{t}}|^2-d)/\sqrt{d} when d=10^4. The trajectory converges to dY_t=-\frac{\sigma^2}{4} Y_t\,dt+\sigma\,dB_t as d\to\infty

The Realized Volatility estimator (Barndorff-Nielsen and Shephard, 2002) \widehat{Q}^d:=\frac{1}{T}\sum_{n=1}^N\biggr(Y_{\Delta n}^d-Y_{\Delta(n-1)}^d\biggl)^2,\quad T=\Delta N, converges to \sigma^2 under the conditions \Delta\to0,\quad d\to\infty,\quad\Delta d\to\infty.

3.6 Experiment: QV Estimation Improves with Dimension

slow corresponds to the Batch Means estimator, fast corresponds to the Realized Variation estimator

3.6 Experiment: QV Estimation Improves with Dimension

slow corresponds to the Batch Means estimator, fast corresponds to the Realized Variation estimator

3.7 Batch Size Selection for the QV Estimator

\fallingdotseq selecting the sampling step size \Delta under Misspecification (Aït-Sahalia et al., 2011)

Plot of Y_t^d:=(|\textcolor{#E95420}{X^d}_{d\textcolor{#E95420}{t}}|^2-d)/\sqrt{d} when d=10^4. Taking a closer look, the trajectory is an ODE, a `microstructure noise’

For too small / large \Delta, \widehat{Q}^{d}(\Delta)\approx0. The optimal choice seems to be \Delta=O(d^{1/2}).

Conclusion

  1. Reducing redundant noise produces a more efficient algorithm: BPSFECMC
  2. Continuous-time MCMC allows more efficient estimators for asymptotic variance
    • potentially works whenever a diffusion limit exists

References

Aït-Sahalia, Y., Mykland, P. A., and Zhang, L. (2011). Ultra High Frequency Volatility Estimation with Dependent Microstructure Noise. Journal of Econometrics, 160(1), 160–175.
Barndorff-Nielsen, O. E., and Shephard, N. (2002). Estimating Quadratic Variation using Realized Variance. Journal of Applied Econometrics, 17(5), 457–477.
Bierkens, J., Fearnhead, P., and Roberts, G. (2019). The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data. The Annals of Statistics, 47(3), 1288–1320.
Bouchard-Côté, A., Vollmer, S. J., and Doucet, A. (2018). The Bouncy Particle Sampler: A Nonreversible Rejection-Free Markov Chain Monte Carlo Method. Journal of the American Statistical Association, 113(522), 855–867.
Doléans-Dade, C., and Meyer, P.-A. (1970). Intégrales Stochastiques par Rapport aux Martingales Locales. Séminaire de Probabilités, 4, 77–107.
Jacod, J., and Shiryaev, A. N. (2003). Limit Theorems for Stochastic Processes,Vol. 288. Springer Berlin, Heidelberg.
Liu, Y., Vats, D., and Flegal, J. M. (2022). Batch Size Selection for Variance Estimators in MCMC. Methodology and Computing in Applied Probability, 24(1), 65–93.
Michel, M., Durmus, A., and Sénécal, S. (2020). Forward Event-Chain Monte Carlo: Fast Sampling by Randomness Control in Irreversible Markov Chains. Journal of Computational and Graphical Statistics, 29(4), 689–702.