Efficiency Comparison & Asymptotic Variance Estimation
Institute of Statistical Mathematics, Tokyo, Japan
6/16/2026



A Blog Entry on Bayesian Computation by an Applied Mathematician
$$
$$



Many PDMPs behave similarly in high dimensions.

discretized by a symplectic integrator
e.g. HMC has O(d^{1.25}) complexity

simulate piecewise linear trajectory
e.g. BPS with normal velocity has O(d^{1.5}) complexity

O(d^1) Local Implementation
Exploiting sparsity, BPS atteins better scaling than HMC (MSE/sec.)
Stochastic Gradient O(n^0)
x: sample size n, y: log ESS
Logistic regression with d=16
Using an appropriate control variate, the efficiency of Zig-Zag is O(1)
in the limit of n\to\infty, it outperforms Langevin.
We compare two famous PDMP methods under the following condition:






Effective Sample Size of U(x)=\|x\|^2, the negative log-density of 100 dimensional standard Gaussian, estimated over 1000 runs
\text{Plotting } \textcolor{#0096FF}{Y_t^{(d)}}=\frac{\lvert\textcolor{#0096FF}{X}_{d\textcolor{#0096FF}{t}}^{\textcolor{#0096FF}{(d)}}\rvert^2-d}{\sqrt{d}} \text{ with } d=10^2,10^3,10^4:
dY_t^{\textcolor{#0096FF}{\text{B}}}=-\frac{\sigma^2_{\textcolor{#0096FF}{\text{B}}}(\rho)}{4}Y_t^{\textcolor{#0096FF}{\text{B}}}\,dt+\sigma_{\textcolor{#0096FF}{\text{B}}}(\rho)\,dB_t \sigma^2_{\textcolor{#0096FF}{\text{B}}}(\rho)=8\int^\infty_0e^{-\rho s}\operatorname{E}[R_0^{\textcolor{#0096FF}{\text{B}}}R_s^{\textcolor{#0096FF}{\text{B}}}]\,ds
dY_t^{\textcolor{#E95420}{\text{F}}}=-\frac{\sigma^2_{\textcolor{#E95420}{\text{F}}}(\rho)}{4}Y_t^{\textcolor{#E95420}{\text{F}}}\,dt+\sigma_{\textcolor{#E95420}{\text{F}}}(\rho)\,dB_t \sigma^2_{\textcolor{#E95420}{\text{F}}}(\rho)=8\int^\infty_0e^{-\rho s}\operatorname{E}[R_0^{\textcolor{#E95420}{\text{F}}}R_s^{\textcolor{#E95420}{\text{F}}}]\,ds
While BPS atteins maximum at non-trivial value of \rho, FECMC achieves maximum at \rho=0
Doob-Meyer Decomposition (Doléans-Dade and Meyer, 1970 Theorem 4)
Y^d is non-Markovian, but is a (special) semimartingale, having Y^d_t=Y^d_0+B^d_t+M^d_t, where B^d is the dual predictable projection of Y^d, and M^d is a local L^2-martingale.
Convergence of Semimartingales (Jacod and Shiryaev, 2003 Theorem IX.3.48)
The convergence of B^d and C^d_t:=\langle M^d\rangle_t (in a suitable sense) to B_t=-\int^t_0\frac{\sigma^2}{4}Y_s\,ds,\quad C_t=\sigma^2t, is one of the sufficient conditions of the weak convergence Y^d\Rightarrow Y when dY_t=-\frac{\sigma^2}{4}Y_t\,dt+\sigma\,dB_t.
Original Potential Process \textcolor{#0096FF}{Y^d}
B^d_t=2\sqrt{d}\int^t_0R^d_{ds-}\,ds,%\quad R_t^d:=\brac{X_t^d,V_t^d}, C^d_t=0.
Difficult to see how C_t=\sigma^2t emerges.
Event-Time Skeleton \textcolor{#0096FF}{\overline{Y}^d}
\overline{B}^d_t=\sum_{n=1}^{N_t}\operatorname{E}[\Delta\overline{Y}^d_n|\overline{\mathcal{F}}^d_{n-1}] {\scriptsize \overline{C}^d_t=\sum_{n=1}^{N_t}\left(\operatorname{E}[(\Delta\overline{Y}^d_n)^2|\overline{\mathcal{F}}^d_{n-1}]-\operatorname{E}[\Delta\overline{Y}^d_n|\overline{\mathcal{F}}^d_{n-1}]^2\right).}
There is a fundamental difference between the Monte Carlo estimators produced by
\text{PDMP}\quad\widehat{h}_T^{\textcolor{#E95420}{\text{PDMP}}}=\frac{1}{T}\int^T_0h(\textcolor{#E95420}{X}_{\textcolor{#E95420}{t}}^{\textcolor{#E95420}{(d)}})\,dt,\quad t\in[0,T], \text{classical MCMC}\quad\widehat{h}_N^{\textcolor{#0096FF}{\text{MCMC}}}=\frac{1}{N}\sum_{n=1}^Nh(\textcolor{#0096FF}{X_n^{(d)}}),\quad n=1,\cdots,N.
Exploiting this continuous-time nature of PDMP, we can introduce more efficient estimators for the asymptotic variance of \widehat{h}_T^{\textcolor{#E95420}{\text{PDMP}}}.
Corollary (Relationship between \sigma^2 and ESS)
The effective sample size (ESS) for estimating U with a trajectory of length \textcolor{#E95420}{d}T is given by \operatorname{ESS}(U)\fallingdotseq\frac{\sigma^2}{8}T\qquad(d\to\infty).
[Proof] The Monte Carlo estimator (= ergodic average of \textcolor{#E95420}{\{X_t\}}) \widehat{h}_T^d:=\frac{1}{T}\int^T_0U(\textcolor{#E95420}{X_t^{(d)}})\,dt has a computable variance under double asymptotic limit: \lim_{T\to\infty}\lim_{d\to\infty}T\frac{\operatorname{Var}[\widehat{h}_{\textcolor{#E95420}{d}T}^d]}{\operatorname{Var}_\pi[U]}=\frac{8}{\sigma^2}\approx2.50\cdots. ESS equals the inverse of this variance ratio.
ESS for U estimated over 1000 runs. Every bootstrap CI contains the theoretical limiting value {8}/{\sigma^2}.
A practical issue: Does this trajectory suffice for mean estimation? → A solution: asymptotic variance estimation
\text{Markov Chain CLT: }\quad\frac{1}{\sqrt{N}}\sum_{n=1}^NX_n\Rightarrow N(0,\sigma^2_{\text{asym}})\quad(N\to\infty). For mini-batches b=1,\cdots,B with length m:=N/B, \operatorname*{{Batch\; Means\; Estimator:}}_{\text{(for mean estimation)}} \text{ }\quad\widehat{\sigma^2}_\text{asym}:=\frac{m}{B-1}\sum_{b=1}^B\left(\textcolor{#0096FF}{\overline{{X}}^{(b)}}-\textcolor{#0096FF}{\overline{{X}}}\right)^2, where \textcolor{#0096FF}{\overline{{X}}^{(b)}}=\frac{1}{m}\sum_{n=1}^m\textcolor{#0096FF}{X_{n+(b-1)m}} is the mean over the b-th batch.
Due to the continuous-time nature, batch size selection becomes a nontrivial problem for PDMPs.
MSE Minimizing Batch Size for Classical MCMC (Liu et al., 2022)
m_{\text{opt}}=\left(\frac{\gamma_{\text{disc}}}{\sigma_{\text{asym}}^2}\right)^{\frac{2}{3}}N^{\frac{1}{3}} minimizes the MSE asymptotically under m\to\infty,\qquad N/m\to\infty, for a sufficiently mixing Markov chain \{\textcolor{#0096FF}{X_n}\}.
Bias Minimizing Batch Size for PDMP (S. & Kamatani 2026+)
Analogously, with \gamma_{\text{disc}}=2\sum_{k=0}^\infty k\operatorname{Cov}[\textcolor{#0096FF}{X_0},\textcolor{#0096FF}{X_k}] replaced by \gamma_{\text{cont}}=2\int^\infty_0t\operatorname{Cov}[\textcolor{#E95420}{X_0},\textcolor{#E95420}{X_t}]\,dt.
= Quadratic Variation (QV) Estimation under Misspecification
Plot of Y_t^d:=(|\textcolor{#E95420}{X^d}_{d\textcolor{#E95420}{t}}|^2-d)/\sqrt{d} when d=10^4. The trajectory converges to dY_t=-\frac{\sigma^2}{4} Y_t\,dt+\sigma\,dB_t as d\to\infty
The Realized Volatility estimator (Barndorff-Nielsen and Shephard, 2002) \widehat{Q}^d:=\frac{1}{T}\sum_{n=1}^N\biggr(Y_{\Delta n}^d-Y_{\Delta(n-1)}^d\biggl)^2,\quad T=\Delta N, converges to \sigma^2 under the conditions \Delta\to0,\quad d\to\infty,\quad\Delta d\to\infty.
slow corresponds to the Batch Means estimator, fast corresponds to the Realized Variation estimator
slow corresponds to the Batch Means estimator, fast corresponds to the Realized Variation estimator
\fallingdotseq selecting the sampling step size \Delta under Misspecification (Aït-Sahalia et al., 2011)
For too small / large \Delta, \widehat{Q}^{d}(\Delta)\approx0. The optimal choice seems to be \Delta=O(d^{1/2}).