Statistically efficient thinning of a Markov chain sampler
It is common to subsample Markov chain output to reduce the storage burden. Geyer (1992) shows that discarding k-1 out of every k observations will not improve statistical efficiency, as quantified through variance in a given computational budget. That observation is often taken to mean that thinning MCMC output cannot improve statistical efficiency. Here we suppose that it costs one unit of time to advance a Markov chain and then θ > 0 units of time to compute a sampled quantity of interest. For a thinned process, that cost θ is incurred less often, so the chain can be advanced through more stages within the same budget. We provide examples to show that thinning will improve statistical efficiency if θ is large and the sample autocorrelations decay slowly enough. If the lag ℓ ≥ 1 autocorrelations of a scalar measurement satisfy ρ_ℓ > ρ_{ℓ+1} > 0, then there is always a θ < ∞ at which thinning becomes more efficient for averages of that scalar. Many sample autocorrelation functions resemble those of first-order autoregressive AR(1) processes with ρ_ℓ = ρ^|ℓ| for some -1 < ρ < 1. For an AR(1) process it is possible to compute the most efficient subsampling frequency k. The optimal k grows rapidly as ρ increases towards 1. The resulting efficiency gain depends primarily on θ, not ρ. Taking k = 1 (no thinning) is optimal when ρ < 0. For ρ > 0 it is optimal if and only if θ < (1-ρ)^2/(2ρ). The efficiency gain from thinning never exceeds 1+θ. The paper also gives efficiency bounds for autocorrelations that lie between those of two AR(1) processes.
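As a rough illustration of the AR(1) case described above, the sketch below combines the abstract's cost model (one time unit per chain advance plus θ per evaluation of the quantity of interest) with the standard asymptotic variance factor (1+ρ^k)/(1-ρ^k) for a chain thinned to every k-th draw. The function names and the brute-force integer search over k are illustrative assumptions, not the paper's derivation, which treats the optimal k analytically.

```python
import numpy as np

def cost_adjusted_variance(k, rho, theta):
    """Asymptotic variance of the thinned-chain average per unit of
    computational budget, up to a constant factor.

    Assumes AR(1) autocorrelations rho_l = rho**|l|, so the thinned
    (every k-th) sequence is AR(1) with parameter rho**k and has
    integrated autocorrelation factor (1 + rho**k) / (1 - rho**k).
    Each retained sample costs k chain advances plus theta for the
    quantity of interest, hence the (k + theta) factor.
    """
    rk = rho ** k
    return (k + theta) * (1.0 + rk) / (1.0 - rk)

def optimal_thinning(rho, theta, k_max=10_000):
    """Search integer thinning factors 1..k_max for the smallest
    cost-adjusted variance; also return the efficiency gain over k=1."""
    ks = np.arange(1, k_max + 1)
    v = cost_adjusted_variance(ks, rho, theta)
    k_best = int(ks[np.argmin(v)])
    gain = cost_adjusted_variance(1, rho, theta) / v.min()
    return k_best, gain

if __name__ == "__main__":
    # Example settings (hypothetical): slowly mixing chains and an
    # expensive quantity of interest (theta measured in chain steps).
    for rho, theta in [(0.5, 20.0), (0.99, 20.0), (0.99, 0.001)]:
        k, gain = optimal_thinning(rho, theta)
        print(f"rho={rho:5.3f} theta={theta:7.3f} -> optimal k={k:4d}, "
              f"gain over k=1: {gain:.3f}")
        # Check against the abstract's criterion: for rho > 0, k = 1 is
        # optimal when theta is below (1 - rho)^2 / (2 * rho).
        print(f"    no-thinning threshold: {(1 - rho)**2 / (2 * rho):.5f}")
```

In line with the abstract, the printed gains stay below 1+θ, and the case with tiny θ shows essentially no benefit from thinning even when ρ is close to 1.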