Equispaced Fourier representations for efficient Gaussian process regression from a billion data points
We introduce a Fourier-based fast algorithm for Gaussian process regression. It approximates a translationally-invariant covariance kernel by complex exponentials on an equispaced Cartesian frequency grid of M nodes. This results in a weight-space M× M system matrix with Toeplitz structure, which can thus be applied to a vector in 𝒪(M logM) operations via the fast Fourier transform (FFT), independent of the number of data points N. The linear system can be set up in 𝒪(N + M logM) operations using nonuniform FFTs. This enables efficient massive-scale regression via an iterative solver, even for kernels with fat-tailed spectral densities (large M). We include a rigorous error analysis of the kernel approximation, the resulting accuracy (relative to "exact" GP regression), and the condition number. Numerical experiments for squared-exponential and Matérn kernels in one, two and three dimensions often show 1-2 orders of magnitude acceleration over state-of-the-art rank-structured solvers at comparable accuracy. Our method allows 2D Matérn-3/2 regression from N=10^9 data points to be performed in 2 minutes on a standard desktop, with posterior mean accuracy 10^-3. This opens up spatial statistics applications 100 times larger than previously possible.
READ FULL TEXT