Near-Optimal Coresets of Kernel Density Estimates
We construct near-optimal coresets for kernel density estimate for points in R^d when the kernel is positive definite. Specifically we show a polynomial time construction for a coreset of size O(√(d (1/ϵ))/ϵ), and we show a near-matching lower bound of size Ω(√(d)/ϵ). The upper bound is a polynomial in 1/ϵ improvement when d ∈ [3,1/ϵ^2) (for all kernels except the Gaussian kernel which had a previous upper bound of O((1/ϵ) ^d (1/ϵ))) and the lower bound is the first known lower bound to depend on d for this problem. Moreover, the upper bound restriction that the kernel is positive definite is significant in that it applies to a wide-variety of kernels, specifically those most important for machine learning. This includes kernels for information distances and the sinc kernel which can be negative.
READ FULL TEXT