Optimal Rate of Kernel Regression in Large Dimensions
We study kernel regression for large-dimensional data, where the sample size n depends polynomially on the dimension d of the samples, i.e., n ≍ d^γ for some γ > 0. We first build a general tool that characterizes the upper bound and the minimax lower bound of kernel regression for large-dimensional data through the Mendelson complexity ε_n^2 and the metric entropy ε̅_n^2, respectively. When the target function lies in the RKHS associated with a (general) inner product model defined on 𝕊^d, we use this tool to show that the minimax rate of the excess risk of kernel regression is n^{-1/2} when n ≍ d^γ for γ = 2, 4, 6, 8, ⋯. We then determine the optimal rate of the excess risk for all γ > 0 and find that the curve of the optimal rate as a function of γ exhibits several new phenomena, including multiple descent and a periodic plateau behavior. As an application, we provide a similar explicit description of the curve of the optimal rate for the neural tangent kernel (NTK); as a direct corollary, these claims hold for wide neural networks as well.
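As an illustrative restatement of the headline claim (our own formalization, with ℋ denoting the RKHS of the inner product kernel on 𝕊^d and 𝓔(f̂) the excess risk of an estimator f̂; the notation is an assumption, not taken verbatim from the paper):

\[
\text{if } n \asymp d^{\gamma} \text{ with } \gamma \in \{2,4,6,8,\dots\}, \quad
\inf_{\hat f}\; \sup_{f^{*} \in \mathcal{H}} \; \mathbb{E}\,\mathcal{E}(\hat f) \;\asymp\; n^{-1/2},
\]

where the infimum ranges over all estimators and the upper and lower bounds are obtained via the Mendelson complexity ε_n^2 and the metric entropy ε̅_n^2, respectively.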