Analysis of the rate of convergence of an over-parametrized deep neural network estimate learned by gradient descent

10/04/2022
by Michael Kohler, et al.

Estimation of a regression function from independent and identically distributed random variables is considered. The L_2 error with integration with respect to the design measure is used as the error criterion. Over-parametrized deep neural network estimates are defined in which all the weights are learned by gradient descent. It is shown that the expected L_2 error of these estimates converges to zero at a rate close to n^-1/(1+d) in the case that the regression function is Hölder smooth with Hölder exponent p ∈ [1/2,1]. In the case of an interaction model, where the regression function is assumed to be a sum of Hölder smooth functions each of which depends on only d^* of the d components of the design variable, it is shown that these estimates achieve the corresponding d^*-dimensional rate of convergence.
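For readers who want the error criterion spelled out, the first claim above can be restated in standard nonparametric regression notation. This is only a sketch under assumed notation not defined in the abstract itself: m denotes the regression function, m_n the neural network estimate built from n observations, and P_X the design measure (the distribution of the design variable X).

\[
  \mathbf{E} \int \left| m_n(x) - m(x) \right|^2 \, \mathbf{P}_X(dx) \;\longrightarrow\; 0
  \quad \text{at a rate close to } n^{-1/(1+d)},
\]

provided the regression function m is Hölder smooth with Hölder exponent p ∈ [1/2, 1]. The second claim replaces the exponent 1/(1+d) by its d^*-dimensional analogue under the interaction (additive component) structure described above.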
