A closer look at the approximation capabilities of neural networks
The universal approximation theorem, in one of its most general versions, says that if we consider only continuous activation functions σ, then a standard feedforward neural network with one hidden layer is able to approximate any continuous multivariate function f to any given approximation threshold ε, if and only if σ is non-polynomial. In this paper, we give a direct algebraic proof of the theorem. Furthermore we shall explicitly quantify the number of hidden units required for approximation. Specifically, if X⊆R^n is compact, then a neural network with n input units, m output units, and a single hidden layer with n+dd hidden units (independent of m and ε), can uniformly approximate any polynomial function f:X →R^m whose total degree is at most d for each of its m coordinate functions. In the general case that f is any continuous function, we show there exists some N∈O(ε^-n) (independent of m), such that N hidden units would suffice to approximate f. We also show that this uniform approximation property (UAP) still holds even under seemingly strong conditions imposed on the weights. We highlight several consequences: (i) For any δ > 0, the UAP still holds if we restrict all non-bias weights w in the last layer to satisfy |w| < δ. (ii) There exists some λ>0 (depending only on f and σ), such that the UAP still holds if we restrict all non-bias weights w in the first layer to satisfy |w|>λ. (iii) If the non-bias weights in the first layer are fixed and randomly chosen from a suitable range, then the UAP holds with probability 1.
READ FULL TEXT