Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back
In stochastic convex optimization the goal is to minimize a convex function F(x) = E_{f∼D}[f(x)] over a convex set K ⊂ R^d, where D is some unknown distribution and each f(·) in the support of D is convex over K. The optimization is commonly based on i.i.d. samples f^1, f^2, ..., f^n from D. A standard approach to such problems is empirical risk minimization (ERM), which optimizes F_S(x) = (1/n)∑_{i≤n} f^i(x). Here we consider the question of how many samples are necessary for ERM to succeed and the closely related question of uniform convergence of F_S to F over K. We demonstrate that in the standard ℓ_p/ℓ_q setting of Lipschitz-bounded functions over a K of bounded radius, ERM requires a sample size that scales linearly with the dimension d. This nearly matches standard upper bounds and improves on the Ω(log d) dependence proved for the ℓ_2/ℓ_2 setting by Shalev-Shwartz et al. (2009). In stark contrast, these problems can be solved using a dimension-independent number of samples for the ℓ_2/ℓ_2 setting and log d dependence for the ℓ_1/ℓ_∞ setting using other approaches. We further show that our lower bound applies even if the functions in the support of D are smooth and efficiently computable, and even if an ℓ_1 regularization term is added. Finally, we demonstrate that for a more general class of bounded-range (but not Lipschitz-bounded) stochastic convex programs an infinite gap appears already in dimension 2.
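The following is a minimal illustrative sketch (not the paper's construction) of the ERM approach described above: given i.i.d. samples f^1, ..., f^n of convex, Lipschitz losses, minimize the empirical objective F_S(x) = (1/n)∑_i f^i(x) over the unit ℓ_2 ball via projected subgradient descent. The specific loss f_i(x) = |⟨a_i, x⟩ - b_i| and the synthetic data distribution are assumptions made only for this example.

```python
import numpy as np

def project_l2_ball(x, radius=1.0):
    """Project x onto the l2 ball of the given radius (the set K here)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def erm_subgradient(A, b, steps=2000, lr=0.05):
    """Minimize F_S(x) = (1/n) * sum_i |<a_i, x> - b_i| over the unit l2 ball."""
    n, d = A.shape
    x = np.zeros(d)
    for t in range(steps):
        residuals = A @ x - b                      # shape (n,)
        g = A.T @ np.sign(residuals) / n           # subgradient of the empirical objective
        x = project_l2_ball(x - lr / np.sqrt(t + 1) * g)
    return x

# Illustrative data: each f_i is 1-Lipschitz since ||a_i||_2 <= 1.
rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.normal(size=(n, d))
A /= np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1.0)
x_true = project_l2_ball(rng.normal(size=d))
b = A @ x_true + 0.1 * rng.normal(size=n)

x_hat = erm_subgradient(A, b)
print("empirical risk F_S(x_hat):", np.mean(np.abs(A @ x_hat - b)))
```

The paper's question is how large n must be so that a minimizer of F_S (as computed above) is also a near-minimizer of the population objective F; the lower bounds show that in the ℓ_p/ℓ_q settings this can require n to grow linearly with d.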