High probability generalization bounds for uniformly stable algorithms with nearly optimal rate
Algorithmic stability is a classical approach to understanding and analyzing the generalization error of learning algorithms. A notable weakness of most stability-based generalization bounds is that they hold only in expectation. Generalization with high probability was established in a landmark paper of Bousquet and Elisseeff (2002), albeit at the expense of an additional √n factor in the bound. Specifically, their bound on the estimation error of any γ-uniformly stable learning algorithm on n samples with loss range in [0,1] is O(γ√(n log(1/δ)) + √(log(1/δ)/n)) with probability ≥ 1-δ. The √n overhead makes the bound vacuous in the common settings where γ ≥ 1/√n. A stronger bound was recently proved by the authors (Feldman and Vondrak, 2018) that reduces the overhead to at most O(n^1/4). Still, both of these results give optimal generalization bounds only when γ = O(1/n). We prove a nearly tight bound of O(γ log(n) log(n/δ) + √(log(1/δ)/n)) on the estimation error of any γ-uniformly stable algorithm. It implies that algorithms that are uniformly stable with γ = O(1/√n) have essentially the same estimation error as algorithms that output a fixed function. Our result leads to the first high-probability generalization bounds for multi-pass stochastic gradient descent and regularized ERM for stochastic convex problems with nearly optimal rate, resolving open problems in prior work. Our proof technique is new, and we introduce several analysis tools that may find additional applications.
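As a back-of-the-envelope sketch (not part of the abstract itself), one can plug the assumed stability level γ = 1/√n, the regime mentioned above, into the two stated bounds to see why the earlier bound is vacuous there while the new one is not:

```latex
% Sketch: evaluate both bounds at the assumed stability level gamma = 1/sqrt(n).
\begin{align*}
\text{Bousquet--Elisseeff (2002):}\quad
  &\gamma\sqrt{n\log(1/\delta)} + \sqrt{\tfrac{\log(1/\delta)}{n}}
   = \sqrt{\log(1/\delta)} + \sqrt{\tfrac{\log(1/\delta)}{n}}
   = \Omega(1) \quad \text{(vacuous)}\\
\text{This work:}\quad
  &\gamma\log(n)\log(n/\delta) + \sqrt{\tfrac{\log(1/\delta)}{n}}
   = \tfrac{\log(n)\log(n/\delta)}{\sqrt{n}} + \sqrt{\tfrac{\log(1/\delta)}{n}}
   = \tilde{O}\!\big(1/\sqrt{n}\big)
\end{align*}
```

Up to logarithmic factors, the second expression matches the √(log(1/δ)/n) sampling error of an algorithm that outputs a fixed function.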