Tractability from overparametrization: The example of the negative perceptron
In the negative perceptron problem we are given n data points (x_i, y_i), where x_i is a d-dimensional vector and y_i ∈ {+1, -1} is a binary label. The data are not linearly separable, and hence we content ourselves with finding a linear classifier with the largest possible negative margin. In other words, we want to find a unit-norm vector θ that maximizes min_{i≤n} y_i⟨θ, x_i⟩. This is a non-convex optimization problem (it is equivalent to finding a maximum-norm vector in a polytope), and we study its typical properties under two random models for the data. We consider the proportional asymptotics in which n, d → ∞ with n/d → δ, and prove upper and lower bounds on the maximum margin κ_s(δ) or, equivalently, on its inverse function δ_s(κ). In other words, δ_s(κ) is the overparametrization threshold: for n/d ≤ δ_s(κ) - ε a classifier achieving vanishing training error exists with high probability, while for n/d ≥ δ_s(κ) + ε it does not. Our bounds on δ_s(κ) match to leading order as κ → -∞. We then analyze a linear programming algorithm for finding a solution, and characterize the corresponding threshold δ_lin(κ). We observe a gap between the interpolation threshold δ_s(κ) and the linear programming threshold δ_lin(κ), raising the question of the behavior of other algorithms.
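To make the setup concrete, the following is a minimal Python sketch, not the algorithm analyzed in the paper: it draws Gaussian features with random labels (an assumed data model, used here only for illustration), then runs a simple linear-programming heuristic that maximizes a linear objective over the polytope {θ : y_i⟨θ, x_i⟩ ≥ κ0} intersected with a box, as a crude convex surrogate for finding a maximum-norm vector in that polytope. The target margin κ0, the random objective direction u, and the box constraint are all illustrative choices.

```python
# Illustrative sketch of the negative perceptron and a simple LP heuristic.
# The data model, the target margin kappa0, the objective direction u, and the
# box constraint are assumptions for illustration, not taken from the paper.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, d = 600, 200                       # proportional regime: delta = n / d = 3
X = rng.standard_normal((n, d))       # feature vectors x_i (assumed i.i.d. Gaussian)
y = rng.choice([-1.0, 1.0], size=n)   # random labels y_i: typically not separable here

kappa0 = -0.5                         # target (negative) margin before normalization
u = rng.standard_normal(d)            # arbitrary direction, a linear proxy for the norm

# LP: maximize <u, theta>  subject to  y_i <x_i, theta> >= kappa0  and  |theta_j| <= 1.
A_ub = -(y[:, None] * X)              # encodes  -y_i <x_i, theta> <= -kappa0
b_ub = -kappa0 * np.ones(n)
res = linprog(-u, A_ub=A_ub, b_ub=b_ub, bounds=[(-1.0, 1.0)] * d, method="highs")

theta = res.x / np.linalg.norm(res.x) # rescale the LP solution to the unit sphere
margin = np.min(y * (X @ theta))      # achieved margin min_i y_i <theta, x_i> (negative)
print(f"achieved margin: {margin:.3f}")
```

Because κ0 is negative, any feasible θ with large norm yields, after rescaling to the unit sphere, a margin of at least κ0/‖θ‖_2, i.e. a less negative margin; pushing θ toward a corner of the bounded polytope is one cheap way to encourage a large norm.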