Solving Empirical Risk Minimization in the Current Matrix Multiplication Time
Many convex problems in machine learning and computer science share the same form: min_x ∑_i f_i(A_i x + b_i), where the f_i are convex functions on R^{n_i} with constant n_i, A_i ∈ R^{n_i × d}, b_i ∈ R^{n_i}, and ∑_i n_i = n. This problem generalizes linear programming and includes many problems in empirical risk minimization. In this paper, we give an algorithm that runs in time O^*((n^ω + n^{2.5 - α/2} + n^{2 + 1/6}) log(n/δ)), where ω is the exponent of matrix multiplication, α is the dual exponent of matrix multiplication, and δ is the relative accuracy. Note that the runtime has only a log dependence on the condition numbers and other data-dependent parameters; these are captured in δ. For the current bounds ω ∼ 2.38 [Vassilevska Williams'12, Le Gall'14] and α ∼ 0.31 [Le Gall, Urrutia'18], our runtime O^*(n^ω log(n/δ)) matches the current best for solving a dense least squares regression problem, a special case of the problem we consider. Very recently, [Alman'18] proved that all the currently known techniques cannot give a bound on ω below 2.168, which is larger than our exponent 2 + 1/6. Our result generalizes the very recent result of solving linear programs in the current matrix multiplication time [Cohen, Lee, Song'19] to a broader class of problems. Our algorithm introduces two concepts that differ from [Cohen, Lee, Song'19]:
∙ We give a robust deterministic central path method, whereas the previous one is a stochastic central path that updates weights by a random sparse vector.
∙ We propose an efficient data structure to maintain the central path of interior point methods even when the weight update vector is dense.
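As a concrete illustration of the problem form, the following minimal sketch (not from the paper; names such as erm_objective are hypothetical) simply evaluates the objective ∑_i f_i(A_i x + b_i) with f_i taken to be the squared loss, which recovers the dense least squares regression special case mentioned above; it is not the paper's interior point algorithm.

```python
# Minimal illustration of the objective F(x) = sum_i f_i(A_i x + b_i).
# This only evaluates the objective; it is NOT the paper's algorithm.
# Choosing f_i(y) = ||y||^2 recovers least squares regression.
import numpy as np

def erm_objective(x, blocks, f):
    """Evaluate sum_i f(A_i @ x + b_i) over a list of (A_i, b_i) blocks."""
    return sum(f(A @ x + b) for A, b in blocks)

rng = np.random.default_rng(0)
d, n_i, num_blocks = 5, 2, 10          # each block has constant size n_i; sum_i n_i = n
blocks = [(rng.standard_normal((n_i, d)), rng.standard_normal(n_i))
          for _ in range(num_blocks)]

squared_loss = lambda y: float(y @ y)  # a convex f_i on R^{n_i}
x = np.zeros(d)
print(erm_objective(x, blocks, squared_loss))
```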