Fast Best Subset Selection: Coordinate Descent and Local Combinatorial Optimization Algorithms

03/05/2018
by Hussein Hazimeh, et al.

We consider the canonical L_0-regularized least squares problem (a.k.a. best subsets), which is generally perceived as a "gold standard" for many sparse learning regimes. In spite of worst-case computational intractability results, recent work has shown that advances in mixed integer optimization can be used to obtain near-optimal solutions to this problem for instances where the number of features p ≈ 10^3. While these methods lead to estimators with excellent statistical properties, there is often a price to pay in terms of a steep increase in computation times, especially when compared to highly efficient popular algorithms for sparse learning (e.g., based on L_1-regularization) that scale to much larger problem sizes. Bridging this gap is a main goal of this paper. We study the computational aspects of a family of L_0-regularized least squares problems with additional convex penalties. We propose a hierarchy of necessary optimality conditions for these problems. We develop new algorithms, based on coordinate descent and local combinatorial optimization schemes, and study their convergence properties. We demonstrate that the choice of algorithm determines the quality of the solutions obtained: local combinatorial optimization-based algorithms generally result in solutions of superior quality. We show empirically that our proposed framework is relatively fast for problem instances with p ≈ 10^6 and works well, in terms of both optimization and statistical properties (e.g., prediction, estimation, and variable selection), compared to simpler heuristic algorithms. A version of our algorithm reaches up to a three-fold speedup (with p up to 10^6) when compared to state-of-the-art schemes for sparse learning such as glmnet and ncvreg.
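
To make the coordinate descent idea concrete, here is a minimal sketch (not the authors' implementation) of cyclic coordinate descent for an L_0 + L_2-regularized least squares objective, (1/2)||y - Xb||^2 + lam0*||b||_0 + lam2*||b||^2. Each coordinate update solves a one-dimensional ridge problem and then applies a hard threshold that keeps the coordinate only if doing so lowers the objective by more than the L_0 penalty. The function and parameter names (l0l2_cd, lam0, lam2) are illustrative assumptions, not part of the paper.

```python
# Minimal sketch of coordinate descent for L0+L2-regularized least squares.
# Illustrative only; the paper's framework adds local combinatorial moves
# and many efficiency tricks on top of updates like these.
import numpy as np

def l0l2_cd(X, y, lam0, lam2, max_iter=100, tol=1e-8):
    n, p = X.shape
    b = np.zeros(p)
    r = y - X @ b                      # residual
    col_sq = (X ** 2).sum(axis=0)      # ||X_j||^2 for each column
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Partial residual: add back coordinate j's contribution.
            r_j = r + X[:, j] * b[j]
            denom = col_sq[j] + 2.0 * lam2
            b_tilde = X[:, j] @ r_j / denom   # ridge-only minimizer
            # Hard threshold: keep b_tilde only if it reduces the
            # objective by more than the L0 penalty lam0.
            new_bj = b_tilde if 0.5 * denom * b_tilde ** 2 > lam0 else 0.0
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
            r = r_j - X[:, j] * b[j]
        if max_change < tol:
            break
    return b

# Example usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))
beta_true = np.zeros(50)
beta_true[:5] = 3.0
y = X @ beta_true + 0.5 * rng.standard_normal(100)
b_hat = l0l2_cd(X, y, lam0=1.0, lam2=0.01)
print("selected support:", np.nonzero(b_hat)[0])
```

This sketch covers only the coordinate descent part; the local combinatorial optimization schemes discussed in the abstract additionally explore moves such as swapping variables in and out of the current support, which is what the authors credit for the superior solution quality.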
