Learning Sparse Classifiers: Continuous and Mixed Integer Optimization Perspectives

01/17/2020
by Antoine Dedieu, et al.

We consider a discrete optimization-based approach for learning sparse classifiers, where the outcome depends upon a linear combination of a small subset of features. Recent work has shown that mixed integer programming (MIP) can be used to solve (to optimality) ℓ_0-regularized problems at scales much larger than was conventionally considered possible in the statistics and machine learning communities. Despite their usefulness, MIP-based approaches are significantly slower than relatively mature algorithms based on ℓ_1-regularization and its relatives. We aim to bridge this computational gap by developing new MIP-based algorithms for ℓ_0-regularized classification. We propose two classes of scalable algorithms: an exact algorithm that can handle p ≈ 50,000 features in a few minutes, and approximate algorithms that can address instances with p ≈ 10^6 in run times comparable to those of fast ℓ_1-based algorithms. Our exact algorithm is based on the novel idea of integrality generation, which solves the original problem (with p binary variables) via a sequence of mixed integer programs, each involving a small number of binary variables. Our approximate algorithms are based on coordinate descent and local combinatorial search. In addition, we present new estimation error bounds for a class of ℓ_0-regularized estimators. Experiments on real and synthetic data demonstrate that our approach leads to models with considerably improved statistical performance (especially in variable selection) compared to competing toolkits.
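To make the approximate-algorithm ideas concrete, here is a minimal numpy sketch (not the authors' implementation) of cyclic coordinate descent followed by a swap-based local combinatorial search. For simplicity it uses the ℓ_0+ℓ_2-regularized least-squares surrogate 0.5‖y − Xβ‖² + λ₀‖β‖₀ + λ₂‖β‖²; the paper itself targets classification losses (e.g., hinge, logistic) and much larger scales. The function names (`cd_l0`, `local_search`) and the first-improving-swap acceptance rule are illustrative assumptions.

```python
import numpy as np

def objective(X, y, beta, lam0, lam2):
    """0.5*||y - X beta||^2 + lam0*||beta||_0 + lam2*||beta||^2."""
    r = y - X @ beta
    return 0.5 * (r @ r) + lam0 * np.count_nonzero(beta) + lam2 * (beta @ beta)

def cd_l0(X, y, lam0, lam2, n_iters=100):
    """Cyclic coordinate descent with exact one-dimensional minimization.

    The coordinate-wise minimizer is hard-thresholded: a nonzero value
    is kept only if its loss reduction exceeds the lam0 price of adding
    a coordinate to the support.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)        # ||x_j||^2 for each column
    r = y.copy()                          # running residual y - X beta
    for _ in range(n_iters):
        for j in range(p):
            r += X[:, j] * beta[j]        # remove coordinate j's contribution
            a = col_sq[j] + 2.0 * lam2
            b = (X[:, j] @ r) / a         # minimizer ignoring the lam0 term
            # keeping b reduces the smooth part by 0.5*a*b^2; compare to lam0
            beta[j] = b if 0.5 * a * b * b > lam0 else 0.0
            r -= X[:, j] * beta[j]
    return beta

def local_search(X, y, beta, lam0, lam2):
    """One pass of local combinatorial search: try swapping each nonzero
    coordinate for a zero one, accepting the first improving swap."""
    col_sq = (X ** 2).sum(axis=0)
    best = objective(X, y, beta, lam0, lam2)
    for i in np.flatnonzero(beta):
        for j in np.flatnonzero(beta == 0):
            cand = beta.copy()
            cand[i] = 0.0                 # drop coordinate i from the support
            r = y - X @ cand              # residual with i removed, j still zero
            cand[j] = (X[:, j] @ r) / (col_sq[j] + 2.0 * lam2)
            if objective(X, y, cand, lam0, lam2) < best:
                return cand, True
    return beta, False
```

A typical usage pattern, following the structure described in the abstract, would alternate the two phases: run `cd_l0` to a stationary point, then repeat `local_search` until no improving swap exists, which escapes some local minima that pure coordinate descent cannot.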
