Learning Hierarchical Interactions at Scale: A Convex Optimization Approach

02/05/2019
by   Hussein Hazimeh, et al.
In many learning settings, it is beneficial to augment the main features with pairwise interactions. Such interaction models can often be enhanced by performing variable selection under the so-called strong hierarchy constraint: an interaction is non-zero only if its associated main features are non-zero. Existing convex-optimization-based algorithms face difficulties in handling problems where the number of main features p ∼ 10^3 (with total number of features ∼ p^2). In this paper, we study a convex relaxation which enforces strong hierarchy and develop a scalable algorithm for solving it. Our proposed algorithm employs a proximal gradient method along with a novel active-set strategy, specialized screening rules, and decomposition rules for verifying optimality conditions. Our framework can handle problems with dense design matrices and p = 50,000 (∼ 10^9 interactions)---instances much larger than the current state of the art. Experiments on real and synthetic data suggest that our toolkit hierScale outperforms the state of the art in terms of prediction and variable selection and can achieve over a 1000x speed-up.
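To make the strong hierarchy constraint concrete, the sketch below checks whether a fitted model satisfies it: an interaction coefficient theta[j, k] may be non-zero only if both of its parent main-effect coefficients beta[j] and beta[k] are non-zero. The function name, array layout, and tolerance are illustrative assumptions, not part of the paper or the hierScale toolkit.

```python
import numpy as np

def satisfies_strong_hierarchy(beta, theta, tol=1e-10):
    """Check the strong hierarchy constraint: theta[j, k] != 0
    implies beta[j] != 0 and beta[k] != 0.

    beta  : (p,)   main-effect coefficients
    theta : (p, p) pairwise-interaction coefficients
    (Illustrative helper; not an API from the paper.)
    """
    beta = np.asarray(beta)
    theta = np.asarray(theta)
    main_active = np.abs(beta) > tol        # non-zero main effects
    inter_active = np.abs(theta) > tol      # non-zero interactions
    # an interaction (j, k) is allowed only when both parents are active
    allowed = np.outer(main_active, main_active)
    return bool(np.all(inter_active <= allowed))

# Hierarchy holds: interaction (0, 2) has both parents active
beta = np.array([1.0, 0.0, 2.0])
theta_ok = np.zeros((3, 3))
theta_ok[0, 2] = 0.5

# Hierarchy violated: interaction (0, 1) has an inactive parent (beta[1] = 0)
theta_bad = np.zeros((3, 3))
theta_bad[0, 1] = 0.5
```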
