Selective Pre-training for Private Fine-tuning
Suppose we want to train text prediction models in email clients or word processors. The models must preserve the privacy of user data and adhere to a fixed size to meet memory and inference-time requirements. We introduce a generic framework to solve this problem. Specifically, we are given a public dataset D_pub and a private dataset D_priv corresponding to a downstream task T. How should we pre-train a fixed-size model M on D_pub and fine-tune it on D_priv such that the performance of M on T is maximized and M satisfies differential privacy with respect to D_priv? We show that pre-training on a subset of D_pub that brings the public distribution closer to the private distribution is a crucial ingredient for maximizing the transfer learning abilities of M after pre-training, especially in regimes where model sizes are relatively small. Beyond performance improvements, our framework also shows that with careful pre-training and private fine-tuning, smaller models can match the performance of much larger models, highlighting the promise of differentially private training as a tool for model compression and efficiency.
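To make the pipeline concrete, below is a minimal, hypothetical sketch of the selection step: score public examples by how "private-like" they are and keep the closest subset for pre-training. The classifier-based scoring, all function names, and the embedding inputs are illustrative assumptions, not the paper's actual method; the abstract only states that the selected subset should bring the public distribution closer to the private one.

```python
"""Illustrative sketch (not the paper's algorithm) of selecting a public
subset that is distributionally close to the private data."""
import numpy as np
from sklearn.linear_model import LogisticRegression


def select_public_subset(pub_embeddings: np.ndarray,
                         priv_embeddings: np.ndarray,
                         keep_fraction: float = 0.25) -> np.ndarray:
    """Return indices of the public examples most similar to the private data.

    A binary domain classifier (public vs. private) is fit on example
    embeddings; the public examples with the highest predicted probability
    of being "private" are kept. NOTE: in a real system this scoring model
    touches D_priv and would itself need to be trained with differential
    privacy (e.g., DP-SGD); that is omitted here for brevity.
    """
    X = np.vstack([pub_embeddings, priv_embeddings])
    y = np.concatenate([np.zeros(len(pub_embeddings)),    # 0 = public
                        np.ones(len(priv_embeddings))])   # 1 = private
    clf = LogisticRegression(max_iter=1000).fit(X, y)

    # Probability that each public example "looks private".
    scores = clf.predict_proba(pub_embeddings)[:, 1]
    k = int(keep_fraction * len(pub_embeddings))
    return np.argsort(scores)[::-1][:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy stand-ins for embeddings of D_pub and D_priv.
    pub = rng.normal(0.0, 1.0, size=(10_000, 64))
    priv = rng.normal(0.5, 1.0, size=(1_000, 64))

    selected = select_public_subset(pub, priv, keep_fraction=0.2)
    # Pre-train the fixed-size model M on pub[selected] with ordinary training,
    # then fine-tune on D_priv with DP-SGD so that M is differentially private
    # with respect to D_priv.
    print(f"Selected {len(selected)} of {len(pub)} public examples.")
```

Under these assumptions, the selected subset replaces the full D_pub during pre-training, and the private fine-tuning stage is the only step that needs a formal privacy guarantee with respect to D_priv (plus whatever privacy budget the selection step consumes).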