The mixture proportions of pretraining data domains (e.g., Wikipedia, bo...
Scaling language models with more data, compute and parameters has drive...
We present a combined scaling method called BASIC that achieves 85.7
zer...
This paper explores zero-label learning in Natural Language Processing (...
This paper explores a simple method for improving the zero-shot learning...
With recent progress in joint modeling of visual and textual representat...
Despite achieving tremendous success, existing deep learning models have...
Neural Architecture Search (NAS) has achieved significant progress in pu...
Current end-to-end machine reading and question answering (Q&A) models a...
We propose a nonparametric method for detecting nonlinear causal relatio...
Machine learning with big data often involves large optimization models....
In this paper, we propose a generic and simple algorithmic framework for...
Recurrent Neural Networks are showing much promise in many sub-areas of
...
We consider the noisy power method algorithm, which has wide application...
We derive computationally tractable methods to select a small subset of
...
We study distributed stochastic convex optimization under the delayed
gr...
We propose a doubly stochastic primal-dual coordinate optimization algor...