Optimization is ubiquitous. While derivative-based algorithms have been ...
The mixture proportions of pretraining data domains (e.g., Wikipedia, bo...
Video frame interpolation aims to generate high-quality intermediate fra...
We study how in-context learning (ICL) in language models is affected by...
The best neural architecture for a given machine learning problem depend...
We revisit the design choices in Transformers, and propose methods to ad...
Sparsely-activated Mixture-of-experts (MoE) models allow the number of p...
We present a combined scaling method called BASIC that achieves 85.7 zer...
Large Transformer models have been central to recent advances in natural...
Transformers have attracted increasing interest in computer vision, but...
Transformers have become one of the most important architectural innovat...
Neural networks are sensitive to hyper-parameter and architecture choice...
Most compilers for machine learning (ML) frameworks need to solve many c...
Developing efficient models for mobile phones or other on-device deploym...
Efficient Neural Architecture Search methods based on weight sharing hav...
Pre-training is a dominant paradigm in computer vision. For example, sup...
Inverted bottleneck layers, which are built upon depthwise convolutions,...
Normalization layers and activation functions are critical components in...
Neural architecture search (NAS) has shown promising results discovering...
Despite the blooming success of architecture search for vision tasks in ...
Neural Architecture Search methods are effective but often use complex a...
Runtime and scalability of large neural networks can be significantly af...
This paper addresses the scalability challenge of architecture search by...
We explore efficient neural architecture search methods and present a si...
Convolutional Neural Networks (CNNs) have gained tremendous success in comput...
Large-scale multi-relational embedding refers to the task of learning th...
We present RACE, a new dataset for benchmark evaluation of methods in th...
The focus of past machine learning research for Reading Comprehension ta...
In this paper we study the problem of answering cloze-style questions ov...