We examine how transformers cope with two challenges: learning basic int...
In this short note we consider random fully connected ReLU networks of w...
Vision Transformers (ViTs) have achieved comparable or superior performa...
Adaptive methods are a crucial component widely used for training genera...
Stochastic gradient descent (SGD) with momentum is widely used for train...
High-dimensional depth separation results for neural networks show that...
We introduce MADGRAD, a novel optimization method in the family of AdaGr...
First-order stochastic optimization methods are currently the most widel...
Designing an incentive compatible auction that maximizes expected revenu...
Finding Nash equilibria in two-player zero-sum continuous games is a cen...
Among the very first variance reduced stochastic methods for solving the...
Data-driven model training is increasingly relying on finding Nash equil...
Neural networks with a large number of parameters admit a mean-field des...
We consider semidefinite programs (SDPs) of size n with equality constra...