Training algorithms, broadly construed, are an essential part of every d...
Adaptive regularization methods that exploit more than the diagonal entr...
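The snippet above is truncated, but it clearly concerns adaptive methods that precondition with more than the diagonal of the gradient second-moment matrix. Purely as an illustrative sketch (in the spirit of Kronecker-factored preconditioners such as Shampoo, not the paper's own implementation), one such update for a 2-D weight matrix looks like:

    import numpy as np

    def inv_fourth_root(mat, eps=1e-4):
        # (mat + eps*I)^(-1/4) for a symmetric PSD matrix, via eigendecomposition.
        w, v = np.linalg.eigh(mat + eps * np.eye(mat.shape[0]))
        return (v * w ** -0.25) @ v.T

    def kron_factored_step(W, G, L, R, lr=0.1):
        # L (m x m) and R (n x n) accumulate row- and column-space gradient
        # statistics, capturing correlations a diagonal method discards.
        L += G @ G.T
        R += G.T @ G
        return W - lr * inv_fourth_root(L) @ G @ inv_fourth_root(R), L, R

Initialize L = np.zeros((m, m)) and R = np.zeros((n, n)); the two inverse fourth roots together stand in for a full-matrix inverse square root at a fraction of its cost.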
In this work, we propose a novel approach for layerwise representation learning...
For industrial-scale advertising systems, prediction of ad click-through...
Transformer models have recently emerged as one of the foundational mode...
We present the surprising result that randomly initialized neural networ...
Optimizers like Adam and AdaGrad have been very successful in training l...
Multi-task learning can leverage information learned by one task to bene...
In this work, we study the large-scale pretraining of BERT-Large with di...
Optimization in machine learning, both theoretical and applied, is prese...
We study a local loss construction approach for optimizing neural networ...
There is a growing discrepancy in computer vision between large-scale mo...
Recently the LARS and LAMB optimizers have been proposed for training ne...
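For readers unfamiliar with what these optimizers do, a hedged one-function sketch of the LARS-style layerwise "trust ratio" (parameter values arbitrary, and a schematic rendering rather than any paper's code):

    import numpy as np

    def lars_update(w, g, base_lr, weight_decay=1e-4, trust_coef=0.001):
        # LARS scales each layer's step by ||w|| / ||g + wd*w||, so layers whose
        # gradients are small relative to their weights still make progress at
        # large batch sizes.
        g = g + weight_decay * w
        w_norm, g_norm = np.linalg.norm(w), np.linalg.norm(g)
        trust = trust_coef * w_norm / (g_norm + 1e-9) if w_norm > 0 else 1.0
        return w - base_lr * trust * g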
State-of-the-art optimization is steadily shifting towards massively par...
We investigate several confounding factors in the evaluation of optimiza...
We introduce a temperature into the exponential function and replace the...
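The tempered exponential and logarithm this sentence alludes to are compact enough to state; a minimal NumPy sketch (both reduce to the ordinary exp and log as the temperature t approaches 1):

    import numpy as np

    def log_t(x, t):
        # Tempered logarithm; equals log(x) at t = 1.
        return np.log(x) if t == 1.0 else (x ** (1.0 - t) - 1.0) / (1.0 - t)

    def exp_t(x, t):
        # Tempered exponential; equals exp(x) at t = 1. For t > 1 the tail is
        # heavier than exp, which is what bounds the loss on outliers.
        if t == 1.0:
            return np.exp(x)
        return np.maximum(1.0 + (1.0 - t) * x, 0.0) ** (1.0 / (1.0 - t))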
Lingvo is a TensorFlow framework offering a complete solution for collaborative...
Adaptive gradient-based optimizers such as AdaGrad and Adam are among th...
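Since the abstract is cut off, for reference, the per-coordinate updates these two optimizers perform take only a few lines each (a textbook rendering, not the paper's code):

    import numpy as np

    def adagrad_step(w, g, accum, lr=0.1, eps=1e-8):
        # AdaGrad divides each coordinate by the root of its summed squared gradients.
        accum += g * g
        return w - lr * g / (np.sqrt(accum) + eps), accum

    def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        # Adam keeps bias-corrected running means of the gradient and its square.
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v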
TensorFlow Ranking is the first open source library for solving large-sc...
Techniques such as ensembling and distillation promise model quality improvements...
Generalized linear models with nonlinear feature transformations are wid...
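To make the combination concrete: such systems typically sum a sparse linear ("wide") logit with the output of an embedding MLP ("deep") before the sigmoid. A toy NumPy sketch, with every shape and name invented for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(100, 4))                  # embedding table, feature ids 0..99
    W1, b1 = rng.normal(size=(8, 12)), np.zeros(8)   # expects 3 features x 4 dims
    W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)
    wide_w = rng.normal(size=100)                    # one linear weight per (crossed) feature id

    def wide_and_deep_logit(feature_ids):
        wide = wide_w[feature_ids].sum()             # wide: sparse linear model
        h = emb[feature_ids].reshape(-1)             # deep: concatenated embeddings
        h = np.maximum(W1 @ h + b1, 0.0)             # ReLU hidden layer
        return wide + (W2 @ h + b2)[0]               # logits summed; sigmoid at the loss

    p = 1.0 / (1.0 + np.exp(-wide_and_deep_logit(np.array([3, 42, 7]))))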