Deep Neural Networks (DNNs) have been a large driver and enabler for AI ...
Feature normalization transforms such as Batch and Layer-Normalization h...
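The two normalization transforms named above differ only in which axis the statistics are computed over. A minimal NumPy sketch (illustrative, not taken from the cited work) makes the contrast concrete:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Layer norm: normalize each sample across its feature dimension
    # (last axis), so every row has zero mean and unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def batch_norm(x, eps=1e-5):
    # Batch norm: normalize each feature across the batch dimension
    # (first axis), using statistics shared by all samples in the batch.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(4, 8) * 3.0 + 1.0
ln = layer_norm(x)   # per-row statistics
bn = batch_norm(x)   # per-column statistics
```

Trainable scale and shift parameters (gamma, beta) are omitted here for brevity; both transforms typically apply them after the normalization step.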
We present a framework for using transformer networks as universal compu...
Fine-tuning pretrained language models (LMs) without making any architec...
Word translation without parallel corpora has become feasible, rivaling ...
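One widely used approach to this problem (named here for illustration; the abstract does not specify the method) aligns two independently trained monolingual embedding spaces with an orthogonal map, solved in closed form as an orthogonal Procrustes problem. A toy sketch, where the "target language" space is an exact rotation of the source:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy monolingual embeddings: Y is a rotated copy of X, standing in for
# independently trained word vectors of two languages with a shared structure.
X = rng.normal(size=(50, 4))                  # "source language" vectors
Q_true, _ = np.linalg.qr(rng.normal(size=(4, 4)))
Y = X @ Q_true                                # "target language" vectors

# Orthogonal Procrustes: W = argmin ||XW - Y||_F over orthogonal W,
# recovered in closed form from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

mapped = X @ W   # translated source vectors, ready for nearest-neighbor lookup
```

In practice the correspondence between X and Y rows is unknown, which is where unsupervised seed-dictionary induction comes in; this sketch only shows the alignment step once a correspondence is assumed.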
In distributed learning, local SGD (also known as federated averaging) a...
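The communication pattern of local SGD can be sketched in a few lines: each worker takes several local gradient steps, and only the resulting models are averaged. The quadratic per-worker losses below are a toy stand-in chosen so the global optimum (the mean of the worker targets) is known:

```python
import numpy as np

def local_sgd(worker_targets, rounds=50, local_steps=10, lr=0.1):
    """Minimal local SGD / federated averaging on toy quadratic losses.

    Worker i minimizes f_i(w) = 0.5 * ||w - t_i||^2, so the optimum of the
    averaged objective is the mean of the targets t_i.
    """
    w = np.zeros_like(worker_targets[0])
    for _ in range(rounds):
        local_models = []
        for t in worker_targets:
            w_i = w.copy()
            for _ in range(local_steps):
                grad = w_i - t            # gradient of 0.5 * ||w_i - t||^2
                w_i -= lr * grad
            local_models.append(w_i)
        w = np.mean(local_models, axis=0)  # communication round: average models
    return w

targets = [np.array([1.0, 0.0]), np.array([3.0, 2.0]), np.array([5.0, 4.0])]
w_final = local_sgd(targets)
```

Communication happens once per round rather than once per gradient step, which is the bandwidth saving that motivates the method.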
A recent work by Ramanujan et al. (2020) provides significant empirical ...
It is well known that modern deep neural networks are powerful enough to...
A recent line of ground-breaking results for permutation-based SGD has c...
Due to its decentralized nature, Federated Learning (FL) lends itself to...
The strong lottery ticket hypothesis (LTH) postulates that one can appro...
Stochastic gradient descent without replacement sampling is widely used ...
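Without-replacement sampling (often called random reshuffling) visits every example exactly once per epoch in a freshly shuffled order, in contrast to i.i.d. with-replacement sampling. A minimal sketch on noiseless least squares, with illustrative names:

```python
import numpy as np

def sgd_random_reshuffling(X, y, epochs=50, lr=0.05, seed=0):
    # Without-replacement SGD: each epoch is a fresh permutation of the
    # examples, so no example is seen twice before all have been seen once.
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]   # squared-loss gradient at example i
            w -= lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                     # noiseless targets, so w_true is recoverable
w_hat = sgd_random_reshuffling(X, y)
```

The only change from with-replacement SGD is replacing an i.i.d. index draw with `rng.permutation` per epoch; the convergence analyses the text refers to study exactly this difference.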
To improve the resilience of distributed training to worst-case, or Byza...
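One standard Byzantine-robust defense (shown here as an illustrative example, not necessarily the mechanism of the cited work) replaces the usual gradient mean with a coordinate-wise median, which a minority of arbitrarily corrupted workers cannot drag far from the honest updates:

```python
import numpy as np

def coordinate_median(grads):
    # Coordinate-wise median: a simple robust aggregator that tolerates a
    # minority of arbitrarily corrupted (Byzantine) gradient vectors.
    return np.median(np.stack(grads), axis=0)

honest = [np.array([1.0, 1.0]), np.array([1.1, 0.9]), np.array([0.9, 1.1])]
byzantine = [np.array([1e6, -1e6])]            # worst-case corrupted update

agg_mean = np.mean(honest + byzantine, axis=0)  # mean is destroyed by one outlier
agg_med = coordinate_median(honest + byzantine) # median stays near the honest updates
```

The same interface accommodates other robust aggregators (trimmed mean, Krum, and so on); the median is the shortest to state.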
Adversarial training is a technique for training robust machine learning...
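Adversarial training can be sketched as a min-max loop: at each step, perturb the inputs to maximize the loss, then take a gradient step on the perturbed batch. The sketch below uses a single-step FGSM attack on a logistic-regression model; all names and the toy data are illustrative assumptions, not details from the abstract:

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    # Fast Gradient Sign Method: move each input in the direction that
    # increases the logistic loss, inside an L_inf ball of radius eps.
    margin = np.clip(y * (x @ w + b), -50, 50)
    grad_x = -y[:, None] * w * (1.0 / (1.0 + np.exp(margin)))[:, None]
    return x + eps * np.sign(grad_x)

def adversarial_train(X, y, eps=0.1, lr=0.1, steps=200):
    # Min-max training: fit the model on adversarially perturbed inputs
    # instead of clean ones (inner max approximated by one FGSM step).
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1]) * 0.01
    b = 0.0
    for _ in range(steps):
        X_adv = fgsm_perturb(X, w, b, y, eps)
        margin = np.clip(y * (X_adv @ w + b), -50, 50)
        coef = -y / (1.0 + np.exp(margin))        # d(logistic loss)/d(logit)
        w -= lr * (coef[:, None] * X_adv).mean(axis=0)
        b -= lr * coef.mean()
    return w, b

# Toy linearly separable data with labels y in {-1, +1}.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(2.0, 0.3, size=(50, 2)),
                    rng.normal(-2.0, 0.3, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

w, b = adversarial_train(X, y)
X_adv = fgsm_perturb(X, w, b, y, eps=0.1)
robust_acc = np.mean(np.sign(X_adv @ w + b) == y)
```

Stronger inner maximizers (e.g. multi-step PGD) slot into the same loop in place of the single FGSM step.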
Data augmentation (DA) is commonly used during model training, as it sig...