ChatGPT-like models have revolutionized various applications in artifici...
Zero Redundancy Optimizer (ZeRO) has been used to train a wide range of ...
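At its core, ZeRO removes memory redundancy in data-parallel training by partitioning optimizer state (and, in later stages, gradients and parameters) across ranks instead of replicating it on every GPU. The sketch below illustrates only that partitioning idea in plain Python; it is not the DeepSpeed implementation, and the helper name shard_bounds is a hypothetical one introduced here.

    # Sketch of the ZeRO stage-1 idea: each data-parallel rank keeps optimizer
    # state only for its own contiguous shard of the flattened parameters.
    # Illustrative only; shard_bounds is a hypothetical helper, not DeepSpeed API.
    def shard_bounds(num_params: int, world_size: int, rank: int) -> tuple[int, int]:
        per_rank = (num_params + world_size - 1) // world_size  # ceiling division
        start = min(rank * per_rank, num_params)
        end = min(start + per_rank, num_params)
        return start, end

    num_params, world_size = 10, 4
    for rank in range(world_size):
        s, e = shard_bounds(num_params, world_size, rank)
        # Only this slice of the Adam moments and fp32 master weights would live
        # on this rank; after the local update, parameters are all-gathered so
        # every rank again holds the full model.
        print(f"rank {rank}: owns params[{s}:{e}]")

With mixed-precision Adam, the replicated optimizer state is roughly 12 bytes per parameter, so partitioning it this way divides that term by the data-parallel degree.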
Mixture-of-Experts (MoE) is a neural network architecture that adds spar...
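To make the sparsely activated idea concrete, here is a minimal top-1-gated MoE layer in PyTorch. It is a sketch under assumed names and shapes (the class TopOneMoE is hypothetical), not the DeepSpeed-MoE implementation, and it omits load balancing and expert-capacity limits.

    # Minimal top-1 Mixture-of-Experts layer: every token is routed to exactly
    # one feed-forward expert, so compute per token stays roughly constant while
    # total parameters grow with the number of experts. Illustrative sketch only.
    import torch
    import torch.nn as nn

    class TopOneMoE(nn.Module):
        def __init__(self, d_model: int, d_hidden: int, num_experts: int):
            super().__init__()
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(),
                              nn.Linear(d_hidden, d_model))
                for _ in range(num_experts)
            )
            self.gate = nn.Linear(d_model, num_experts)  # learned router

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (tokens, d_model); pick the highest-scoring expert per token.
            scores = torch.softmax(self.gate(x), dim=-1)
            top_score, top_idx = scores.max(dim=-1)
            out = torch.zeros_like(x)
            for e, expert in enumerate(self.experts):
                mask = top_idx == e
                if mask.any():
                    # Scale by the gate probability so the router gets gradient.
                    out[mask] = expert(x[mask]) * top_score[mask].unsqueeze(-1)
            return out

    layer = TopOneMoE(d_model=16, d_hidden=32, num_experts=4)
    print(layer(torch.randn(8, 16)).shape)  # torch.Size([8, 16])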
The past several years have witnessed the success of transformer-based m...
Pretrained general-purpose language models can achieve state-of-the-art ...
As the training of giant dense models hits the limits of the availabil...
The Mixture of Experts (MoE) models are an emerging class of sparsely ac...
In the last three years, the largest dense deep learning models have gro...
To train large models (like BERT and GPT-3) with hundreds or even thousa...
Scalable training of large models (like BERT and GPT-3) requires careful...
Large-scale model training has been a playground for a limited few r...
Adam is an important optimization algorithm used to guarantee efficiency and...
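For reference, a single Adam step can be sketched in a few lines of NumPy. The function name adam_step and the default hyperparameters below are illustrative choices, not taken from any particular implementation.

    # One Adam update: exponential moving averages of the gradient (m) and its
    # square (v), bias-corrected, then a scaled parameter step. Sketch only.
    import numpy as np

    def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
        m_hat = m / (1 - beta1 ** t)                # bias correction (t >= 1)
        v_hat = v / (1 - beta2 ** t)
        param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
        return param, m, v

    p, g = np.zeros(4), np.ones(4)
    m, v = np.zeros_like(p), np.zeros_like(p)
    p, m, v = adam_step(p, g, m, v, t=1)
    print(p)  # each coordinate moves by roughly -lr on the first step

The per-parameter moments m and v (together with fp32 master weights in mixed precision) are the optimizer states whose memory and communication costs the systems work above targets.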
Training large DL models with billions and potentially trillions of para...
Wide adoption of complex RNN-based models is hindered by their inference...
Model compression is important for the wide adoption of Recurrent Neur...