Text-to-image generation (TTI) refers to the use of models that can ...
ChatGPT-like models have revolutionized various applications in artifici...
In the complex domain of large language models (LLMs), striking a balanc...
This study examines the impact of optimizing the Stable Diffusion (SD) g...
Post-training quantization (PTQ) has recently been shown to be a promising ...
The field of natural language processing (NLP) has made significant stri...
Improving the deployment efficiency of transformer-based language models...
Recent advances in deep learning models come at the price of formidable ...
Large-scale transformer models have become the de facto architectures fo...
Graph Neural Networks (GNNs) are a promising approach for applications wi...
How to efficiently serve ever-larger trained natural language models in ...
Extreme compression, particularly ultra-low bit precision (binary/ternar...
As the training of giant dense models hits the boundary on the availabil...
We demonstrate that, hidden within one-layer randomly weighted neural networks, ...
Most existing Vision-and-Language (V&L) models rely on pre-trained vis...
Pruning is an effective method to reduce the memory footprint and comput...
The increasing size of neural network models has been critical for impro...
End-to-end neural network models achieve improved performance on various...
As soon as abstract mathematical computations were adapted to computatio...
Pruning is an effective method to reduce the memory footprint and FLOPs ...
Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art ...
Quantization is one of the key techniques used to make Neural Networks (...
Fully quantized training (FQT), which uses low-bitwidth hardware by quan...
Phrase localization is a task that studies the mapping from textual phra...
Federated learning promises to use the computational power of edge devic...
We introduce AdaHessian, a second order stochastic optimization algorith...
The standard normalization method for neural network (NN) models used in...
Quantization is a promising approach for reducing the inference time and...
We present PyHessian, a new scalable framework that enables fast computa...
Quantization is an effective method for reducing memory footprint and in...
Transformer-based architectures have become de facto models used for a r...
It has been observed that residual networks can be viewed as the explici...
We regard pre-trained residual networks (ResNets) as nonlinear systems a...
It has been demonstrated that very simple attacks can fool highly-sophis...
In stochastic optimization, large batch training can leverage parallel r...
In many applications, it is important to reconstruct a fluid flow field,...
Deep Neural Networks are quite vulnerable to adversarial perturbations. ...
Optimal parameter initialization remains a crucial problem for neural ne...
Increasing the mini-batch size for stochastic gradient descent offers si...
Stochastic Gradient Descent (SGD) methods using randomly selected batche...
Large batch size training of Neural Networks has been shown to incur acc...