FP8 is a natural progression for accelerating deep learning training and inference...
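To make the format concrete, here is a minimal sketch of decoding a single FP8 byte, assuming the OCP-style E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and a single all-ones NaN pattern); the function name and structure are illustrative, not taken from any particular paper or library.

```python
def decode_e4m3(byte: int) -> float:
    """Decode one FP8 byte, assuming the E4M3 layout (bias 7, no infinities)."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:          # all-ones pattern is NaN; E4M3 has no Inf
        return float("nan")
    if exp == 0:                           # subnormal: no implicit leading 1
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)
```

Under these assumptions the largest finite value is decode_e4m3(0x7E) == 448.0, which matches the E4M3 maximum usually quoted.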
Low precision is the first-order knob for achieving higher Artificial Intelligence...
Reduced precision computation for deep neural networks is one of the key...
This paper presents the first comprehensive empirical study demonstrating...
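As an illustration of why BFLOAT16 is cheap to adopt, the sketch below rounds FP32 values to BFLOAT16 by keeping only the top 16 bits of the FP32 encoding (same 8-bit exponent, mantissa cut to 7 bits); the round-to-nearest-even bit trick is a standard idiom, not code from the paper, and NaN inputs are not special-cased here.

```python
import numpy as np

def fp32_to_bf16(x: np.ndarray) -> np.ndarray:
    """Round FP32 values to BFLOAT16 precision (result stored back in FP32)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # round-to-nearest-even on the 16 bits being discarded
    rounded = bits + np.uint32(0x7FFF) + ((bits >> 16) & np.uint32(1))
    return (rounded & np.uint32(0xFFFF0000)).view(np.float32)
```

Because BFLOAT16 keeps the full FP32 exponent range, this truncation preserves dynamic range and only costs mantissa precision.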
The state-of-the-art (SOTA) for mixed precision training is dominated by...
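The accumulation pattern referred to here can be shown in a few lines: products are formed in a narrow format while the running sum is kept in FP32, which is what keeps long reductions from losing accuracy. This is a generic sketch of the idea, not any paper's kernel.

```python
import numpy as np

def mixed_precision_dot(a: np.ndarray, b: np.ndarray) -> np.float32:
    """Form products in FP16, keep the accumulation in FP32."""
    prods = a.astype(np.float16) * b.astype(np.float16)  # narrow multiplies
    return prods.sum(dtype=np.float32)                   # wide accumulator
```

Accumulating in FP32 avoids the swamping that occurs when thousands of FP16 partial sums are added at FP16 precision.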
The exponential growth in use of large deep neural networks has accelera...
Sub-8-bit representations of DNNs incur some discernible loss of accuracy...
We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained...
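A hedged sketch of what fine-grained ternarization can look like: the weights are split into small groups, and each group gets its own threshold and scale, so quantization error is localized. The 0.7·mean(|w|) threshold is the common TWN heuristic, and the group size and helper names are illustrative, not the paper's exact procedure.

```python
import numpy as np

def ternarize_group(w: np.ndarray) -> np.ndarray:
    """Ternarize one group of weights to {-alpha, 0, +alpha}."""
    t = 0.7 * np.abs(w).mean()                              # TWN-style threshold (assumption)
    mask = np.abs(w) > t
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0   # per-group scale
    return alpha * np.sign(w) * mask

def ternarize_fgq(w: np.ndarray, group_size: int = 64) -> np.ndarray:
    """Apply ternarization independently per group (w.size must divide evenly)."""
    groups = w.reshape(-1, group_size)
    return np.stack([ternarize_group(g) for g in groups]).reshape(w.shape)
```

Smaller groups track local weight statistics more closely at the cost of storing more scale factors.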
We propose a cluster-based quantization method to convert pre-trained full-precision...
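One way to read "cluster-based" is a 1-D k-means over the weight values whose centroids are then collapsed to ternary levels; the sketch below is a guess at that shape, with the three-cluster choice, the initialization, and all names being illustrative rather than the paper's algorithm.

```python
import numpy as np

def cluster_quantize_ternary(w: np.ndarray, iters: int = 20) -> np.ndarray:
    """Cluster weights into negative / near-zero / positive groups, snap to centroids."""
    flat = w.ravel()
    centers = np.array([flat.min(), 0.0, flat.max()])  # one initial center per level
    for _ in range(iters):
        assign = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
        for j in range(3):
            if np.any(assign == j):
                centers[j] = flat[assign == j].mean()
    centers[1] = 0.0                                   # pin the middle level to exactly zero
    return centers[assign].reshape(w.shape)
```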