FP8 is a natural progression for accelerating deep learning training and inference...
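To make the format concrete, here is a minimal sketch of decoding a single FP8 byte, assuming the OCP-style E4M3 layout (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, and a single all-ones NaN pattern); the function name and structure are illustrative, not taken from any particular paper or library.

```python
def decode_e4m3(byte: int) -> float:
    """Decode one FP8 byte, assuming the E4M3 layout (bias 7, no infinities)."""
    sign = -1.0 if (byte >> 7) & 1 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:          # all-ones pattern is NaN; E4M3 has no Inf
        return float("nan")
    if exp == 0:                           # subnormal: no implicit leading 1
        return sign * (man / 8.0) * 2.0 ** -6
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)
```

Under these assumptions the largest finite value is decode_e4m3(0x7E) == 448.0, which matches the E4M3 maximum usually quoted.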
Low precision is the first-order knob for achieving higher Artificial Intelligence...
Reduced precision computation for deep neural networks is one of the key...
This paper presents the first comprehensive empirical study demonstrating...
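As an illustration of why BFLOAT16 is cheap to adopt, the sketch below rounds FP32 values to BFLOAT16 by keeping only the top 16 bits of the FP32 encoding (same 8-bit exponent, mantissa cut to 7 bits); the round-to-nearest-even bit trick is a standard idiom, not code from the paper, and NaN inputs are not special-cased here.

```python
import numpy as np

def fp32_to_bf16(x: np.ndarray) -> np.ndarray:
    """Round FP32 values to BFLOAT16 precision (result stored back in FP32)."""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    # round-to-nearest-even on the 16 bits being discarded
    rounded = bits + np.uint32(0x7FFF) + ((bits >> 16) & np.uint32(1))
    return (rounded & np.uint32(0xFFFF0000)).view(np.float32)
```

Because BFLOAT16 keeps the full FP32 exponent range, this truncation preserves dynamic range and only costs mantissa precision.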
The state-of-the-art (SOTA) for mixed precision training is dominated by...
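The accumulation pattern referred to here can be shown in a few lines: products are formed in a narrow format while the running sum is kept in FP32, which is what keeps long reductions from losing accuracy. This is a generic sketch of the idea, not any paper's kernel.

```python
import numpy as np

def mixed_precision_dot(a: np.ndarray, b: np.ndarray) -> np.float32:
    """Form products in FP16, keep the accumulation in FP32."""
    prods = a.astype(np.float16) * b.astype(np.float16)  # narrow multiplies
    return prods.sum(dtype=np.float32)                   # wide accumulator
```

Accumulating in FP32 avoids the swamping that occurs when thousands of FP16 partial sums are added at FP16 precision.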
The exponential growth in use of large deep neural networks has accelera...
Sub-8-bit representations of DNNs incur some discernible loss of accuracy...
We propose a novel fine-grained quantization (FGQ) method to ternarize pre-trained...
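A hedged sketch of what fine-grained ternarization can look like: the weights are split into small groups, and each group gets its own threshold and scale, so quantization error is localized. The 0.7·mean(|w|) threshold is the common TWN heuristic, and the group size and helper names are illustrative, not the paper's exact procedure.

```python
import numpy as np

def ternarize_group(w: np.ndarray) -> np.ndarray:
    """Ternarize one group of weights to {-alpha, 0, +alpha}."""
    t = 0.7 * np.abs(w).mean()                              # TWN-style threshold (assumption)
    mask = np.abs(w) > t
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0   # per-group scale
    return alpha * np.sign(w) * mask

def ternarize_fgq(w: np.ndarray, group_size: int = 64) -> np.ndarray:
    """Apply ternarization independently per group (w.size must divide evenly)."""
    groups = w.reshape(-1, group_size)
    return np.stack([ternarize_group(g) for g in groups]).reshape(w.shape)
```

Smaller groups track local weight statistics more closely at the cost of storing more scale factors.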
We propose a cluster-based quantization method to convert pre-trained full-precision...
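One way to read "cluster-based" is a 1-D k-means over the weight values whose centroids are then collapsed to ternary levels; the sketch below is a guess at that shape, with the three-cluster choice, the initialization, and all names being illustrative rather than the paper's algorithm.

```python
import numpy as np

def cluster_quantize_ternary(w: np.ndarray, iters: int = 20) -> np.ndarray:
    """Cluster weights into negative / near-zero / positive groups, snap to centroids."""
    flat = w.ravel()
    centers = np.array([flat.min(), 0.0, flat.max()])  # one initial center per level
    for _ in range(iters):
        assign = np.argmin(np.abs(flat[:, None] - centers[None, :]), axis=1)
        for j in range(3):
            if np.any(assign == j):
                centers[j] = flat[assign == j].mean()
    centers[1] = 0.0                                   # pin the middle level to exactly zero
    return centers[assign].reshape(w.shape)
```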