There is growing interest in adapting large-scale language models usin...
The recent advance of self-supervised learning associated with the Trans...
While model compression is increasingly important because of large neura...
Even though fine-grained pruning techniques achieve a high compression r...
Various post-training uniform quantization methods have usually been stu...
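To make the idea of post-training uniform quantization concrete, the sketch below applies a single affine scale and zero-point to a pretrained weight tensor and measures the reconstruction error. The 8-bit setting, tensor shape, and function name are illustrative assumptions, not the setup of any particular method discussed above.

```python
import numpy as np

def uniform_quantize(w, num_bits=8):
    """Post-training affine uniform quantization of a weight tensor (num_bits <= 8)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / (qmax - qmin) if w_max > w_min else 1.0
    zero_point = int(round(-w_min / scale))
    # Map floats onto the integer grid, then clamp to the representable range.
    q = np.clip(np.round(w / scale) + zero_point, qmin, qmax).astype(np.uint8)
    # Dequantize to inspect the error introduced by quantization.
    w_hat = (q.astype(np.float32) - zero_point) * scale
    return q, w_hat, scale, zero_point

# Example: quantize a random stand-in for a pretrained weight matrix.
w = np.random.randn(256, 256).astype(np.float32)
q, w_hat, scale, zp = uniform_quantize(w, num_bits=8)
print("mean absolute error:", np.abs(w - w_hat).mean())
```

Calibration data and per-channel scales are common refinements; the single global scale here is only the simplest variant.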
The Transformer is widely used in Neural Machine Translation (NMT). De...
Quantization based on the binary codes is gaining attention because each...
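As one common instantiation of binary-code quantization, the sketch below greedily approximates a weight vector as a sum of scaled {-1, +1} codes fitted to successive residuals. The greedy procedure, bit-width, and names are assumptions for illustration rather than the exact algorithm of any cited work.

```python
import numpy as np

def binary_code_quantize(w, num_bits=2):
    """Greedy multi-bit binary-code quantization: w ~= sum_i alpha_i * b_i."""
    residual = w.astype(np.float32).copy()
    alphas, codes = [], []
    for _ in range(num_bits):
        b = np.where(residual >= 0, 1.0, -1.0)  # {-1, +1} code for this bit
        alpha = np.abs(residual).mean()         # least-squares scale for b
        alphas.append(alpha)
        codes.append(b.astype(np.int8))
        residual -= alpha * b                   # fit the next bit to what remains
    w_hat = sum(a * c for a, c in zip(alphas, codes))
    return alphas, codes, w_hat

w = np.random.randn(4096).astype(np.float32)
alphas, codes, w_hat = binary_code_quantize(w, num_bits=3)
print("relative error:", np.linalg.norm(w - w_hat) / np.linalg.norm(w))
```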
The number of parameters in deep neural networks (DNNs) is rapidly incre...
Low-rank approximation is an effective model compression technique to no...
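A minimal sketch of low-rank approximation via truncated SVD, assuming a dense layer weight is factorized offline into two thin matrices; the rank, shapes, and function name are illustrative assumptions.

```python
import numpy as np

def low_rank_factorize(w, rank):
    """Truncated-SVD factorization: replace W (m x n) with A (m x r) @ B (r x n)."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # absorb singular values into the left factor
    b = vt[:rank, :]
    return a, b

w = np.random.randn(1024, 1024).astype(np.float32)
a, b = low_rank_factorize(w, rank=64)
print("compression ratio:", w.size / (a.size + b.size))
print("relative error:", np.linalg.norm(w - a @ b) / np.linalg.norm(w))
```

The compression comes from replacing one m-by-n matrix multiply with two much smaller ones when the rank is far below min(m, n).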
Model compression techniques, such as pruning and quantization, are beco...
Pruning is an efficient model compression technique to remove redundancy...
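For concreteness, here is a minimal sketch of unstructured magnitude pruning with a fixed global sparsity target; the threshold-and-mask scheme is one common variant, and the sparsity level and names are illustrative assumptions.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the `sparsity` fraction of weights with the smallest magnitudes."""
    k = int(w.size * sparsity)                  # number of weights to remove
    if k == 0:
        return w.copy(), np.ones(w.shape, dtype=bool)
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold                # keep only weights above the threshold
    return w * mask, mask

w = np.random.randn(512, 512).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.9)
print("kept fraction:", mask.mean())
```

In practice the mask is usually reapplied during fine-tuning so the removed weights stay at zero.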
Model compression has been introduced to reduce the required hardware re...
Model compression has gained a lot of attention due to its ability to re...