We introduce a software-hardware co-design approach to reduce memory tra...
Increasingly larger and better Transformer models keep advancing state-o...
Data accesses between on- and off-chip memories account for a large frac...
We present FPRaker, a processing element for composing training accelera...
TensorDash is a hardware-level technique for enabling data-parallel MAC ...
We motivate a method for transparently identifying ineffectual computati...
We show that, during inference with Convolutional Neural Networks (CNNs)...
This work studies the behavior of state-of-the-art memory controller des...