Minsoo Rhu

research

∙ 08/23/2023

Pre-gated MoE: An Algorithm-System Co-Design for Fast and Scalable Mixture-of-Expert Inference

Large language models (LLMs) based on transformers have made significant...

0 Ranggi Hwang, et al. ∙

research

∙ 02/23/2023

Hera: A Heterogeneity-Aware Multi-Tenant Inference Server for Personalized Recommendations

While providing low latency is a fundamental requirement in deploying re...

0 Yujeong Choi, et al. ∙

research

∙ 01/26/2023

GPU-based Private Information Retrieval for On-Device Machine Learning Inference

On-device machine learning (ML) inference can enable the use of private ...

0 Maximilian Lam, et al. ∙

research

∙ 08/26/2022

DiVa: An Accelerator for Differentially Private Machine Learning

The widespread deployment of machine learning (ML) is raising serious co...

11 Beomsik Park, et al. ∙

research

∙ 05/10/2022

SmartSAGE: Training Large-scale Graph Neural Networks using In-Storage Processing Architectures

Graph neural networks (GNNs) can extract features by learning both the r...

0 Yunjae Lee, et al. ∙

research

∙ 05/10/2022

Training Personalized Recommendation Systems from (GPU) Scratch: Look Forward not Backwards

Personalized recommendation models (RecSys) are one of the most popular ...

5 Youngeun Kwon, et al. ∙

research

∙ 05/02/2022

ARK: Fully Homomorphic Encryption Accelerator with Runtime Data Generation and Inter-Operation Key Reuse

Homomorphic Encryption (HE) is one of the most promising post-quantum cr...

0 Jongmin Kim, et al. ∙

research

∙ 03/01/2022

GROW: A Row-Stationary Sparse-Dense GEMM Accelerator for Memory-Efficient Graph Convolutional Neural Networks

Graph convolutional neural networks (GCNs) have emerged as a key technol...

0 Minhoo Kang, et al. ∙

research

∙ 02/27/2022

PARIS and ELSA: An Elastic Scheduling Algorithm for Reconfigurable Multi-GPU Inference Servers

In cloud machine learning (ML) inference systems, providing low latency ...

0 Yunseong Kim, et al. ∙

research

∙ 12/31/2021

BTS: An Accelerator for Bootstrappable Fully Homomorphic Encryption

Homomorphic encryption (HE) enables the secure offloading of computation...

0 Sangpyo Kim, et al. ∙

research

∙ 10/25/2020

LazyBatching: An SLA-aware Batching System for Cloud Machine Learning Inference

In cloud ML inference systems, batching is an essential technique to inc...

0 Yujeong Choi, et al. ∙

research

∙ 10/25/2020

Tensor Casting: Co-Designing Algorithm-Architecture for Personalized Recommendation Training

Personalized recommendations are one of the most widely deployed machine...

4 Youngeun Kwon, et al. ∙

research

∙ 05/12/2020

Centaur: A Chiplet-based, Hybrid Sparse-Dense Accelerator for Personalized Recommendations

Personalized recommendations are the backbone machine learning (ML) algo...

0 Ranggi Hwang, et al. ∙

research

∙ 11/15/2019

NeuMMU: Architectural Support for Efficient Address Translations in Neural Processing Units

To satisfy the compute and memory demands of deep neural networks, neura...

0 Bongjoon Hyun, et al. ∙

research

∙ 09/06/2019

PREMA: A Predictive Multi-task Scheduling Algorithm For Preemptible Neural Processing Units

To amortize cost, cloud vendors providing DNN acceleration as a service ...

0 Yujeong Choi, et al. ∙

research

∙ 08/08/2019

TensorDIMM: A Practical Near-Memory Processing Architecture for Embeddings and Tensor Operations in Deep Learning

Recent studies from several hyperscalers point to embedding layers as...

1 Youngeun Kwon, et al. ∙

research

∙ 02/18/2019

Beyond the Memory Wall: A Case for Memory-centric HPC System for Deep Learning

As the models and the datasets to train deep learning (DL) models scale,...

12 Youngeun Kwon, et al. ∙

research

∙ 06/01/2018

Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training

Exploiting sparsity enables hardware systems to run neural networks fast...

0 Maohua Zhu, et al. ∙

research

∙ 05/23/2017

SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have emerged as a fundamental techn...

0 Angshuman Parashar, et al. ∙

research

∙ 05/03/2017

Compressing DMA Engine: Leveraging Activation Sparsity for Training Deep Neural Networks

Popular deep learning frameworks require users to fine-tune their memory...

0 Minsoo Rhu, et al. ∙

research

∙ 02/25/2016

vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design

The most widely used machine learning frameworks require users to carefu...

0 Minsoo Rhu, et al. ∙

