Large language models (LLMs) based on transformers have made significant...
While providing low latency is a fundamental requirement in deploying
re...
On-device machine learning (ML) inference can enable the use of private ...
The widespread deployment of machine learning (ML) is raising serious
co...
Graph neural networks (GNNs) can extract features by learning both the
r...
Personalized recommendation models (RecSys) are one of the most popular
...
Homomorphic Encryption (HE) is one of the most promising post-quantum
cr...
Graph convolutional neural networks (GCNs) have emerged as a key technol...
In cloud machine learning (ML) inference systems, providing low latency ...
Homomorphic encryption (HE) enables the secure offloading of computation...
In cloud ML inference systems, batching is an essential technique to inc...
Personalized recommendations are one of the most widely deployed machine...
Personalized recommendations are the backbone machine learning (ML) algo...
To satisfy the compute and memory demands of deep neural networks, neura...
To amortize cost, cloud vendors providing DNN acceleration as a service ...
Recent studies from several hyperscalers point to embedding layers as...
As deep learning (DL) models and the datasets used to train them scale,...
Exploiting sparsity enables hardware systems to run neural networks fast...
Convolutional Neural Networks (CNNs) have emerged as a fundamental techn...
Popular deep learning frameworks require users to fine-tune their memory...
The most widely used machine learning frameworks require users to carefu...