Sparsely activated transformers, such as Mixture of Experts (MoE), have ...
Transformers achieve state-of-the-art performance for natural language p...
Adam is a widely used optimization method for training deep learning mod...
Despite decades of research on approximate query processing (AQP), our u...
Estimating the selectivity of a query is a key step in almost any cost-b...
The rising volume of datasets has made training machine learning (ML) mo...
There has been substantial research on sub-linear time approximate algor...
We revisit the Frank-Wolfe (FW) optimization under strongly convex const...
Despite 25 years of research in academia, approximate query processing (...