Real-time and robust photorealistic avatars for telepresence in AR/VR ha...
Large Vision-Language Foundation Models (VLFM), such as CLIP, ALIGN and ...
We present a method that accelerates reconstruction of 3D scenes and obj...
Vision Transformers (ViTs) have shown impressive performance but still r...
Real-time multi-model multi-task (MMMT) workloads, a new form of deep le...
We tackle the task of NeRF inversion for style-based neural radiance fie...
Real-time tracking of 3D hand pose in world space is a challenging probl...
We introduce Token Merging (ToMe), a simple method to increase the throu...
Open-vocabulary semantic segmentation aims to segment an image into sema...
While transformers have begun to dominate many tasks in vision, applying...
Traditional computer vision models are trained to predict a fixed set of...
Neural Architecture Search (NAS) has been widely adopted to design accur...
Semi-supervised learning, i.e., training networks with both labeled and ...
To unlock video chat for hundreds of millions of people hindered by poor...
Nowadays more and more applications can benefit from edge-based text-to-...
Differentiable Neural Architecture Search (NAS) requires all layer choices...
3D photography is a new medium that allows viewers to more fully experie...
We present a novel 3D pose refinement approach based on differentiable r...
Computer vision has achieved great success using standardized image repr...
Neural Architecture Search (NAS) yields state-of-the-art neural networks...
Differentiable Neural Architecture Search (DNAS) has demonstrated great ...
Machine-learning (ML) hardware and software system demand is burgeoning....
This paper proposes an efficient neural network (NN) architecture design...
Designing accurate and efficient ConvNets for mobile devices is challeng...
Recent work in network quantization has substantially reduced the time a...