Generating multi-instrument music from symbolic music representations is...
Text-to-image (T2I) personalization allows users to guide the creative i...
The majority of existing large 3D shape datasets contain meshes that len...
We introduce a high-resolution spatially adaptive light source, or a pro...
In recent months, we have witnessed a leap forward as denoising diffusion model...
Text-to-image personalization aims to teach a pre-trained diffusion mode...
Synthesizing realistic animations of humans, animals, and even imaginary...
Reconstructing 3D shapes from planar cross-sections is a challenge inspi...
Natural and expressive human motion generation is the holy grail of comp...
Text-to-image models offer unprecedented freedom to guide creation throu...
Multi-instrument Automatic Music Transcription (AMT), or the decoding of...
We introduce MotionCLIP, a 3D human motion auto-encoder featuring a late...
Generative Adversarial Networks (GANs) are susceptible to bias, learned ...
Supervision for image-to-image translation (I2I) tasks is hard to come b...
Vision Transformers (ViT) serve as powerful vision models. Unlike convol...
Mesh-based learning is one of the popular approaches nowadays to learn s...
We present a novel approach to 3D object reconstruction from its 2D proj...