Diffusion models power a vast majority of text-to-audio (TTA) generation...
We study the problem of progressive distillation: Given a large, pre-tra...
In recent years, Generative Adversarial Networks (GANs) have produced si...
Improving generalization is a major challenge in audio classification du...
Deep learning-based speech enhancement has shown unprecedented performan...
Unsupervised Zero-Shot Voice Conversion (VC) aims to modify the speaker ...
The Deep Noise Suppression (DNS) challenge is designed to foster innovat...
Visual reasoning tasks such as visual question answering (VQA) require a...
In late fusion, each modality is processed in a separate unimodal Convol...
In this study, we introduce a convolutional time-frequency-channel "Sque...