Building artificial intelligence (AI) systems on top of a set of foundat...
Generating a stereophonic presentation from a monophonic audio signal is...
Recent work has studied text-to-audio synthesis using large amounts of p...
Recent works have shown the capability of deep generative models to tack...
Universal sound separation consists of separating mixes with arbitrary s...
Removing background noise from speech audio has been the subject of cons...
We investigate which loss functions provide better separations via bench...
Upsampling artifacts are caused by problematic upsampling layers and due...
Communication technologies like voice over IP operate under constrained ...
Score-based generative models provide state-of-the-art quality for image...
A number of recent advances in audio synthesis rely on neural upsamplers...
Applications of deep learning to automatic multitrack mixing are largely...
Automatic speech quality assessment is an important, transversal task wh...
Despite the growing interest in unsupervised learning, extracting meanin...
In many applications of multi-microphone multi-device processing, the sy...
End-to-end models for raw audio generation are a challenge, especially if...
Text-to-speech (TTS) acoustic models map linguistic features into an aco...
The speech enhancement task usually consists of removing additive noise ...
Learning good representations without supervision is still an open issue...
Speech is a rich biometric signal that contains information about the id...
Most methods of voice restoration for patients suffering from aphonia ei...
The conversion from text to speech relies on the accurate mapping from l...
We study the use of a time series encoder to learn representations that ...
Speech enhancement deep learning systems usually require large amounts o...
Current speech enhancement techniques operate on the spectral domain and...
This thesis explores different approaches using Convolutional and Recurre...