We propose a decoder-only language model, VoxtLM, that can perform
four ...
Self-supervised learning (SSL) has led to great strides in speech proces...
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitat...
Multilingual Automatic Speech Recognition (ASR) models have extended the...
While neural text-to-speech (TTS) has achieved human-like natural synthe...
While human evaluation is the most reliable metric for evaluating speech...
In this paper, we present a novel framework that jointly performs speake...
We present an end-to-end deep network model that performs meeting diariz...
We present progress towards bilingual Text-to-Speech which is able to
tr...
Traditional speech enhancement systems produce speech with compromised
q...
Noise suppression systems generally produce output speech with copromise...
This work proposes the use of clean speech vocoder parameters as the tar...