Soumi Maiti

09/14/2023

VoxtLM: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks

We propose a decoder-only language model, VoxtLM, that can perform four ...

Soumi Maiti, et al.

06/11/2023

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute

Self-supervised learning (SSL) has led to great strides in speech proces...

William Chen, et al.

04/10/2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitat...

Brian Yan, et al.

02/24/2023

Improving Massively Multilingual ASR With Auxiliary CTC Objectives

Multilingual Automatic Speech Recognition (ASR) models have extended the...

William Chen, et al.

01/30/2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining

While neural text-to-speech (TTS) has achieved human-like natural synthe...

Takaaki Saeki, et al.

12/08/2022

SpeechLMScore: Evaluating speech generation using speech language model

While human evaluation is the most reliable metric for evaluating speech...

Soumi Maiti, et al.

03/31/2022

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

In this paper, we present a novel framework that jointly performs speake...

Yushi Ueda, et al.

05/05/2021

End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

We present an end-to-end deep network model that performs meeting diariz...

Soumi Maiti, et al.

04/10/2020

Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data

We present progress towards bilingual Text-to-Speech which is able to tr...

Soumi Maiti, et al.

11/14/2019

Speaker independence of neural vocoders and their effect on parametric resynthesis speech enhancement

Traditional speech enhancement systems produce speech with compromised q...

Soumi Maiti, et al.

06/16/2019

Parametric Resynthesis with neural vocoders

Noise suppression systems generally produce output speech with compromise...

Soumi Maiti, et al.

04/02/2019

Speech denoising by parametric resynthesis

This work proposes the use of clean speech vocoder parameters as the tar...

Soumi Maiti, et al.
